An overview of GAN basic theories (2014 Goodfellow)

Generative Adversarial Network [2014 Goodfellow]

Components

Discriminator: Decides whether the input image is real or fake

Generator: Generates an image that tries to fool the discriminator (a minimal sketch of both networks follows)
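
To make the two roles concrete, here is a minimal PyTorch sketch of both networks as small MLPs over flattened images. The layer sizes, latent_dim, and img_dim are illustrative assumptions, not the architecture from the paper.

import torch
import torch.nn as nn

latent_dim = 100     # assumed size of the noise vector z
img_dim = 28 * 28    # assumed flattened image size (e.g. MNIST)

# Generator: maps a noise vector z to a fake image
generator = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, img_dim),
    nn.Tanh(),        # outputs in [-1, 1]
)

# Discriminator: maps an image to the probability that it is real
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),     # outputs D(x) in (0, 1)
)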

Formula

min-max game


$$
\min_G\max_D V(D, G)= \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]
$$

Explanation

Value Function

Different from a cost function, it is a concept borrowed from reinforcement learning.

Under policy D (discriminator)

If the discriminator performs well, we expect V to reach its maximum

$D(x)\rightarrow1$ $D(G(z))\rightarrow0$

$V\rightarrow0$

Under policy G (generator)

If the generator performs well, we expect V to reach its minimum

$D(G(z))\rightarrow1$

$V \rightarrow -\infty$
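
Plugging these limits into the value function makes both extremes explicit; since both log terms are at most 0, $V\le0$, so 0 really is the maximum.

$$
D \text{ perfect}: \quad V = \mathbb{E}[\log 1] + \mathbb{E}[\log(1-0)] = 0
$$

$$
G \text{ perfect}: \quad D(G(z))\rightarrow1 \Rightarrow \mathbb{E}[\log(1-D(G(z)))]\rightarrow-\infty \Rightarrow V\rightarrow-\infty
$$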

x is a real image (sample) and follows the data distribution $p_{data}(x)$

z is a noise vector drawn from the prior distribution $p_z(z)$; since the noise is what gets turned into an image, G(z) is the fake data here

Expectation $\mathbb{E}$

$\mathbb{E}(x) = \int_{-\infty}^{\infty}x\,p(x)\,dx$

$\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] = \int_{x}p_{data}(x)\log D(x)\,dx$

$\mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))] = \int_{z}p_z(z)\log(1-D(G(z)))\,dz = \int_{x}p_g(x)\log(1-D(x))\,dx$

Hint: substitute the generator output G(z) with x and change the distribution subscript from $p_z$ to $p_g$

$V(D,G) = \int_{x}p_{data}(x)\log D(x)\,dx + \int_{x}p_g(x)\log(1-D(x))\,dx = \int_{x}\left[p_{data}(x)\log D(x) + p_g(x)\log(1-D(x))\right]dx$
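
In practice these expectations are estimated by minibatch averages, which is exactly where the $\frac{1}{m}\sum$ form of the gradients below comes from. A tiny Monte Carlo sketch with a made-up scalar discriminator and generator, purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Stand-in scalar "discriminator" and "generator", only for illustration
def D(x):
    return 1.0 / (1.0 + np.exp(-x))

def G(z):
    return 0.5 * z

x = rng.normal(2.0, 1.0, size=10_000)   # samples standing in for x ~ p_data
z = rng.normal(0.0, 1.0, size=10_000)   # noise samples z ~ p_z

# Expectations become sample means over the two batches
V_estimate = np.mean(np.log(D(x))) + np.mean(np.log(1.0 - D(G(z))))
print(V_estimate)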

Objective

We want training to push the distribution of the fake data toward the real data distribution.

For a dataset, after converting the data into vectors, images of people follow their own distribution and images of cars follow another, and the two are clearly not the same. For two-dimensional data drawn from two Gaussian distributions, the difference between the two data distributions comes down to a difference in expectation and variance. Real data is of course far more complex than this example, so we can only think of its distribution abstractly (a tiny numerical sketch follows).
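
A small NumPy sketch of the two-Gaussian example; the names and numbers are purely illustrative:

import numpy as np

rng = np.random.default_rng(0)

# Two illustrative 2-D Gaussian "datasets" with different means and covariances
people = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=5000)
cars = rng.multivariate_normal([3.0, -2.0], [[2.0, 0.5], [0.5, 0.5]], size=5000)

# The two distributions differ in both expectation and (co)variance
print(people.mean(axis=0), cars.mean(axis=0))
print(np.cov(people, rowvar=False))
print(np.cov(cars, rowvar=False))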

Why distribution?

For traditional networks, we need paired data to train the model (sample to sample). For example, if we want the model to do colorization, we need a gray picture of a chicken and the corresponding color picture.

But if we let the model learn the distribution of color pictures, we can train on other color pictures, such as cars, horses, and humans, and still have the model colorize a chicken picture.

But how can we prevent the model from turning the chicken into a horse? Remember, we can stack another loss function on top of the adversarial one to constrain the picture. In this way, we can colorize the picture while leaving the object's appearance unchanged in space (a sketch of such a combined loss follows).
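
A minimal sketch of this idea, assuming a generator that maps gray images to color images: the adversarial term pushes outputs toward the color-image distribution, while a pixel-wise term compares the output, converted back to gray, against the input, so the spatial structure is preserved without paired data. The helper names and the weight lambda_pixel are assumptions for illustration, not from the paper.

import torch.nn as nn

adversarial_loss = nn.BCELoss()
pixel_loss = nn.L1Loss()
lambda_pixel = 100.0   # assumed weighting between the two terms

def grayscale(img):
    # Simple luminance conversion for a (B, 3, H, W) tensor; an assumed helper
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b

def generator_loss(discriminator, gray_input, gen_color, valid):
    # Adversarial term: the colorized output should look like a real color image
    adv = adversarial_loss(discriminator(gen_color), valid)
    # Content term: converting the output back to gray should match the input,
    # so the chicken stays a chicken
    content = pixel_loss(grayscale(gen_color), gray_input)
    return adv + lambda_pixel * content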

Solve in code

Gradient

Discriminator

$\nabla_{\theta_d}\frac{1}{m}\sum\limits_{i=1}^{m}\left[\log D(x^{(i)}) + \log\left(1-D(G(z^{(i)}))\right)\right]$ (the discriminator is updated by ascending this gradient)

Generator

$\nabla_{\theta_g}\frac{1}{m}\sum\limits_{i=1}^{m}\log\left(1-D(G(z^{(i)}))\right)$ (the generator is updated by descending this gradient)
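
These minibatch gradients are exactly what nn.BCELoss produces in the sample code below: with label 1 the BCE term is $-\log D(\cdot)$ and with label 0 it is $-\log(1-D(\cdot))$, so minimizing the BCE losses for the discriminator is the same as ascending its gradient above, while training the generator against label 1 gives $-\log D(G(z))$, the non-saturating variant discussed next.

$$
\mathrm{BCE}(D(x), 1) = -\log D(x), \qquad \mathrm{BCE}(D(G(z)), 0) = -\log\left(1-D(G(z))\right)
$$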

Problem

At the start of training, the generator's gradient is very small: the discriminator easily rejects the obviously fake samples, so $D(G(z))$ is close to 0 and $\log(1-D(G(z)))$ saturates

Solutions

Replace the generator objective: $\log(1-D(G(z))) \Rightarrow -\log(D(G(z)))$ (the non-saturating loss)

Equivalently, train the generator against the opposite ("valid") label under BCE (shown in the sample code)
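
A small sketch of why the swap helps: compare the gradients of the two losses with respect to the discriminator's pre-sigmoid output for a fake sample that D confidently rejects. The logit value -5.0 is an arbitrary illustration.

import torch

o = torch.tensor(-5.0, requires_grad=True)   # hypothetical logit for a fake sample
d_out = torch.sigmoid(o)                     # D(G(z)) ~ 0.0067

saturating = torch.log(1 - d_out)            # original objective log(1 - D(G(z)))
grad_sat, = torch.autograd.grad(saturating, o, retain_graph=True)

non_saturating = -torch.log(d_out)           # swapped objective -log(D(G(z)))
grad_nonsat, = torch.autograd.grad(non_saturating, o)

print(grad_sat.item())     # ~ -0.0067, almost no learning signal
print(grad_nonsat.item())  # ~ -0.9933, a useful gradient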

Sample Training Code

# A training-loop excerpt; generator, discriminator, optimizer_G, optimizer_D,
# the current batch imgs (and real_imgs, the batch cast to Tensor), and latent_dim
# are assumed to be defined elsewhere.
import numpy as np
import torch
import torch.nn as nn

Tensor = torch.FloatTensor  # or torch.cuda.FloatTensor when running on GPU

criterion = nn.BCELoss()

# Ground-truth labels: 1 for real, 0 for fake. detach() keeps them out of the
# autograd graph, so no gradients are tracked for the labels.
valid = Tensor(imgs.size(0), 1).fill_(1.0).detach()
fake = Tensor(imgs.size(0), 1).fill_(0.0).detach()

# Train Generator
optimizer_G.zero_grad()

# Sample noise z ~ N(0, 1) and generate a batch of fake images
z = Tensor(np.random.normal(0, 1, (imgs.shape[0], latent_dim)))
gen_imgs = generator(z)

# Non-saturating trick: label the fakes as "valid" so the generator
# maximizes log D(G(z)) instead of minimizing log(1 - D(G(z)))
g_loss = criterion(discriminator(gen_imgs), valid)
g_loss.backward()
optimizer_G.step()

# Train Discriminator
optimizer_D.zero_grad()

real_loss = criterion(discriminator(real_imgs), valid)
# detach() here stops the discriminator update from backpropagating into the generator
fake_loss = criterion(discriminator(gen_imgs.detach()), fake)
d_loss = (real_loss + fake_loss) / 2

d_loss.backward()
optimizer_D.step()

Problems

  • Hard to train
  • Prone to generating similar images (mode collapse)
  • Generated images are blurry and not realistic
  • The training procedure is unstable