An overview of GAN basic theories (2014 Goodfellow)
Generative Adversarial Network [2014 Goodfellow]
Components
Discriminator: Decides whether the input is a real image or a fake one
Generator: Generates an image that tries to fool the discriminator
Formula
min-max game
$$
\min_G\max_D V(D, G)= \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_{z}(z)}[\log(1-D(G(z)))]
$$
Explanation
Value Function
Different from a cost function; it is a concept from reinforcement learning
Under policy D (discriminator)
If D performs well, we expect V to reach its maximum
$D(x)\rightarrow1$, $D(G(z))\rightarrow0$
$V\rightarrow0$
Under policy G (generator)
If G performs well, we expect V to reach its minimum
$D(G(z))\rightarrow1$
$V \rightarrow -\infty$
x is a real sample (image) drawn from the data distribution $p_{data}$
z is noise drawn from a prior distribution $p_z$; since the noise is what the generator turns into an image, the "fake data" here is the noise
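The two extreme cases of the value function above can be checked numerically. A minimal sketch (the helper `V` and the probe values 0.999/0.001 are my own choices for illustration):

```python
import math

def V(D_x, D_Gz):
    """Per-sample value: log D(x) + log(1 - D(G(z))).

    D_x  -- discriminator output on a real sample
    D_Gz -- discriminator output on a generated sample
    """
    return math.log(D_x) + math.log(1 - D_Gz)

# Perfect discriminator: D(x) -> 1 and D(G(z)) -> 0, so V -> 0
print(V(0.999, 0.001))  # close to 0

# Generator fools the discriminator: D(G(z)) -> 1, so V -> -infinity
print(V(0.999, 0.999))  # large negative
```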
Expectation $\mathbb{E}$
$\mathbb{E}(x) = \int_{-\infty}^{\infty}x\,p(x)\,dx$
$\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] = \int_{x}p_{data}(x)\log(D(x))\,dx$
$\mathbb{E}_{z\sim p_{z}(z)}[\log(1-D(G(z)))] = \int_{z}p_{z}(z)\log(1-D(G(z)))\,dz = \int_{x}p_{g}(x)\log(1-D(x))\,dx$
Hint: substitute $x = G(z)$ (a change of variables) and change the subscript from $p_z$ to the generator's distribution $p_g$
$V(D,G) = \int_{x}p_{data}(x)\log(D(x))\,dx + \int_{x}p_{g}(x)\log(1-D(x))\,dx = \int_{x}\left[p_{data}(x)\log(D(x)) + p_{g}(x)\log(1-D(x))\right]dx$
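A step the notes skip (it is worked out in the 2014 paper): because $V$ is now a single integral over $x$, the optimal discriminator can be found pointwise by maximizing $f(D) = p_{data}(x)\log D + p_{g}(x)\log(1-D)$ for each $x$:

$$
\frac{\partial f}{\partial D} = \frac{p_{data}(x)}{D} - \frac{p_{g}(x)}{1-D} = 0 \quad\Rightarrow\quad D^{*}(x) = \frac{p_{data}(x)}{p_{data}(x)+p_{g}(x)}
$$

When $p_g = p_{data}$, $D^{*}(x) = \frac{1}{2}$: the discriminator can do no better than guessing.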
Objective
We want training to make the distribution of the fake (generated) data match the real data distribution.
For a dataset, after converting the data into vectors, the images of people follow one distribution and the images of cars follow another; the two distributions are certainly not the same. For two-dimensional data drawn from two Gaussian distributions, the difference between the distributions amounts to a difference in mean and variance. Real data is of course far more complex than this example, so we can only understand its distribution abstractly.
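The two-Gaussian picture can be made concrete. A small sketch (the means, scales, and the "people"/"cars" naming are illustrative assumptions, not real image statistics):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy "classes" as 2-D Gaussians with different mean and spread,
# standing in for e.g. the "people" and "cars" image distributions.
people = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(10_000, 2))
cars = rng.normal(loc=[5.0, 5.0], scale=2.0, size=(10_000, 2))

print(people.mean(axis=0), people.std(axis=0))  # near [0, 0], std near 1
print(cars.mean(axis=0), cars.std(axis=0))      # near [5, 5], std near 2
```

The two clouds occupy different regions of the space and have different spreads, which is exactly the sense in which "the data distribution of cars is not the same as that of people."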
Why distribution?
For traditional networks, we need paired data to train the model (sample-to-sample supervision). For example, to make a model do colorization, we need a gray picture of a chicken and the corresponding color picture.
But if we let the model learn the distribution of color pictures instead, we can use other color pictures, such as cars, horses, and humans, to teach the model to colorize a chicken picture.
But how do we prevent the model from turning the chicken picture into a horse? We can stack another loss function that discriminates the picture; this way we colorize the picture while leaving the object's spatial appearance unchanged.
Solve in code
Gradient
Discriminator (updated by ascending this gradient)
$\nabla_{\theta_{d}}\frac{1}{m}\sum\limits_{i=1}^{m}\left[\log D(x^{(i)}) + \log(1-D(G(z^{(i)})))\right]$
Generator (updated by descending this gradient)
$\nabla_{\theta_{g}}\frac{1}{m}\sum\limits_{i=1}^{m}\log(1-D(G(z^{(i)})))$
Problem
At the start of training, the generator's gradient is very small: D easily rejects the poor early fakes, so $\log(1-D(G(z)))$ saturates
Solutions
Change the loss: $\log(1-D(G(z)))\Leftrightarrow-\log(D(G(z)))$ (same fixed point, much larger gradient when $D(G(z))$ is small)
Equivalently, train the generator using the "real" label on generated images (shown in the sample code)
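A quick numeric check of why the original loss saturates, assuming the discriminator confidently rejects an early fake, $D(G(z)) \approx 0.01$ (the probe value is my choice):

```python
# Early in training the discriminator easily rejects fakes: D(G(z)) ≈ 0.01
d = 0.01

# Derivative of the original generator loss log(1 - D) w.r.t. D:
grad_saturating = -1.0 / (1.0 - d)  # ≈ -1.01: tiny learning signal

# Derivative of the alternative loss -log(D) w.r.t. D:
grad_nonsaturating = -1.0 / d       # = -100: strong learning signal

print(grad_saturating, grad_nonsaturating)
```

Both losses push $D(G(z))$ toward 1, but the alternative gives the generator a gradient roughly 100 times larger exactly when it needs it most.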
Sample Training Code
```python
# Assumes generator, discriminator, optimizer_G, optimizer_D, imgs,
# real_imgs, latent_dim, and Tensor are already defined.
criterion = nn.BCELoss()

# Constant target labels; detach() cuts them off from the autograd graph
# (equivalent here to requires_grad = False)
valid = Tensor(imgs.size(0), 1).fill_(1.0).detach()
fake = Tensor(imgs.size(0), 1).fill_(0.0).detach()

# Train Generator: label generated images as "valid",
# i.e. minimize -log(D(G(z))) instead of log(1 - D(G(z)))
optimizer_G.zero_grad()
z = Tensor(np.random.normal(0, 1, (imgs.shape[0], latent_dim)))
gen_imgs = generator(z)
g_loss = criterion(discriminator(gen_imgs), valid)
g_loss.backward()
optimizer_G.step()

# Train Discriminator: real -> 1, fake -> 0
optimizer_D.zero_grad()
real_loss = criterion(discriminator(real_imgs), valid)
# detach() so no gradient flows back into the generator here
fake_loss = criterion(discriminator(gen_imgs.detach()), fake)
d_loss = (real_loss + fake_loss) / 2
d_loss.backward()
optimizer_D.step()
```
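The snippet above leaves the models and data undefined. A self-contained toy version of the same loop, runnable end to end (the 1-D Gaussian "real" data, network sizes, and hyperparameters are all my own illustrative choices):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim = 4
batch_size = 64

# Toy "real" data: 1-D samples from N(3, 0.5), standing in for real images
def real_batch(n):
    return torch.randn(n, 1) * 0.5 + 3.0

generator = nn.Sequential(
    nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

criterion = nn.BCELoss()
optimizer_G = torch.optim.Adam(generator.parameters(), lr=1e-3)
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

valid = torch.ones(batch_size, 1)
fake = torch.zeros(batch_size, 1)

for step in range(2000):
    # Train generator: make D output "valid" on fakes (-log D(G(z)) trick)
    optimizer_G.zero_grad()
    z = torch.randn(batch_size, latent_dim)
    gen = generator(z)
    g_loss = criterion(discriminator(gen), valid)
    g_loss.backward()
    optimizer_G.step()

    # Train discriminator: real -> 1, fake -> 0 (detach blocks G's gradient)
    optimizer_D.zero_grad()
    real_loss = criterion(discriminator(real_batch(batch_size)), valid)
    fake_loss = criterion(discriminator(gen.detach()), fake)
    d_loss = (real_loss + fake_loss) / 2
    d_loss.backward()
    optimizer_D.step()

with torch.no_grad():
    samples = generator(torch.randn(1000, latent_dim))
# the sample mean and std should drift toward the real data's (3.0, 0.5)
print(samples.mean().item(), samples.std().item())
```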
Problems
- Hard to train
- Prone to mode collapse: tends to generate similar images
- Generated images are blurry and unrealistic
- The training procedure is unstable