WGAN Part.2
Formula
In the previous article, we saw that using the KL divergence in a GAN is problematic, so the authors of WGAN proposed the Wasserstein distance as a replacement for the KL and JS divergences:
\[W(\mathbb{P}_r,\mathbb{P}_g)=\color{green}\inf_{\color{blue}\gamma\in\Pi(\mathbb{P}_r,\mathbb{P}_g)}\color{red}{\mathbb{E}_{(x, y)\sim\gamma}}||x-y||\]
Formula explanation
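The inf in the above equation denotes the infimum, i.e. the greatest lower bound, taken over every joint distribution \(\gamma\) in \(\Pi(\mathbb{P}_r,\mathbb{P}_g)\), the set of joint distributions whose marginals are \(\mathbb{P}_r\) and \(\mathbb{P}_g\)
x and y can be thought of as samples from the two distributions (x from \(\mathbb{P}_r\), y from \(\mathbb{P}_g\))
The expectation can be rewritten in the following form (an integral in the continuous case, a sum in the discrete case)
\[\mathbb{E}_{(x, y)\sim\gamma}||x-y||=\int_y\int_x\gamma(x, y)||x-y||dxdy=\sum_{x, y}||x-y||\gamma (x, y)\]
This part is simply a weighted sum: each weight \(\gamma(x, y)\) is multiplied by the matching entry \(||x-y||\) of a distance matrix, and the products are added up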
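To make the weighted sum concrete, here is a minimal sketch in plain Python. The three bin positions and the plan `gamma` are made-up toy values chosen for illustration, not numbers taken from the article.

```python
# Hypothetical toy example: three bins located at positions 0, 1 and 2.
positions = [0.0, 1.0, 2.0]

# gamma[i][j] is the mass moved from bin i to bin j.
# Row sums give the source distribution, column sums the target one.
gamma = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]

def expected_distance(gamma, positions):
    """Return the sum over all (x, y) of gamma(x, y) * |x - y|."""
    total = 0.0
    for i, x in enumerate(positions):
        for j, y in enumerate(positions):
            total += gamma[i][j] * abs(x - y)
    return total

cost = expected_distance(gamma, positions)
print(cost)
```

Only the two non-zero entries of `gamma` contribute, each moving 0.5 units of mass over a distance of 1.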
To make it easier to understand, I have drawn a diagram.
![Matrix](03 WGAN part2.assets/Matrix.png)
Suppose the noise and the real images have the distributions shown above. Our generator wants to transform the noise distribution into the real-image distribution, and there are many ways to do that. To describe them, we label each small interval.
Say we want interval 3 to end up holding 8. We can get there by moving data between intervals: take 5 from interval 4 and 1 from interval 1, and use interval 5 to make up the rest of the 8 in interval 3.
Of course, there are many possible ways of moving the data, some good and some bad, and the one above is among the clumsier ones, so we need an indicator that tells us how good a given plan is. For this we introduce a distance matrix, which records how far each part of an interval is moved from its original position.
![Matrix](03 WGAN part2.assets/Matrix.png)
p.s. The distance-weighted sum over x and y is exactly an expectation
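The idea of scoring a plan with a distance matrix can be sketched in a few lines of Python. The four bin positions and both plans are hypothetical values picked so that the two plans share the same source and target distributions but have different costs.

```python
# Hypothetical toy setup: four bins at positions 0, 1, 2, 3.
positions = [0.0, 1.0, 2.0, 3.0]

def distance_matrix(positions):
    """dist[i][j] is how far mass travels when moved from bin i to bin j."""
    return [[abs(x - y) for y in positions] for x in positions]

def plan_cost(plan, dist):
    """Weighted sum: mass moved times distance travelled, over all pairs."""
    n = len(plan)
    return sum(plan[i][j] * dist[i][j] for i in range(n) for j in range(n))

def marginals(plan):
    """Row sums (source distribution) and column sums (target distribution)."""
    rows = [sum(row) for row in plan]
    cols = [sum(col) for col in zip(*plan)]
    return rows, cols

# Plan A: move 0.5 from bin 0 to bin 1, and 0.5 from bin 2 to bin 3.
plan_a = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
# Plan B: move 0.5 from bin 0 to bin 3, and 0.5 from bin 2 to bin 1.
plan_b = [
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

dist = distance_matrix(positions)
cost_a = plan_cost(plan_a, dist)
cost_b = plan_cost(plan_b, dist)
print(cost_a, cost_b)
```

Both plans turn the same source into the same target, but plan B drags mass further than necessary, so its cost is higher. The Wasserstein distance is the cost of the cheapest such plan.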
After reading the above, you will basically understand what the WGAN formula is doing
Incidentally, "Wasser" and "Stein" are the German words for "water" and "stone" (the distance itself is named after the mathematician Leonid Vaserstein). That gives a handy picture: the change in distribution is like a river scouring the sediment on its bed, and solving for the optimal plan is like the river carrying mud away from where it is plentiful and filling in the holes it meets further downstream.
Making the formula computable
As mentioned just now, the formula asks for an infimum. Taken literally, that is intractable: the infimum ranges over every possible way of moving the data, and there are far too many cases to enumerate. So the formula cannot be used for computation directly; it first has to be transformed into a form that can be calculated.
After all, I'm not a math major, so I don't understand a lot of it, but it's necessary to understand the idea.
\[\begin{align}W(\mathbb{P}_r,\mathbb{P}_g)&=\inf_{\gamma\in\Pi(\mathbb{P}_r, \mathbb{P}_g)} \mathbb{E}_{(x,y)\sim \gamma}[||x-y||]\\&= \sup_{||f||_L\le1}\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)] - \mathbb{E}_{x \sim \mathbb{P}_g} [f(x)]\\&= \color{blue}\max_{w \in W}\mathbb{E}_{x\sim\mathbb{P}_r}[f_w(x)] - \mathbb{E}_{z \sim \mathbb{P}_z} [f_w(g_{\theta}(z))] \end{align}\]
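In formula (2), the infimum is converted into a supremum and \(\gamma\) is eliminated.
From formula (1) to formula (2), we use the Kantorovich-Rubinstein duality
From formula (2) to formula (3), we use the Lipschitz constraint: the supremum over all 1-Lipschitz functions f is replaced by a maximum over a parameterized family \(f_w\) with \(w \in W\), and samples from \(\mathbb{P}_g\) are written as \(g_{\theta}(z)\) with \(z \sim \mathbb{P}_z\)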
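A small numerical check may help build intuition for the dual form in formula (2). The sketch below uses a hypothetical discrete example on four points and three hand-picked candidate functions (in WGAN the function would be a neural network \(f_w\), which is not modelled here). For each 1-Lipschitz f, it evaluates the expectation over \(\mathbb{P}_r\) minus the expectation over \(\mathbb{P}_g\) and compares the result with the cost of one transport plan.

```python
# Hypothetical toy distributions on four points of the real line.
positions = [0.0, 1.0, 2.0, 3.0]
p_r = [0.5, 0.0, 0.5, 0.0]   # "real" distribution
p_g = [0.0, 0.5, 0.0, 0.5]   # "generated" distribution

def dual_objective(f, positions, p_r, p_g):
    """E_{x~P_r}[f(x)] - E_{x~P_g}[f(x)] for discrete distributions."""
    e_r = sum(p * f(x) for x, p in zip(positions, p_r))
    e_g = sum(p * f(x) for x, p in zip(positions, p_g))
    return e_r - e_g

def is_1_lipschitz_on(f, positions):
    """Check |f(x) - f(y)| <= |x - y| for every pair of support points."""
    return all(abs(f(x) - f(y)) <= abs(x - y) + 1e-12
               for x in positions for y in positions)

# Hand-picked candidates; all of them are 1-Lipschitz.
candidates = {
    "f(x) = -x": lambda x: -x,
    "f(x) = -0.5 x": lambda x: -0.5 * x,
    "f(x) = -|x - 1.5|": lambda x: -abs(x - 1.5),
}

scores = {name: dual_objective(f, positions, p_r, p_g)
          for name, f in candidates.items()}
best = max(scores.values())

# Cost of one plan: move 0.5 from 0 to 1 and 0.5 from 2 to 3.
primal_cost = 0.5 * abs(0.0 - 1.0) + 0.5 * abs(2.0 - 3.0)

for name, value in scores.items():
    print(name, value)
print("plan cost:", primal_cost)
```

No candidate scores higher than the plan cost, and `f(x) = -x` matches it exactly, which is what the duality predicts for this toy case: the best 1-Lipschitz function attains the cost of the cheapest plan.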
Formula (1) to Formula (2)
\[\begin{align}W(\mathbb{P}_r, \mathbb{P}_g)&=\inf_{\gamma\in\Pi(\mathbb{P}_r, \mathbb{P}_g)} \mathbb{E}_{(x,y)\sim \gamma}[||x-y||]\\&=\inf_{\gamma\in\Pi(\mathbb{P}_r, \mathbb{P}_g)} \int_y \int_x \gamma(x,y) ||x-y|| dx dy\\&=\inf_{\gamma\in\Pi(\mathbb{P}_r, \mathbb{P}_g)} \{\langle\Pi,D\rangle \mid A\Pi = b,\ \Pi\ge0\}\end{align}\]
\(\langle\Pi, D\rangle\) is the weighted sum mentioned previously: the transpose of one column vector multiplied by another column vector, i.e. \(\Pi^T D\). Here \(\Pi\) stacks all the values \(\gamma(x_i, y_j)\) into a single column and D stacks the matching distances \(d(x_i, y_j)\)
\[\Pi = \left(\begin{matrix}\gamma(x_1, y_1) \\ \gamma(x_1, y_2)\\ \vdots\\ \gamma(x_2, y_1)\\ \gamma(x_2, y_2)\\ \vdots\\ \gamma(x_n, y_1)\\ \gamma(x_n, y_2)\end{matrix}\right)D = \left(\begin{matrix}d(x_1, y_1)\\ d(x_1, y_2)\\\vdots\\d(x_2, y_1)\\ d(x_2, y_2)\\\vdots\\d(x_n, y_1)\\d(x_n, y_2)\end{matrix}\right)\]
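As a quick check that stacking \(\gamma\) and d into column vectors changes nothing, the sketch below flattens a hypothetical 2x2 plan and distance matrix row by row and compares the inner product with the original double sum. All numbers are made up for illustration.

```python
# Hypothetical 2x2 example: gamma[i][j] = gamma(x_i, y_j), dist[i][j] = d(x_i, y_j).
gamma = [
    [0.25, 0.25],
    [0.0, 0.5],
]
dist = [
    [0.0, 2.0],
    [1.0, 1.0],
]

def flatten(matrix):
    """Stack the rows of a matrix into one long vector (row-major order)."""
    return [value for row in matrix for value in row]

pi_vec = flatten(gamma)   # (gamma(x1,y1), gamma(x1,y2), gamma(x2,y1), gamma(x2,y2))
d_vec = flatten(dist)     # distances in the same order

# Inner product of the two stacked vectors.
inner = sum(p * d for p, d in zip(pi_vec, d_vec))

# The same quantity written as the original double sum.
double_sum = sum(gamma[i][j] * dist[i][j]
                 for i in range(2) for j in range(2))

print(inner, double_sum)
```

The only requirement is that both vectors use the same ordering of the pairs \((x_i, y_j)\).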
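Why do we say \(A\Pi = b\)? Because \(\gamma\) must have \(\mathbb{P}_r\) and \(\mathbb{P}_g\) as its marginals: summing \(\gamma(x_i, y_j)\) over j must give \(p_r(x_i)\), and summing over i must give \(p_g(y_j)\). Written as a matrix equation: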
\[A=\left(\begin{array}{ccc|ccc|c|ccc|c} 1 & 1 & \dots & 0 & 0 & \dots & \dots & 0 & 0 & \dots & \dots \\0 & 0 & \dots & 1 & 1 & \dots & \dots & 0 & 0 & \dots & \dots\\\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots \\0 & 0 & \dots & 0 & 0 & \dots & \dots & 1 & 1 & \dots & \dots\\\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots \\\hline 1 & 0 & \dots & 1 & 0 & \dots & \dots & 1 & 0 & \dots & \dots\\0 & 1 & \dots & 0 & 1 & \dots & \dots & 0 & 1 & \dots & \dots\\\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots \\0 & 0 & \dots & 0 & 0 & \dots & \dots & 0 & 0 & \dots & \dots\\\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots & \ddots & \ddots \end{array}\right) \\\Pi =\left(\begin{matrix}\gamma(x_1, y_1)\\ \gamma(x_1, y_2)\\ \vdots\\\hline\gamma(x_2, y_1)\\ \gamma(x_2, y_2)\\\vdots\\\hline\vdots\\\hline\gamma(x_n, y_1)\\ \gamma(x_n, y_2)\\ \vdots\\\hline\vdots \end{matrix}\right) b =\left(\begin{matrix}p_r(x_1)\\ p_r(x_2)\\ \vdots \\ p_r(x_n) \\ \vdots \\\hline p_g(y_1)\\ p_g(y_2)\\ \vdots \\ p_g(y_n) \\ \vdots \\\end{matrix}\right)\]
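The block structure of A is easier to see when it is built programmatically. The sketch below builds A and b for a hypothetical 2x2 problem and checks that a valid plan, flattened in the same order as \(\Pi\), satisfies \(A\Pi = b\). The helper names and the numbers are illustrative assumptions, not part of the original derivation.

```python
def build_constraints(p_r, p_g):
    """Build A and b so that A @ Pi = b encodes both marginal constraints.

    Pi is gamma flattened row by row: entry i * m + j holds gamma(x_i, y_j).
    """
    n, m = len(p_r), len(p_g)
    A = []
    # Upper block: row i sums gamma(x_i, y_1..y_m), which must equal p_r(x_i).
    for i in range(n):
        row = [0] * (n * m)
        for j in range(m):
            row[i * m + j] = 1
        A.append(row)
    # Lower block: row j sums gamma(x_1..x_n, y_j), which must equal p_g(y_j).
    for j in range(m):
        row = [0] * (n * m)
        for i in range(n):
            row[i * m + j] = 1
        A.append(row)
    b = list(p_r) + list(p_g)
    return A, b

def mat_vec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical marginals and a plan that is consistent with them.
p_r = [0.5, 0.5]
p_g = [0.25, 0.75]
gamma = [
    [0.25, 0.25],
    [0.0, 0.5],
]
pi_vec = [value for row in gamma for value in row]

A, b = build_constraints(p_r, p_g)
A_pi = mat_vec(A, pi_vec)

for row in A:
    print(row)
print(A_pi, b)
```

With A, b and the distance vector D in hand, the infimum becomes a standard linear program that can be passed to an LP solver such as `scipy.optimize.linprog`, with D as the cost vector, A and b as equality constraints, and non-negative bounds on \(\Pi\).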