There is a relatively new GAN called RealnessGAN, but there seems to be little information about it in Japanese, so I implemented it and trained it on CIFAR-10.
The paper is Real or Not Real, that is the Question. I also referred to the implementation by the paper's authors. The implementation here is also helpful.
With RealnessGAN, it seems you can get nice training results even with a plain DCGAN.
https://github.com/kam1107/RealnessGAN/blob/master/images/CelebA_snapshot.png
In a normal GAN, the Discriminator outputs a scalar value that represents "Realness". This paper proposes a Discriminator that instead outputs a probability distribution over Realness.
The idea is that by increasing the amount of information the Discriminator outputs, the Generator learns better. According to the paper, this succeeded in learning 1024 x 1024 face images (the FFHQ dataset) even with an ordinary DCGAN architecture. FFHQ dataset training results
- $D$: Discriminator
- $G$: Generator
- $\boldsymbol{z}$: noise fed into the Generator (the latent representation)
- $\mathcal{A}_0$: Anchor for fake images (the Realness distribution given to $D$ as the target)
- $\mathcal{A}_1$: Anchor for real images (likewise, the target Realness distribution for $D$)
A normal GAN Discriminator outputs "Realness" as a continuous scalar value. The RealnessGAN Discriminator, on the other hand, outputs a discrete probability distribution of Realness. For example:

```math
D(\mbox{image}) =
\begin{bmatrix}
\mbox{probability that the image's Realness is } 1.0 \\
\mbox{probability that the image's Realness is } 0.9 \\
\vdots \\
\mbox{probability that the image's Realness is } -0.9 \\
\mbox{probability that the image's Realness is } -1.0 \\
\end{bmatrix}
```

Each discretized Realness value is called an Outcome in the paper. The probability distribution is apparently obtained by taking a softmax over the channel dimension of the Discriminator's raw output.
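As a minimal illustration of that conversion (my own sketch, not the paper's code; the batch size and number of Outcomes are arbitrary):

```python
import torch

batch_size, n_outcomes = 4, 51  # the number of Outcomes is a free choice
logits = torch.randn(batch_size, n_outcomes)  # stand-in for the raw Discriminator output, one row per image

# Softmax along the Outcome (channel) dimension turns each row into
# a probability distribution over the discretized Realness values
realness_distribution = torch.nn.functional.softmax(logits, dim=1)
print(realness_distribution.sum(dim=1))  # each row sums to 1
```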
Also, the target data for the Realness probability distribution is called an Anchor in the paper. For example:

```math
\mathcal{A}_0 =
\begin{bmatrix}
\mbox{probability that a fake image's Realness is } 1.0 \\
\mbox{probability that a fake image's Realness is } 0.9 \\
\vdots \\
\mbox{probability that a fake image's Realness is } -0.9 \\
\mbox{probability that a fake image's Realness is } -1.0 \\
\end{bmatrix}
```

```math
\mathcal{A}_1 =
\begin{bmatrix}
\mbox{probability that a real image's Realness is } 1.0 \\
\mbox{probability that a real image's Realness is } 0.9 \\
\vdots \\
\mbox{probability that a real image's Realness is } -0.9 \\
\mbox{probability that a real image's Realness is } -1.0 \\
\end{bmatrix}
```
It seems that the range of Realness, the distribution of Anchor, etc. can be freely customized.
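For example, here is a small sketch of two alternative Anchor choices (the particular shapes and the $[-4, 4]$ Realness range are my own assumptions for illustration; the training code below uses Gaussian histograms over $[-2, 2]$):

```python
import numpy

n_outcomes = 20

# Choice 1: put all the probability mass on the extreme Realness values (one-hot Anchors)
fake_anchor = numpy.eye(n_outcomes)[0]
real_anchor = numpy.eye(n_outcomes)[-1]

# Choice 2: histograms of Gaussian samples over a wider Realness range, here [-4, 4]
count, _ = numpy.histogram(numpy.random.normal(-2, 1, 1000), n_outcomes, (-4, 4))
fake_anchor = count / count.sum()
count, _ = numpy.histogram(numpy.random.normal(+2, 1, 1000), n_outcomes, (-4, 4))
real_anchor = count / count.sum()
```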
According to the paper, the objective function is
```math
\max_{G} \min_{D} V(G, D) = \mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{1} || D(\boldsymbol{x}) )] + \mathbb{E}_{\boldsymbol{x} \sim p_{g}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(\boldsymbol{x}) )]. \tag{3}
```
Extracting the objective function of the Generator $G$ from this gives
```math
(G_{\mathrm{objective1}}) \quad
\min_{G}
- \mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(G(\boldsymbol{z})) )].
\tag{18}
```
However, it seems that learning does not go well with this alone. The paper therefore proposes two more objective functions for $G$:
```math
(G_{\mathrm{objective2}}) \quad
\min_{G} \quad
\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}, \boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( D(\boldsymbol{x}) || D(G(\boldsymbol{z})) )]
- \mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(G(\boldsymbol{z})) )].
\tag{19}
```
```math
(G_{\mathrm{objective3}}) \quad
\min_{G} \quad
\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{1} || D(G(\boldsymbol{z})) )]
- \mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(G(\boldsymbol{z})) )].
\tag{20}
```
As a result of experimentation, it seems that $G_{\mathrm{objective2}}$ in equation (19) was the best of the three objective functions for $G$.
In summary:

```math
\begin{align}
\min_{D} & \quad
\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{1} || D(\boldsymbol{x}))] +
\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(G(\boldsymbol{z}) ))] \\
\min_{G} & \quad
\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}, \boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( D(\boldsymbol{x}) || D(G(\boldsymbol{z})))] -
\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\mathcal{D}_{\mathrm{KL}}( \mathcal{A}_{0} || D(G(\boldsymbol{z})))]
\end{align}
```
The $\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}}[\cdots]$, $\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}}[\cdots]$, and $\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}, \boldsymbol{z} \sim p_{\boldsymbol{z}}}[\cdots]$ parts should presumably be computed as averages over a mini-batch.
According to the paper, if the Anchors are set to $\mathcal{A}_0 = [1, 0]$ and $\mathcal{A}_1 = [0, 1]$, the objective function takes the same form as that of a normal GAN, so RealnessGAN can be regarded as a generalization of the ordinary GAN.
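As a quick check of this (my own derivation, using the convention $0 \log 0 = 0$): write the two-Outcome output as $D(\boldsymbol{x}) = [D_{\mathrm{fake}}(\boldsymbol{x}), D_{\mathrm{real}}(\boldsymbol{x})]$. Then

```math
\mathcal{D}_{\mathrm{KL}}(\mathcal{A}_1 || D(\boldsymbol{x}))
= 0 \cdot \log \frac{0}{D_{\mathrm{fake}}(\boldsymbol{x})}
+ 1 \cdot \log \frac{1}{D_{\mathrm{real}}(\boldsymbol{x})}
= -\log D_{\mathrm{real}}(\boldsymbol{x}),
```

and likewise $\mathcal{D}_{\mathrm{KL}}(\mathcal{A}_0 || D(\boldsymbol{x})) = -\log D_{\mathrm{fake}}(\boldsymbol{x})$, so the Discriminator loss in equation (3) becomes the familiar cross-entropy form $-\mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}}[\log D_{\mathrm{real}}(\boldsymbol{x})] - \mathbb{E}_{\boldsymbol{x} \sim p_{g}}[\log D_{\mathrm{fake}}(\boldsymbol{x})]$.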
The paper also contains some discussion and training tips, which I summarize below.
The more Outcomes (that is, the larger the output dimension of the Discriminator), the better. If you increase the number of Outcomes, should you also increase the number of times the Generator $G$ is updated?
The larger the KL divergence between the fake-image Anchor $\mathcal{A}_0$ and the real-image Anchor $\mathcal{A}_1$, the better.
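For instance, a tiny sketch comparing the Anchor-to-Anchor KL divergence for two made-up Anchor pairs (the numbers are purely illustrative):

```python
import numpy

def kl(p, q, epsilon=1e-16):
    return float(numpy.sum(p * numpy.log((p + epsilon) / (q + epsilon))))

# Two hypothetical pairs of Anchors over 3 Outcomes
well_separated = (numpy.array([0.80, 0.15, 0.05]), numpy.array([0.05, 0.15, 0.80]))
barely_separated = (numpy.array([0.40, 0.30, 0.30]), numpy.array([0.30, 0.30, 0.40]))

print(kl(*well_separated))    # larger KL divergence -> expected to train better
print(kl(*barely_separated))  # smaller KL divergence
```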
It also seems that performance improves if the output dimension of the Discriminator is doubled and the two halves are used as the mean and standard deviation of a normal distribution to sample from (Feature Resampling). In the GitHub source, the second half is apparently not used directly as the standard deviation; it is divided by $2$ and then exponentiated (that is, the raw output is treated as the log of the variance). This is said to stabilize learning, especially in the second half of training. I did not use it in the code below.
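Here is a minimal sketch of how that Feature Resampling might look (my own interpretation of the description above, not the author's code; splitting the doubled output into mean and log-variance halves is an assumption):

```python
import torch

n_outcomes = 20
batch_size = 4

# Suppose the Discriminator's final layer now outputs 2 * n_outcomes values per image
raw = torch.randn(batch_size, 2 * n_outcomes)

mean = raw[:, :n_outcomes]
log_variance = raw[:, n_outcomes:]
std = torch.exp(log_variance / 2)  # divide by 2 and exponentiate, i.e. treat the output as log-variance

# Reparameterized sampling so that gradients can flow through mean and std
logits = mean + std * torch.randn_like(std)
realness_distribution = torch.nn.functional.softmax(logits, dim=1)
```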
Now let's train on CIFAR-10.
realness_gan.py
```python
import numpy
import torch
import torchvision


# Function to compute the KL divergence between probability distributions
# epsilon is added inside the log so that NaN does not appear
def kl_divergence(p, q, epsilon=1e-16):
    return torch.mean(torch.sum(p * torch.log((p + epsilon) / (q + epsilon)), dim=1))


# Allows a reshape to be used inside torch.nn.Sequential
class Reshape(torch.nn.Module):
    def __init__(self, *shape):
        super(Reshape, self).__init__()
        self.shape = shape

    def forward(self, x):
        return x.reshape(*self.shape)
class GAN:
    def __init__(self):
        self.noise_dimension = 100
        self.n_outcomes = 20

        self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

        self.discriminator = torch.nn.Sequential(
            torch.nn.Conv2d( 3, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.AvgPool2d(2),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.AvgPool2d(2),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.AvgPool2d(2),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            Reshape(-1, 32 * 4 * 4),
            torch.nn.Linear(32 * 4 * 4, self.n_outcomes),
        ).to(self.device)

        self.generator = torch.nn.Sequential(
            torch.nn.Linear(self.noise_dimension, 32 * 4 * 4),
            Reshape(-1, 32, 4, 4),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Upsample(scale_factor=2),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Upsample(scale_factor=2),
            torch.nn.Conv2d(32, 32, 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Upsample(scale_factor=2),
            torch.nn.Conv2d(32, 3, 3, padding=1),
            torch.nn.Sigmoid(),
        ).to(self.device)

        self.discriminator_optimizer = torch.optim.Adam(self.discriminator.parameters(),
                                                        lr=0.0001,
                                                        betas=[0.0, 0.9])
        self.generator_optimizer = torch.optim.Adam(self.generator.parameters(),
                                                    lr=0.0001,
                                                    betas=[0.0, 0.9])

        # Compute the Anchors here
        # Build histograms of Gaussian samples, following the author's implementation on GitHub
        normal = numpy.random.normal(1, 1, 1000)  # normal distribution with mean +1 and standard deviation 1
        count, _ = numpy.histogram(normal, self.n_outcomes, (-2, 2))  # histogram over the range -2 to +2
        self.real_anchor = count / sum(count)  # normalize so that the sum is 1

        normal = numpy.random.normal(-1, 1, 1000)  # normal distribution with mean -1 and standard deviation 1
        count, _ = numpy.histogram(normal, self.n_outcomes, (-2, 2))
        self.fake_anchor = count / sum(count)

        self.real_anchor = torch.Tensor(self.real_anchor).to(self.device)
        self.fake_anchor = torch.Tensor(self.fake_anchor).to(self.device)
    def generate_fakes(self, num):
        mean = torch.zeros(num, self.noise_dimension, device=self.device)
        std = torch.ones(num, self.noise_dimension, device=self.device)
        noise = torch.normal(mean, std)

        return self.generator(noise)

    def train_discriminator(self, real):
        batch_size = real.shape[0]
        fake = self.generate_fakes(batch_size).detach()

        # Softmax the raw Discriminator outputs to turn them into probability distributions
        real_feature = torch.nn.functional.softmax(self.discriminator(real), dim=1)
        fake_feature = torch.nn.functional.softmax(self.discriminator(fake), dim=1)

        loss = kl_divergence(self.real_anchor, real_feature) + kl_divergence(self.fake_anchor, fake_feature)  # Eq. (3) in the paper

        self.discriminator_optimizer.zero_grad()
        loss.backward()
        self.discriminator_optimizer.step()

        return float(loss)

    def train_generator(self, real):
        batch_size = real.shape[0]
        fake = self.generate_fakes(batch_size)

        real_feature = torch.nn.functional.softmax(self.discriminator(real), dim=1)
        fake_feature = torch.nn.functional.softmax(self.discriminator(fake), dim=1)

        # loss = -kl_divergence(self.fake_anchor, fake_feature)  # Eq. (18) in the paper
        loss = kl_divergence(real_feature, fake_feature) - kl_divergence(self.fake_anchor, fake_feature)  # Eq. (19) in the paper
        # loss = kl_divergence(self.real_anchor, fake_feature) - kl_divergence(self.fake_anchor, fake_feature)  # Eq. (20) in the paper

        self.generator_optimizer.zero_grad()
        loss.backward()
        self.generator_optimizer.step()

        return float(loss)

    def step(self, real):
        real = real.to(self.device)

        discriminator_loss = self.train_discriminator(real)
        generator_loss = self.train_generator(real)

        return discriminator_loss, generator_loss
if __name__ == '__main__':
    transformer = torchvision.transforms.Compose([
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
    ])
    dataset = torchvision.datasets.CIFAR10(root='C:/datasets',
                                           transform=transformer,
                                           download=True)
    iterator = torch.utils.data.DataLoader(dataset,
                                           batch_size=128,
                                           drop_last=True)

    gan = GAN()

    n_steps = 0
    for epoch in range(1000):
        for iteration, data in enumerate(iterator):
            real = data[0].float()
            discriminator_loss, generator_loss = gan.step(real)

            print('epoch : {}, iteration : {}, discriminator_loss : {}, generator_loss : {}'.format(
                epoch, iteration, discriminator_loss, generator_loss
            ))

            n_steps += 1

            if iteration == 0:
                fakes = gan.generate_fakes(64)
                torchvision.utils.save_image(fakes, 'out/{}.png'.format(n_steps))
```
Epoch 0 (step 1)
Epoch 10 (step 3901)
Epoch 100 (step 39001)
Epoch 500 (step 195001)
This implementation uses neither Batch Normalization, Spectral Normalization, nor [Feature Resampling](#Feature Resampling), but it still seems to generate reasonably good images.