[Introduction to StyleGAN] I played with style_mixing: "A woman who takes off her glasses" ♬

On this second day with StyleGAN, I will explain two of the three image-generation methods described in Reference ① below, and try various style-mixing image generations. A good explanation of StyleGAN can be found in Reference ②; if you consult it, the explanations in this article should be easy to follow.

【Reference】
① NVlabs/stylegan
② StyleGAN commentary, CVPR2019 reading session @DeNA

What I did

・ First of all, what are the two methods?
・ Try writing the code
・ **Latent Mixing**; try mixing in the latent space $z$
・ **Style Mixing**; try mixing in the mapped latent space $w$
・ **StyleMixing_2**; generate an image by swapping style attributes in the mapped latent space $w$
・ **StyleMixing_3**; generate an image by mixing individual style attributes of the mapped latent space $w$

・ First of all, what are the two methods?

A simple literal translation is as follows.

There are three ways to use a pre-trained generator:

1. Use Gs.run() for immediate-mode operations where the inputs and outputs are numpy arrays. **I used this technique last time.**

# Pick latent vector.
rnd = np.random.RandomState(5)
latents = rnd.randn(1, Gs.input_shape[1])
# Generate image.
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
images = Gs.run(latents, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)

The first argument is a batch of latent vectors of shape [num, 512]. The second argument is reserved for class labels (not used by StyleGAN). The remaining keyword arguments are optional and can be used to modify the operation further (see below). The output is a batch of images whose format is determined by the output_transform argument. See Reference ① for the options used below (truncation_psi=0.7, randomize_noise=True).
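Since output_transform converts the output to NHWC uint8, the batch can be saved directly with PIL. A minimal sketch (the file name example.png is just an illustration, not from the original script):

# images has shape (1, 1024, 1024, 3) and dtype uint8 after output_transform.
PIL.Image.fromarray(images[0], 'RGB').save('example.png')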

2. Use Gs.get_output_for() to incorporate the generator as part of a larger TensorFlow expression. I will skip this because I will not use it this time.

3. Look up Gs.components.mapping and Gs.components.synthesis to access the individual sub-networks of the generator. Like Gs, the sub-networks are represented as independent instances of dnnlib.tflib.Network. **This time we will use this technique for the mixed image generation.**

src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)

・ Try writing the code

Using the techniques above, the simplest working code can be written as follows:

import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import config
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
import tensorflow as tf

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)
    # Method 1: use Gs.run() for immediate-mode operations where the inputs and outputs are numpy arrays.
    # Pick latent vector.
    rnd = np.random.RandomState(5)
    latents1 = rnd.randn(1, Gs.input_shape[1])
    # Generate image.
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents1, None, truncation_psi=0.7, randomize_noise=True, output_transform=fmt)
    plt.imshow(images[0])
    plt.pause(1)
    plt.savefig("./results/simple1_.png")
    plt.close()
    # Method 3: look up Gs.components.mapping and Gs.components.synthesis to access the generator's sub-networks.
    # Like Gs, each sub-network is represented as an independent instance of dnnlib.tflib.Network.
    src_seeds = [5]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    plt.imshow(src_images[0])
    plt.pause(1)
    plt.savefig("./results/simple3_.png")
    plt.close()
    
if __name__ == "__main__":
    main()

With this code, both methods would seem to generate the same image, but when I actually ran it, the results differed slightly, as shown below.

| | Method 1 | Method 3 |
|---|---|---|
| Latent tensor | z = latents1 | z = src_latents, w = src_dlatents |
| Size | (1, 512) | (1, 512) and (1, 18, 512) |
| Output image | simple1_.png | simple3_.png |

These latent tensors correspond to $z$ and $w$ in the figure below, respectively. That is, the latent $z$ is a vector with 512 parameters, while its counterpart $w$ in the mapped latent space $W$ has shape (18, 512). In other words, the synthesis network has 18 style inputs $A$ (see Reference ③), and these are fed from the tensor $w$ that underlies each style.

【Reference】
③ Try Style-mixing and play with StyleGAN's trained model

(Figure: styleGAN_fig1.jpg, the StyleGAN generator architecture)

In other words, the explanation of methods 1 and 3 above can be rephrased as follows.

・ Method 1: the image is generated directly from the latent vector $z$.
・ Method 3: the tensor $w$ of the mapped latent space is first obtained from the latent vector $z$, the corresponding style inputs $A$ of the synthesis network are derived from it, and the image is generated by running the mapping and synthesis sub-networks independently.
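A minimal sketch to confirm these shapes (assuming Gs has been loaded as in the script above):

# z is a batch of one 512-dimensional latent; mapping expands it to 18 per-layer styles.
z = np.random.RandomState(5).randn(1, Gs.input_shape[1])
w = Gs.components.mapping.run(z, None)
print(z.shape, w.shape)  # (1, 512) (1, 18, 512)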

・ Latent Mixing; Try Mixing in the latent space z

This is the same as what we did last time, but reflecting the above, we obtain the latent vector $z$ using both methods and mix them. The main code is below.

simple_method1.py


def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    print(latents1.shape)
    
    # Generate image.
    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    images = Gs.run(latents1, None, truncation_psi=1, randomize_noise=False, output_transform=fmt)
    # Pick latent vector2
    src_seeds=[6]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    # Generate image2
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    
    for i in range(1,101,4):
        # mix latent vectors 1 and 2
        latents = i/100*latents1+(1-i/100)*src_latents[0].reshape(1,512)
        # Generate image for mixing vector by method1.
        fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
        images = Gs.run(latents, None, truncation_psi=1, randomize_noise=False, output_transform=fmt)
        # Save image.
        os.makedirs(config.result_dir, exist_ok=True)
        png_filename = os.path.join(config.result_dir, 'example{}.png'.format(i))
        PIL.Image.fromarray(images[0], 'RGB').save(png_filename) 

The result is as follows.

(Figure: latent-$z$ mixing; endpoints simple1_.png and simple3_.png; animation simple_method10_1_256.gif)

The outputs of the two methods differed in "Try writing the code" above, but this was due to the truncation_psi and randomize_noise parameters. To ensure reproducibility, they are set here to 1 and False, respectively. (Impression) Watching this animation is a little unsettling; it feels like seeing the faces of these two people's children...
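To check this reproducibility claim in code, here is a hedged sketch (assuming Gs, latents1, and synthesis_kwargs are defined as in the scripts above): with truncation_psi=1 the truncation trick is the identity, so methods 1 and 3 should produce essentially the same image for the same $z$.

# Compare method 1 and method 3 under truncation_psi=1.0 and fixed noise.
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
img1 = Gs.run(latents1, None, truncation_psi=1.0, randomize_noise=False, output_transform=fmt)
w1 = Gs.components.mapping.run(latents1, None)
img3 = Gs.components.synthesis.run(w1, randomize_noise=False, **synthesis_kwargs)
print(np.abs(img1.astype(np.int32) - img3.astype(np.int32)).mean())  # should be ~0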

・ StyleMixing; Try mixing in the mapped latent space w

Now let's do the same as above, but in the mapped latent space $w$. The main parts of the code are:

simple_method2.py


synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def main():
    # Initialize TensorFlow.
    tflib.init_tf()
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    
    # Generate image.
    dlatents1 = Gs.components.mapping.run(latents1, None) # [seed, layer, component]
    images = Gs.components.synthesis.run(dlatents1, randomize_noise=False, **synthesis_kwargs)
    
    src_seeds=[6]
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]
    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    
    for i in range(1,101,4):
        dlatents = i/100*dlatents1+(1-i/100)*src_dlatents
        # Generate image.
        images = Gs.components.synthesis.run(dlatents, randomize_noise=False, **synthesis_kwargs)
        # Save image.
        os.makedirs(config.result_dir, exist_ok=True)
        png_filename = os.path.join(config.result_dir, 'example{}.png'.format(i))
        PIL.Image.fromarray(images[0], 'RGB').save(png_filename)

The result is as follows. At first glance, the results are different.

(Figure: style mixing in the mapped space; endpoints simple1_.png and simple3_.png; animation simple_method10_1.gif)

It is natural that linear interpolation of the input latent vector $z$ differs from linear interpolation of the style vector $w$, since the mapping between them is nonlinear (a multi-stage MLP). As far as I (Uwan) can see, linear interpolation of the style vector in the mapped space seems preferable, in the sense that the glasses persist longer through the transition. Next, let's see that this linear interpolation is still coarse as an interpolation.
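Both scripts use the same linear blend; only the space it acts in differs. A minimal sketch of the shared operation (the helper name lerp is mine, not from the StyleGAN code):

def lerp(a, b, t):
    # t = 1 returns a, t = 0 returns b; matches i/100*a + (1-i/100)*b in the loops above.
    return t * a + (1.0 - t) * b

# In z-space the blend is pushed through the nonlinear mapping network afterwards;
# in w-space it interpolates the 18 style vectors directly.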

・ StyleMixing_2; Generate an image by exchanging style attributes in the mapped latent space w

This technique is the most famous example of image manipulation in the paper. Here is the code right away; it is based on the code in Reference ③.

ordinary_style_mixising.py


import os
import pickle
import numpy as np
import PIL.Image
import dnnlib
import dnnlib.tflib as tflib
import config
import matplotlib.pyplot as plt

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)

def load_Gs():
    fpath = './weight_files/tensorflow/karras2019stylegan-ffhq-1024x1024.pkl'
    with open(fpath, mode='rb') as f:
        _G, _D, Gs = pickle.load(f)
    return Gs

def draw_style_mixing_figure(png, Gs, w, h, src_seeds, dst_seeds, style_ranges):
    print(png)
    src_latents = np.stack([np.random.RandomState(seed).randn(Gs.input_shape[1]) for seed in src_seeds])
    src_dlatents = Gs.components.mapping.run(src_latents, None) # [seed, layer, component]

    # Pick latent vector.
    rnd = np.random.RandomState(5) #5
    latents1 = rnd.randn(1, Gs.input_shape[1])
    print(latents1.shape)
    
    # Generate image.
    dlatents1 = Gs.components.mapping.run(latents1, None) # [seed, layer, component]
    images = Gs.components.synthesis.run(dlatents1, randomize_noise=False, **synthesis_kwargs)

    dst_dlatents = np.zeros((6,18,512))
    for j in range(6):
        dst_dlatents[j] = dlatents1

    src_images = Gs.components.synthesis.run(src_dlatents, randomize_noise=False, **synthesis_kwargs)
    dst_images = Gs.components.synthesis.run(dst_dlatents, randomize_noise=False, **synthesis_kwargs)
    print(dst_images.shape)

    canvas = PIL.Image.new('RGB', (w * (len(src_seeds) + 1), h * (len(dst_seeds) + 1)), 'white')
    for col, src_image in enumerate(list(src_images)):
        canvas.paste(PIL.Image.fromarray(src_image, 'RGB'), ((col + 1) * w, 0))
    for row, dst_image in enumerate(list(dst_images)):
        canvas.paste(PIL.Image.fromarray(dst_image, 'RGB'), (0, (row + 1) * h))
        row_dlatents = np.stack([dst_dlatents[row]] * len(src_seeds))
        row_dlatents[:, style_ranges[row]] = src_dlatents[:, style_ranges[row]]
        row_images = Gs.components.synthesis.run(row_dlatents, randomize_noise=False, **synthesis_kwargs)
        for col, image in enumerate(list(row_images)):
            canvas.paste(PIL.Image.fromarray(image, 'RGB'), ((col + 1) * w, (row + 1) * h))
    canvas.save(png)

def main():
    tflib.init_tf()
    os.makedirs(config.result_dir, exist_ok=True)
    draw_style_mixing_figure(os.path.join(config.result_dir, 'style-mixing.png'), 
                             load_Gs(), w=1024, h=1024, src_seeds=[6,701,687,615,2268], dst_seeds=[0,0,0,0,0,0],
                             style_ranges=[range(0,8)]+[range(1,8)]+[range(2,8)]+[range(1,18)]+[range(4,18)]+[range(5,18)])

if __name__ == "__main__":
    main()

The result is as follows.

(Figure: style-mixing.png; top row: the five source faces (seeds 6, 701, 687, 615, 2268); left column: the destination face of seed 5, as in simple1_.png/simple3_.png.)

| Row (destination) | style_ranges taken from the sources |
|---|---|
| simple1_.png | range(0,8) |
| Same as above | range(1,8) |
| Same as above | range(2,8) |
| Same as above | range(1,18) |
| Same as above | range(4,18) |
| Same as above | range(5,18) |

Conversely, looking at the women: in rows 2 and 4, the glasses come off simply because layer 0 (range[0]) is excluded from the mixed range, and the look shifts from feminine to rather boyish. In row 4 in particular, everything except layer 0 is mixed, yet the changes are considerable.
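To isolate the layer-0 effect described here, a minimal sketch (assuming dlatents1 and src_dlatents from the scripts above; this hard swap of a single layer is my illustration, not code from the article):

# Copy the destination w and overwrite only layer 0 with the source's layer 0.
mixed = dlatents1.copy()                  # shape (1, 18, 512)
mixed[0, 0] = src_dlatents[0, 0]          # swap only the coarsest style
img = Gs.components.synthesis.run(mixed, randomize_noise=False, **synthesis_kwargs)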

・ StyleMixing_3; Mix individual style attributes of the mapped latent space w to generate an image

So let's examine this change by mixing only the layer-0 style, using the code above. This can be achieved by replacing the relevant part of simple_method2.py with the following.

individual_mixing_style.py


    for i in range(1, 26, 1):
        dlatents = src_dlatents.copy()  # copy, so the base w is not overwritten across iterations
        # mix only the layer-0 style vector
        dlatents[0][0] = i/100*dlatents1[0][0] + (1-i/100)*src_dlatents[0][0]
(Figure: individual style mixing in the mapped space; endpoints simple1_.png and simple3_.png; animation simple_method10_1_512.gif)

Although not shown this time, this method lets you mix arbitrary parameters in the style space, enabling finer-grained mixing.
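As one sketch of what mixing arbitrary parameters could look like (my example, assuming the variables from simple_method2.py): blend only the middle layers 4 to 7 at a fixed ratio.

t = 0.5                                    # mixing ratio for the chosen layers
mixed = dlatents1.copy()
mixed[0, 4:8] = t * dlatents1[0, 4:8] + (1 - t) * src_dlatents[0, 4:8]
images = Gs.components.synthesis.run(mixed, randomize_noise=False, **synthesis_kwargs)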

Summary

・ I tried "Woman who takes off glasses" ・ Mixing can now be performed by specializing in each characteristic. ・ With this method, the same can be applied to images given by npy.

・ Next, I want to train on my own images and generate my own style images.
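Regarding the npy remark in the summary, a minimal sketch (the file name latent.npy is hypothetical): a saved $w$ tensor can be fed straight to the synthesis network.

# Hypothetical: load a previously saved w (e.g. produced by an encoder) and synthesize.
w_loaded = np.load('latent.npy').reshape(1, 18, 512)
images = Gs.components.synthesis.run(w_loaded, randomize_noise=False, **synthesis_kwargs)
PIL.Image.fromarray(images[0], 'RGB').save('from_npy.png')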
