[PYTHON] Convert a real, ferocious crocodile image into a gentle, smiling "100 Days Crocodile"-style image with Keras neural style transfer

What is neural style transfer?

Neural style transfer is a machine learning technique that redraws a target image in the style (texture) of another image to generate a new one. It is the technology behind apps that repaint photos of towns and people in a Van Gogh style.

(image: example of a Van Gogh-style conversion)

Using this technique, I would like to take a real, ferocious crocodile image, one that looks like it would eat a person, and convert it into the style of the "100 Days Crocodile" manga (the crocodile that will die in 100 days), turning it into a gentle, smiling crocodile. (Even so, is the young guy in this picture okay? One careless move and he'll die!)

(image: wani_plus_wani.png)

The basic idea is to keep the content of the original image (its macro structure, such as the skeleton of the image) while adopting the style (texture) of the 100 Days Crocodile cartoon touch. In deep learning, we always work toward a goal by defining a loss function that expresses what we want to achieve and then minimizing it. Very roughly, the loss function I want to minimize in this example looks like this:

Loss function = (content of real crocodile image − content of generated image) + (style of 100 Days Crocodile image − style of generated image)
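To make the idea concrete, here is a toy, self-contained NumPy sketch of this loss (the "feature" vectors are made-up numbers purely for illustration; the real content and style terms are defined later in the article):

import numpy as np

# Made-up "feature" vectors, purely to illustrate the shape of the loss
content_real = np.array([1.0, 2.0, 3.0])  # content features of the real crocodile
content_gen = np.array([1.1, 2.0, 2.8])   # content features of the generated image
style_ref = np.array([0.5, 0.5])          # style statistics of the 100 Days Crocodile image
style_gen = np.array([0.4, 0.7])          # style statistics of the generated image

loss = np.sum((content_gen - content_real) ** 2) + np.sum((style_gen - style_ref) ** 2)
print(loss)  # the smaller this value, the closer we are to the goal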

The source code comes from the book below, written by the creator of Keras. The code is essentially the same as in the book, so if you are interested in the details, please buy it: [Deep Learning with Python and Keras](https://www.amazon.co.jp/dp/4839964262). Source code

Environment

We will use Google Colab. It requires no setup and gives you free access to a GPU, so it is an easy way to process images. Save the images you want to use in Google Drive and load them from Google Colab.

Neural style transfer using Keras in Google Colab

First, save the target image and the style image you want to process in Google Drive. After saving them, open a notebook in Google Colab. Then run the code below to mount Google Drive, and allow access on the Google Drive side. You will receive an authorization code; enter it in the form that appears after running the cell.

from google.colab import drive
drive.mount('/content/drive')

Next, define the paths of the images. The images will be preprocessed so that they all have the same size.

import keras
keras.__version__  # check the Keras version (displayed as notebook cell output)
from keras.preprocessing.image import load_img, img_to_array

# Path of the target (content) image. Rewrite to the location where you saved it.
target_image_path = '/content/drive/My Drive/Colab Notebooks/wani/wani2.png'
# Path of the style image. Rewrite to the location where you saved it.
style_reference_image_path = '/content/drive/My Drive/Colab Notebooks/wani/100wani.png'

# Dimensions of the generated image: fixed height, width scaled to keep the aspect ratio
width, height = load_img(target_image_path).size
img_height = 400
img_width = int(width * img_height / height)
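
As a quick sanity check (with a hypothetical 800×600 source image), the width is scaled so the aspect ratio is preserved at the fixed height of 400 pixels:

# Hypothetical example: for an 800x600 source image,
# the generated image would be 533x400
print(int(800 * 400 / 600))  # 533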

Next, create helper functions: one to load and preprocess images into the format VGG19 expects, and one to postprocess the network's output back into a displayable image.

import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
    # Load, resize, and convert the image into a batch of one VGG19-ready array
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

def deprocess_image(x):
    # Add back the ImageNet mean pixel values removed by vgg19.preprocess_input
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # Convert BGR back to RGB
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x
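
A quick optional check of these helpers: preprocess_image returns a batch of one image, and deprocess_image turns such an array back into a displayable uint8 image (note that it modifies its argument in place, hence the copy):

# Optional sanity check of the helper functions
batch = preprocess_image(target_image_path)
print(batch.shape)                     # (1, img_height, img_width, 3)
restored = deprocess_image(batch[0].copy())
print(restored.dtype, restored.shape)  # uint8 (img_height, img_width, 3)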

Next, define VGG19.
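
One caveat before building the model: the code below uses the old Keras backend API (K.placeholder, K.gradients), which assumes TensorFlow 1.x-style graph execution. If your Colab runtime ships TensorFlow 2.x, you may need the following workaround first (my addition, not part of the book's code):

# Only needed on TensorFlow 2.x runtimes, where eager execution
# breaks the K.placeholder / K.gradients calls used below
import tensorflow as tf
tf.compat.v1.disable_eager_execution()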

from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))

#Placeholder to hold the generated image
combination_image = K.placeholder((1, img_height, img_width, 3))

#Combine 3 images into one batch
input_tensor = K.concatenate([target_image,
                              style_reference_image,
                              combination_image], axis=0)

#Build VGG19 using a batch of 3 images as input
#This model is loaded with trained ImageNet weights
model = vgg19.VGG19(input_tensor=input_tensor,
                    weights='imagenet',
                    include_top=False)
print('Model loaded.')
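
If you want to see which layer names are available when choosing the content and style layers in the next step, you can simply list them:

# Print the VGG19 layer names (block1_conv1, ..., block5_conv4, etc.)
for layer in model.layers:
    print(layer.name)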

Define the loss functions. The content loss is the squared distance between the feature activations of the target image and the generated image; the style loss compares Gram matrices (channel-to-channel feature correlations) of the style image and the generated image; the total variation loss encourages spatial smoothness in the generated image.

#Content loss function
def content_loss(base, combination):
    return K.sum(K.square(combination - base))

#Style loss function
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_height * img_width
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

#Total variation loss function
def total_variation_loss(x):
    a = K.square(
        x[:, :img_height - 1, :img_width - 1, :] - x[:, 1:, :img_width - 1, :])
    b = K.square(
        x[:, :img_height - 1, :img_width - 1, :] - x[:, :img_height - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
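
To build some intuition for the style loss: gram_matrix turns an (height, width, channels) feature map into a (channels, channels) matrix of channel-to-channel correlations, which captures texture while discarding spatial layout. A minimal NumPy sketch of the same computation:

import numpy as np

feat = np.arange(48, dtype='float32').reshape(4, 4, 3)  # toy feature map, H=W=4, C=3
flat = feat.transpose(2, 0, 1).reshape(3, -1)           # (C, H*W), like batch_flatten(permute_dimensions(...))
gram = flat @ flat.T                                    # (C, C)
print(gram.shape)  # (3, 3)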

Define the final loss to be minimized: a weighted sum of these three terms.

outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
content_layer = 'block5_conv2'
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025

loss = K.variable(0.)
layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(target_image_features,
                                      combination_features)
for layer_name in style_layers:
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_reference_features, combination_features)
    loss += (style_weight / len(style_layers)) * sl
loss += total_variation_weight * total_variation_loss(combination_image)

Define the gradient descent process. Because scipy's L-BFGS implementation requests the loss and the gradients through two separate callbacks, while our K.function computes both in a single pass, the Evaluator class below computes them together on the first call and caches the gradients for the second.

grads = K.gradients(loss, combination_image)[0]
fetch_loss_and_grads = K.function([combination_image], [loss, grads])

class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grads_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
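
As an optional smoke test, one call to evaluator.loss computes the loss and caches the gradients, which the following evaluator.grads call returns:

# Optional smoke test of the Evaluator
x0 = preprocess_image(target_image_path).flatten()
print(evaluator.loss(x0))         # scalar loss value
print(evaluator.grads(x0).shape)  # (img_height * img_width * 3,)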

Finally, run the optimization!

from scipy.optimize import fmin_l_bfgs_b
import imageio
import time

result_prefix = 'style_transfer_result'
iterations = 30

# Run scipy-based optimization (L-BFGS) over the pixels of the generated image
# so as to minimize the neural style loss.
# This is our initial state: the target image.
# Note that `scipy.optimize.fmin_l_bfgs_b` can only process flat vectors.
x = preprocess_image(target_image_path)
x = x.flatten()
for i in range(iterations):
    print('Start of iteration', i)
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x,
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    # Save current generated image
    img = x.copy().reshape((img_height, img_width, 3))
    img = deprocess_image(img)
    fname = result_prefix + '_at_iteration_%d.png' % i
    # Save with imageio (scipy.misc.imsave was removed in newer SciPy)
    imageio.imwrite(fname, img)
    end_time = time.time()
    print('Image saved as', fname)
    print('Iteration %d completed in %ds' % (i, end_time - start_time))

Output image

from matplotlib import pyplot as plt

#Content image
plt.imshow(load_img(target_image_path, target_size=(img_height, img_width)))
plt.figure()

#Style image
plt.imshow(load_img(style_reference_image_path, target_size=(img_height, img_width)))
plt.figure()

#Generated image
plt.imshow(img)
plt.show()

Output result

And the result is... (image: wani_result.png) Completely different from what I had imagined!!! There is no pop, gentle crocodile feeling at all!!! Still, I hope this showed that you can easily experiment with deep learning using Google Colab and Keras. You can try all sorts of image processing yourself with this code, so please give it a try.

Keras really is amazing. Once again, the code here comes from [Deep Learning with Python and Keras](https://www.amazon.co.jp/dp/4839964262). Source code
