[PYTHON] I tried letting AI judge the true bride of The Quintessential Quintuplets

Introduction

download.jpg

It is no exaggeration to say that the quality of a romantic comedy is determined by its ending, and romantic comedies have many kinds of endings: the harem end, where no one in particular is chosen; the multi-end, which prepares an ending for each heroine; and the individual end, where the protagonist ends up with one specific girl. Fans argue over every kind of ending, but the individual end is probably the most contentious.

One of the romantic comedies whose ending stirred up the most debate is, yes, The Quintessential Quintuplets.

The ending of this work is the Yotsuba end. Still, fantasizing about an "if" ending with another heroine is also part of what a romantic comedy is.

In this article, I train an AI on images from season 1 of the anime, have it judge the true bride, and speculate a little about the endings that might have been with the other heroines.

What I did

The woman who appears at the wedding in season 1 of the anime is treated as the true bride, and she is identified by multi-class classification over images of the quintuplets plus an "other" class. Keras is used as the machine learning framework. All training images are taken from season 1 of the anime, and the image to be judged is the following.

0_5.png

By the way, I'm a Miku fan, so I'm hoping the model does its best and picks Miku.

Environment

Python: 3.9.0
conda: 4.9.1
CPU: Intel(R) Core(TM) i5-6500
GPU: Intel(R) HD Graphics 530
Keras: 2.3.1

Implementation procedure

  1. Image collection
  2. Face image extraction
  3. Labeling
  4. Data classification
  5. Data augmentation
  6. Model learning
  7. True bride judgment

    1. Image collection

    First is the collection of images. Apparently OpenCV can grab frames automatically from a video file by specifying frame numbers, but since I had no saved video, I decided to capture the screen while the anime was playing. Concretely, I wrote a program that takes a screenshot every 5 seconds while season 1 of the anime streams. I chose pyautogui for this; it is a very handy module that can automate all sorts of other GUI operations as well.

    capture.py

    
    
    import os
    import pyautogui
    import time
    
    start = time.time()
    
    # 12 episodes x 275 shots, one screenshot every 5 seconds
    for l in range(1,13):
        for i in range(275):
            # capture the region of the desktop where the anime is playing
            im = pyautogui.screenshot('./capture_data/' + str(l) +'_'+ str(i) + '.png', region=(1050,50,800,450))
            time.sleep(5)
    
    end = time.time()
    print('result time is :', end - start)
    
    

    In my case, the anime was playing in the upper-right quarter of a desktop split into four, so the program captures region=(1050,50,800,450), i.e. the upper right of the desktop. A total of 3,300 images were captured over a period of about 5 hours. A captured image looks like this.

    capture01.png

    I still can't forget that order of a yakiniku set meal without the yakiniku. By the way, if you want to capture frames from a saved video by specifying frame numbers, the following link is helpful.

    When capturing from video
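    For reference, here is a minimal sketch of what such frame-by-frame capture could look like with OpenCV. The file name anime_ep1.mp4 is hypothetical; I did not actually have a saved video, which is why I captured the screen instead.


    import cv2

    # hypothetical saved video file (not the method actually used in this article)
    cap = cv2.VideoCapture('anime_ep1.mp4')
    fps = cap.get(cv2.CAP_PROP_FPS)
    # save one frame every 5 seconds; fall back to a fixed step if fps is unavailable
    step = int(fps * 5) if fps > 0 else 120

    count = 0
    saved = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if count % step == 0:
            cv2.imwrite('capture_data/video_' + str(saved) + '.png', frame)
            saved += 1
        count += 1
    cap.release()
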

    2. Face image extraction

    Next, faces are extracted with OpenCV from the captured images. As the cascade classifier, I used lbpcascade_animeface.xml, which is well known for anime face detection (see References). Copy this xml file to the working directory, detect faces in the captured images, and crop them out. Also, since VGG16 is used later as the deep learning model, each crop is resized to 64 x 64 pixels.

    face_cut.py

    
    
    import cv2
    
    def face_cut(img_path, save_path):
        img = cv2.imread(img_path)
        # anime-face cascade classifier (lbpcascade_animeface.xml placed in the working directory)
        cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')

        facerect = cascade.detectMultiScale(img)
        for i, (x,y,w,h) in enumerate(facerect):
            face_img = img[y:y+h, x:x+w]
            # resize to 64 x 64 for VGG16
            face_img = cv2.resize(face_img, (64, 64))
            # note: if several faces are found in one frame, later ones overwrite earlier ones at save_path
            cv2.imwrite(save_path, face_img)
    
    for l in range(1,13):
        for i in range(275):
            face_cut('capture_data/'+str(l)+'_'+str(i)+'.png', 'cut_data/'+str(l)+'_'+str(i)+'.png')
    
    
    

    The extracted images are shown below. Even the school cafeteria lady got picked up, it seems. Of the 3,300 captured images, 1,365 had a face detected. In other words, faces could be extracted from just over a third of the frames.

    face_cut01.png

    3. Labeling

    Here, the 1,365 images are manually sorted into a directory for each heroine. With about 1,365 images it didn't take that long, but with images on the order of 10,000 this approach would probably not be realistic.

    tesagyou.py

    
    
    # Do your best!!!
    
    

    The sorting results are shown in the table below.

    Class      Number of images
    Ichika     206
    Nino       168
    Miku       152
    Yotsuba    172
    Itsuki     204
    Other      463

    Ichika leads with 206 images, with Itsuki close behind. There is a gap of 54 between Ichika (the most) and Miku (the fewest), so for rigor it might be better to equalize the number of training images per class, but since strictness is not really the point here, I continue as is.
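    Incidentally, the per-class counts above can be re-checked with a few lines of glob. This is just a sanity-check sketch, assuming the directory layout data/<name>/ that the later split.py expects.


    import glob

    # directory names match those used in split.py
    names = ['other', 'ichika', 'nino', 'miku', 'yotsuba', 'itsuki']
    for name in names:
        print(name, len(glob.glob('data/' + name + '/*.png')))
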

    4. Data classification

    Here, the face images from the previous step are loaded, put into pandas, and labeled 0 to 5. The train:test split ratio is 8:2.

    split.py

    
    
    # split.py
    
    import numpy as np
    import glob
    import cv2
    from keras.utils.np_utils import to_categorical
    import pandas as pd
    import matplotlib.pyplot as plt
    
    names = ['other', 'ichika', 'nino', 'miku', 'yotsuba', 'itsuki']
    img_list = []
    label_list = []
    
    # collect each face image together with its class index
    for index, name in enumerate(names):
        face_img = glob.glob('data/'+name+'/*.png')
        for face in face_img:
            # read as a color image (OpenCV loads in BGR order)
            a = cv2.imread(face, 1)
            b = np.expand_dims(a, axis=0)
            img_list.append(b)
            label_list.append(index)
    
    # convert pandas
    X_pd = pd.Series(img_list)
    y_pd = pd.Series(label_list)
    
    # merge
    Xy_pd = pd.concat([X_pd, y_pd], axis=1)
    # shuffle
    sf_Xy = Xy_pd.sample(frac=1)
    # reacquire image and label arrays after shuffling
    img_list = sf_Xy[0].values
    label_list = sf_Xy[1].values
    # concatenate the per-image arrays into one (N, 64, 64, 3) array
    X = np.r_[tuple(img_list)]
    # one-hot encode the labels
    Y = to_categorical(label_list)
    
    train_rate = 0.8
    
    train_n = int(len(X) * train_rate)
    train_X = X[:train_n]
    test_X = X[train_n:]
    train_y = Y[:train_n][:]
    test_y = Y[train_n:][:]
    
    

    5. Data augmentation

    Next, since just over 1,000 training images did not feel like enough, I augmented only the train images. The augmentations are horizontal flip, blur, and gamma conversion, which multiplies the number of train images by 2 ** 3 = 8, bringing the total to roughly 10,000.

    • For convenience the code is shown separately, but steps 4 and 5 are actually a single file.

    split.py

    
    
    ## define scratch_functions
    
    #Flip horizontal
    def flip(img):
        flip_img = cv2.flip(img, 1)
        return flip_img
    
    #Blur
    def blur(img):
        blur_img = cv2.GaussianBlur(img, (5,5), 0)
        return blur_img
    
    #γ conversion
    def gamma(img):
        gamma = 0.75
        LUT_G = np.arange(256, dtype = 'uint8')
        for i in range(256):
            LUT_G[i] = 255 * pow(float(i) / 255, 1.0 / gamma)
        gamma_img = cv2.LUT(img, LUT_G)
        return gamma_img
    
    total_img = []
    for x in train_X:
        imgs = [x]
        # concat list
        imgs.extend(list(map(flip, imgs)))
        imgs.extend(list(map(blur, imgs)))
        imgs.extend(list(map(gamma, imgs)))
        total_img.extend(imgs)
    
    # add dims to total_img
    img_expand = list(map(lambda x:np.expand_dims(x, axis=0), total_img))
    #Tuple and combine
    train_X_scratch = np.r_[tuple(img_expand)]
    
    # duplicate each label 2**3 = 8 times to match the augmented images
    labels = []
    for label in range(len(train_y)):
        lbl = []
        for i in range(2**3):
            lbl.append(train_y[label, :])
        labels.extend(lbl)
    
    label_expand = list(map(lambda x:np.expand_dims(x, axis=0), labels))
    train_y_scratch = np.r_[tuple(label_expand)]
    
    

    6. Model learning

    Finally, the model is trained on the prepared images. There is no particular reason for the choice of architecture, but I went with VGG16. Honestly, with the number of epochs set to 100 and only a weak GPU, I was surprised that training took about half a day.

    model.py

    
    
    from keras.applications import VGG16
    from keras.models import Model, Sequential
    from keras.layers import Dense, Activation, Flatten, Input, Dropout
    from keras import optimizers
    import matplotlib.pyplot as plt
    from split import *
    
    # define input_tensor
    input_tensor = Input(shape=(64,64,3))
    vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)
    
    top_model = Sequential()
    top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
    top_model.add(Dense(64, activation='sigmoid'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(32, activation='sigmoid'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(6, activation='softmax'))
    
    model = Model(inputs=vgg16.input, outputs=top_model(vgg16.output))
    
    # freeze the first 15 VGG16 layers; only the remaining layers and the top model are trained
    for layer in model.layers[:15]:
        layer.trainable = False
    
    # compile
    model.compile(loss='categorical_crossentropy', optimizer=optimizers.SGD(lr=1e-4, momentum=0.9), metrics=['accuracy'])
    history = model.fit(train_X_scratch, train_y_scratch, epochs=100, batch_size=32, validation_data=(test_X, test_y))
    score = model.evaluate(test_X, test_y, verbose=0)
    print(score)
    
    # save model
    model.save('my_model.h5')
    
    # plot acc, val_acc
    # (depending on the Keras version, the history keys may be 'accuracy' / 'val_accuracy' instead)
    plt.plot(history.history['acc'], label='acc', ls='-')
    plt.plot(history.history['val_acc'], label='val_acc', ls='-')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(loc='best')
    plt.show()
    
    

    The accuracy is not great, but the model for classifying the five heroines is complete.

    pic20.jpg
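    To see which of the quintuplets the model actually mixes up, a confusion matrix over the test set would help. The following is only a minimal sketch, assuming the arrays from split.py and the trained model are still in scope; it is not part of the original training script.


    import numpy as np

    # predicted and true class indices on the test set
    pred_cls = np.argmax(model.predict(test_X), axis=1)
    true_cls = np.argmax(test_y, axis=1)

    # 6 x 6 confusion matrix: rows = true class, columns = predicted class
    cm = np.zeros((6, 6), dtype=int)
    for t, p in zip(true_cls, pred_cls):
        cm[t, p] += 1
    print(cm)
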

    7. True bride judgment

    Now, at last, the long-awaited true bride judgment! (It took about a day to get this far.)
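    The prediction step itself is not spelled out in this article, but a minimal sketch of it might look like the following, assuming the judgment image has already been cropped and resized to 64 x 64 like the training data. The path judge/0_5.png is hypothetical.


    import cv2
    import numpy as np
    from keras.models import load_model

    names = ['other', 'ichika', 'nino', 'miku', 'yotsuba', 'itsuki']

    model = load_model('my_model.h5')

    # hypothetical path to the 64 x 64 face crop of the judgment image
    img = cv2.imread('judge/0_5.png', 1)
    img = cv2.resize(img, (64, 64))
    x = np.expand_dims(img, axis=0)

    # class probabilities for the six classes
    pred = model.predict(x)[0]
    for name, p in zip(names, pred):
        print(name, ':', round(float(p), 3))
    print('judged bride :', names[int(np.argmax(pred))])
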

    So, who did the AI judge to be the true bride?!












    pic8.png

    ――The one chosen was Ichika.

    Hmm, was it the hair length after all? Color-wise I thought Itsuki had a chance too. And Miku... Miku, maybe I should have composited her trademark headphones into the image.

    Well, in season 1 of the anime there were almost no scenes where a heroine other than Ichika had her hair up, so picking Ichika may be reasonable. Eyes usually matter more than hair as a feature for telling faces apart, but in the case of the quintuplets they are all a similar bluish color, so they probably could not be distinguished. In the end, I suspect Ichika was chosen because of her short hair. The hair color looks more like Itsuki's, though.

    Bonus

    Since I had gone to the trouble, I classified a few other images as well.

    pic11.png

    This is the vow scene just before the bride appears. It was also classified as Ichika. It does look like short hair, after all.

    pic26.jpg

    Well, this one is also Ichika...

    And finally, the last scene of episode 8. This is the girl Fuutarou fell in love with long ago.

    pic27.png

    Huh, this one is Ichika too!? The hair color here did look like Ichika's, but Ichika seems a little too dominant...

    Conclusion

    So, as far as the AI is concerned, both the heroine who appeared as the bride at the wedding and the girl Fuutarou liked long ago are, broadly speaking, Ichika. That's it. ~~Fans of that snarky big-sister heroine~~ Ichika fans can now say, "No, you see, according to the AI, Ichika is the true bride..." Maybe.

    References

    • I tried to recognize the anime "K-ON!" with Keras
    • Let's collect images of anime characters from videos with OpenCV!
    • lbpcascade_animeface.xml for anime face detection with OpenCV
