Manga artist Masayuki Kitamichi has produced and released the font "Kitamiji 222" to commemorate "Cat Day" on February 22nd.
http://kitamichi.sub.jp/Sites/iblog/C513573485/E937677024/ (Introduction article: http://www.forest.impress.co.jp/docs/review/20160303_746474.html)
First of all, let's check the license.
◆ About use
"Kitamiji 222" is freeware. If it is not for commercial use, you can use it freely without any restrictions. However, please refrain from selling, redistributing, or processing the "font file body".
The copyright of "Kitamiji 222" belongs to Masayuki Kitamichi. We are not responsible for any machine or other troubles caused by using fonts.
(In this article, font images are cut out and used as machine learning data, which I believe does not constitute modifying the font file itself. → Mr. Kitamichi kindly agreed to this use of the font.)
For example, "aiueo" and "kakikukeko" are displayed as follows.
**Fig. "aiueo" rendered in Kitamiji 222**
**Fig. "kakikukeko" rendered in Kitamiji 222**
In this way, the vowel is expressed by the direction of the cat's face, and the consonant by the coat pattern. Intuitively, I felt that recognizing the vowels "a i u e o" should be relatively easy for a classifier built from neural networks, so I tried it. ("a" and "e" (and likewise "ka" and "ke") look somewhat similar, so those seem a little harder to classify.)
Since "Kitamiji 222" is provided in TrueType font, the work was done as follows.
(Programming environment: IPython-notebook 4.0.4, python 3.5.1, numpy 1.10.4, pillow 3.1.1, python 2.7.11, keras 0.3.0, tensorflow 0.7.0)
Since I had little experience dealing with 2-byte fonts and font images, this pre-process work, including research, took a considerable amount of time. The main work was done with Pillow (PIL Fork)'s Image class library.
First, import the libraries and load the font.
import numpy as np
import matplotlib.pyplot as plt
from PIL import ImageFont, ImageDraw, Image
%matplotlib inline

font_path = './kitamiji222_ver101.ttf'
font = ImageFont.truetype(font_path, 36)   # load the TrueType font at a size of 36 px
Next, create an image of the required size and draw the text on it.
text = u'あいうえお'   # "aiueo" in hiragana
siz = font.getsize(text)   # pixel size needed to render the text
img1 = Image.new('RGB', siz, (255, 255, 255))   # white canvas
draw = ImageDraw.Draw(img1)
orig = (0, 0)   # draw from the top-left corner
draw.text(orig, text, (0, 0, 0), font=font)   # draw the text in black
Display the image to check it.
plt.imshow(img1)
plt.xticks([])
plt.yticks([])
plt.show()
Since the font images do not need color information, they are converted to grayscale and then to NumPy matrices. (hiralist below is the list of hiragana characters to be rendered.)
images = []
siz = (36, 36)   # render each character as a 36x36 image
for hira_ch in hiralist:
    img_ch = Image.new('RGB', siz, (255, 255, 255))
    draw = ImageDraw.Draw(img_ch)
    orig = (0, 0)
    draw.text(orig, hira_ch, (0, 0, 0), font=font)
    img_ch_g = img_ch.convert('L')   # convert to grayscale
    images.append(img_ch_g)

def PIL2npmat(img):
    # convert a PIL image to a (height, width) uint8 NumPy matrix
    return np.array(img.getdata(), np.uint8).reshape(img.size[1], img.size[0])

imgmats = [PIL2npmat(img) for img in images]
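As a quick sanity check (my own addition, not in the original article), each matrix should come out as 36x36 with 8-bit pixel values:

print(imgmats[0].shape)   # -> (36, 36)
print(imgmats[0].dtype)   # -> uint8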
Finally, save the data to a file in pickle format. (Here codelist is the list of character codes corresponding to the images.)
import pickle

mydata = [imgmats, codelist]   # image matrices and their character codes
filename = 'kitamiji222.pkl'
outputfp = open(filename, 'wb')
pickle.dump(mydata, outputfp, protocol=2)   # protocol 2 for Python 2 compatibility
outputfp.close()
In the listing above, mydata is the Python object to be saved. This time the procedure is to save the pickle in the Python 3 environment and load it in the Python 2 environment, so protocol=2 is specified in pickle.dump() to keep the pickle file compatible between the two versions.
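For reference, the loading side in the Python 2 environment would look something like this (a minimal sketch of my own; the article does not show this code):

import pickle

# Python 2 side: protocol-2 pickles written by Python 3 load without trouble
with open('kitamiji222.pkl', 'rb') as fp:
    imgmats, codelist = pickle.load(fp)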
As mentioned above, the deep learning framework Keras was used. The network model was defined as follows, with reference to the Keras sample code "mnist_mlp.py".
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adam

trXshape = trainX[0].shape   # input shape: a flattened 36x36 = 1296-pixel image
nclass = trainY.shape[1]     # number of classes (the five vowels)
hidden_units = 800

model = Sequential()                                  # instantiate a Sequential model
model.add(Dense(hidden_units, input_shape=trXshape))  # hidden layer 1
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(hidden_units))                        # hidden layer 2
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nclass))                              # output layer
model.add(Activation('softmax'))

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)   # define the optimizer
model.compile(loss='categorical_crossentropy', optimizer=optimizer)   # select the cost and compile the model
Since five types of vowels are classified this time, the label data trainY contains one-hot information corresponding to ['a', 'i', 'u', 'e', 'o'], calculated in advance from each character's font code. (Therefore, nclass = 5.)
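For illustration, a hypothetical sketch of how such one-hot labels could be built (this step is not shown in the article; vowel_ids is an assumed array of vowel indices derived from the character codes):

from keras.utils import np_utils

# vowel_ids: assumed array of indices 0..4 into ['a', 'i', 'u', 'e', 'o']
trainY = np_utils.to_categorical(vowel_ids, 5)   # shape: (n_samples, 5)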
The model uses the Keras Sequential type and is defined in order from the input side to the output side: a three-layer MLP of hidden layer 1 → hidden layer 2 → output layer. Since a font image has 36x36 = 1296 pixels, the number of hidden-layer units was set to 800. Dropout() is used for regularization, and Adam() was used as the optimizer for learning. The input is the flattened image, as sketched below.
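A minimal sketch (my assumption; the article does not show this step) of how the image matrices could be flattened and scaled to form trainX:

# flatten each 36x36 image to a 1296-element vector and scale to [0, 1]
trainX = trainX.reshape(trainX.shape[0], 36 * 36).astype('float32') / 255.0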
In recent Keras, you can select either "Theano" or "TensorFlow" as the backend; this time, "TensorFlow" was used. You can specify the Keras backend by setting an environment variable.
export KERAS_BACKEND=tensorflow
First, the required numbers of training data (Train data) and test data (Test data) were randomly sampled from the hiragana set, and the model was trained. The progress of the calculation, i.e. the loss and accuracy, is shown in the figure below.
Fig. MLP model, Loss and Accuracy
After the predetermined number of epochs, the loss converges to near 0 and the accuracy to near 1.0. The accuracy in the subsequent classification of the Test data was also almost 1.0 (100%). Although the sampling is random, Train data and Test data are drawn from the same set and population, so there is no difference in accuracy between them.
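For reference, a sketch of how the training and evaluation might be invoked with the Keras 0.3 API (batch_size and nb_epoch are assumed values, not taken from the article):

model.fit(trainX, trainY, batch_size=128, nb_epoch=20,
          show_accuracy=True, verbose=1,
          validation_data=(testX, testY))   # old Keras 0.x fit() signature
score = model.evaluate(testX, testY, show_accuracy=True, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])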
Since merely fitting the training data is not interesting, I decided to prepare test data separate from the training data. Fortunately, "Kitamiji 222" provides both hiragana and katakana, so I decided to use the hiragana set as training data and the katakana set as test data.
Let's compare the images of each.
**Fig. Hiragana, from the "a" row**
**Fig. Katakana, from the "a" row**
It can be observed that characters with the same sound in hiragana and katakana are quite similar in shape. The difference between hiragana and katakana turns out to be expressed by whether the cat's mouth is "closed" or "open".
Therefore, Train data and Test data were prepared separately and the calculation was performed.
Fig. MLP model, Loss and Accuracy
Fig. MLP model, Validation Loss and Validation Accuracy
The learning behavior on hiragana (Train data) is almost the same as in the previous figure. As for the validation loss and accuracy on katakana (Test data), the loss decreases and the accuracy increases as intended. The final accuracy was 75%.
I varied the calculation parameters to improve the accuracy; the most effective ones were the dropout rates. By adjusting how closely the model fits the hiragana pictograms, the katakana classification accuracy can be improved. I was able to see a textbook effect of regularization realized in the shape of cat pictograms. (Incidentally, the listing above shows the values that gave good results over various trials: a dropout rate of 0.3 after the first hidden layer and 0.5 after the second.)
I then tried a CNN (convolutional neural network) model for better accuracy. The nice thing about a high-level class library like Keras is that you can change the model easily with a few code changes. The main part of the code is shown below.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import Adam

nb_classes = trainY.shape[1]
img_rows, img_cols = 36, 36   # image dimensions
trainX = trainX.reshape(trainX.shape[0], 1, img_rows, img_cols)   # (n, channels, rows, cols)
testX = testX.reshape(testX.shape[0], 1, img_rows, img_cols)
nb_filters = 32   # number of convolutional filters to use
nb_pool = 2       # size of pooling area for max pooling
nb_conv = 3       # convolution kernel size

model = Sequential()                                    # instantiate a Sequential model
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,   # convolution layer 1
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))  # convolution layer 2
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))   # pooling layer
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(128))                                   # fully connected layer
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))                            # output layer
model.add(Activation('softmax'))

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)   # define the optimizer
model.compile(loss='categorical_crossentropy', optimizer=optimizer)   # select the cost and compile the model
From the input side, it has a five-layer structure: convolution layer 1 → convolution layer 2 → pooling layer → fully connected layer → output layer. (As you might guess, this layer structure follows the Keras sample mnist_cnn.py.) The optimizer is Adam, as before.
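For clarity, here is a walk-through of the tensor shapes in the model above (channel-first layout; the arithmetic is mine, derived from the layer parameters):

# input:              (1, 36, 36)
# conv 3x3, 'valid':  (32, 34, 34)
# conv 3x3, 'valid':  (32, 32, 32)
# max-pool 2x2:       (32, 16, 16)
# flatten:            32 * 16 * 16 = 8192 values
# dense:              128 units -> 5-way softmax output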
The calculation result is as follows.
Fig. CNN model, Loss and Accuracy
Fig. CNN model, Validation Loss and Validation Accuracy
As intended, the classification accuracy improved. The final classification accuracy on the Test data (katakana) was 89%. I did my best by adjusting the parameters (mainly the dropout rates), but could not reach the 90% level.
Compared to the 98-99% accuracy reached on the handwritten digit classification task MNIST, 89% is a somewhat disappointing result, but the number of dataset samples differs decisively between MNIST and this experiment. (Here the sampling is random, but the variety is limited to the glyphs in the font set.) To improve the accuracy of this pictogram classification further, it would be necessary to go to the trouble of increasing the number of data samples, for example by processing the font images (deformation, noise addition, etc.). (Alternatively, changing the regularization method might be somewhat effective.)
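As an illustration of such augmentation, here is a hypothetical sketch of my own (not from the article) that applies a small random rotation and additive noise to a 36x36 font image matrix:

import numpy as np
from PIL import Image

def augment(img_mat, max_angle=10.0, noise_std=8.0):
    # rotate an inverted copy so the black padding introduced by rotation
    # becomes white background after inverting back
    inv = Image.fromarray(255 - img_mat)
    rotated = inv.rotate(np.random.uniform(-max_angle, max_angle))
    arr = 255.0 - np.asarray(rotated, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)   # additive Gaussian noise
    return np.clip(arr, 0, 255).astype(np.uint8)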
Finally, I would like to thank Mr. Kitamichi for publishing such an interesting subject for machine learning. (I know the font was not created with machine learning in mind, but I had great fun playing with it!)
- Cat pictogram font "Kitamiji 222", by the creator of the four-panel manga "Pu-Neko": http://www.forest.impress.co.jp/docs/review/20160303_746474.html