Manga artist Masayuki Kitamichi has produced and released the font "Kitamiji 222" to commemorate "Cat Day" on February 22nd.
http://kitamichi.sub.jp/Sites/iblog/C513573485/E937677024/ (Introduction article: http://www.forest.impress.co.jp/docs/review/20160303_746474.html)
First of all, let's check the license.
◆ About use
"Kitamiji 222" is freeware. If it is not for commercial use, you can use it freely without any restrictions. However, please refrain from selling, redistributing, or processing the "font file body".
The copyright of "Kitamiji 222" belongs to Masayuki Kitamichi. We are not responsible for any machine or other troubles caused by using fonts.
(In this article, font images are cut out and used as machine learning data, which I believe does not constitute modifying the font file itself. → Mr. Kitamichi kindly agreed to this use of the font.)
For example, "aiueo" and "kakikukeko" are displayed as follows.
**Fig. "aiueo" rendered in Kitamiji 222**
**Fig. "kakikukeko" rendered in Kitamiji 222**
In this way, the vowel is expressed by the direction of the cat's face, and the consonant by the coat pattern. Intuitively, I felt that recognizing the vowels "a i u e o" should be relatively easy for a classifier built from neural networks, so I tried it. ("a" and "e" (and likewise "ka" and "ke") look somewhat similar, so those seem a little harder to classify.)
Since "Kitamiji 222" is provided in TrueType font, the work was done as follows.
(Programming environment: IPython-notebook 4.0.4, python 3.5.1, numpy 1.10.4, pillow 3.1.1, python 2.7.11, keras 0.3.0, tensorflow 0.7.0)
Since I had little experience dealing with 2-byte fonts and font images, this pre-process work, including research, took a considerable amount of time. The main work was done with Pillow (PIL Fork)'s Image class library.
First, import the libraries and load the font.
import numpy as np
import matplotlib.pyplot as plt
from PIL import ImageFont, ImageDraw, Image
%matplotlib inline

font_path = './kitamiji222_ver101.ttf'
font = ImageFont.truetype(font_path, 36)   # load the TrueType font at a size of 36 px
Next, create an image of the required size and draw the text on it.
text = u'あいうえお'   # "aiueo" in hiragana
siz = font.getsize(text)   # pixel size needed to render the text
img1 = Image.new('RGB', siz, (255, 255, 255))   # white canvas
draw = ImageDraw.Draw(img1)
orig = (0, 0)   # draw from the top-left corner
draw.text(orig, text, (0, 0, 0), font=font)   # draw the text in black
Display the image to check it.
plt.imshow(img1)
plt.xticks([])
plt.yticks([])
plt.show()
Since the font images do not need color information, they are converted to grayscale and then to NumPy matrices. (hiralist below is the list of hiragana characters to be rendered.)
images = []
siz = (36, 36)   # render each character as a 36x36 image
for hira_ch in hiralist:
    img_ch = Image.new('RGB', siz, (255, 255, 255))
    draw = ImageDraw.Draw(img_ch)
    orig = (0, 0)
    draw.text(orig, hira_ch, (0, 0, 0), font=font)
    img_ch_g = img_ch.convert('L')   # convert to grayscale
    images.append(img_ch_g)

def PIL2npmat(img):
    # convert a PIL image to a (height, width) uint8 NumPy matrix
    return np.array(img.getdata(), np.uint8).reshape(img.size[1], img.size[0])

imgmats = [PIL2npmat(img) for img in images]
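As a quick sanity check (my own addition, not in the original article), each matrix should come out as 36x36 with 8-bit pixel values:

print(imgmats[0].shape)   # -> (36, 36)
print(imgmats[0].dtype)   # -> uint8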
Finally, save the data to a file in pickle format. (Here codelist is the list of character codes corresponding to the images.)
import pickle

mydata = [imgmats, codelist]   # image matrices and their character codes
filename = 'kitamiji222.pkl'
outputfp = open(filename, 'wb')
pickle.dump(mydata, outputfp, protocol=2)   # protocol 2 for Python 2 compatibility
outputfp.close()
In the listing above, mydata is the Python object to be saved. This time the procedure is to save the pickle in the Python 3 environment and load it in the Python 2 environment, so protocol=2 is specified in pickle.dump() to keep the pickle file compatible between the two versions.
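For reference, the loading side in the Python 2 environment would look something like this (a minimal sketch of my own; the article does not show this code):

import pickle

# Python 2 side: protocol-2 pickles written by Python 3 load without trouble
with open('kitamiji222.pkl', 'rb') as fp:
    imgmats, codelist = pickle.load(fp)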
As mentioned above, the deep learning framework Keras was used. The network model was defined as follows, with reference to the Keras sample code "mnist_mlp.py".
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adam

trXshape = trainX[0].shape   # input shape: a flattened 36x36 = 1296-pixel image
nclass = trainY.shape[1]     # number of classes (the five vowels)
hidden_units = 800

model = Sequential()                                  # instantiate a Sequential model
model.add(Dense(hidden_units, input_shape=trXshape))  # hidden layer 1
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(hidden_units))                        # hidden layer 2
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nclass))                              # output layer
model.add(Activation('softmax'))

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)   # define the optimizer
model.compile(loss='categorical_crossentropy', optimizer=optimizer)   # select the cost and compile the model
Since five types of vowels are classified this time, the label data trainY contains one-hot information corresponding to ['a', 'i', 'u', 'e', 'o'], calculated in advance from each character's font code. (Therefore, nclass = 5.)
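For illustration, a hypothetical sketch of how such one-hot labels could be built (this step is not shown in the article; vowel_ids is an assumed array of vowel indices derived from the character codes):

from keras.utils import np_utils

# vowel_ids: assumed array of indices 0..4 into ['a', 'i', 'u', 'e', 'o']
trainY = np_utils.to_categorical(vowel_ids, 5)   # shape: (n_samples, 5)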
The model uses the Keras Sequential type and is defined in order from the input side to the output side: a three-layer MLP of hidden layer 1 → hidden layer 2 → output layer. Since a font image has 36x36 = 1296 pixels, the number of hidden-layer units was set to 800. Dropout() is used for regularization, and Adam() was used as the optimizer for learning. The input is the flattened image, as sketched below.
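A minimal sketch (my assumption; the article does not show this step) of how the image matrices could be flattened and scaled to form trainX:

# flatten each 36x36 image to a 1296-element vector and scale to [0, 1]
trainX = trainX.reshape(trainX.shape[0], 36 * 36).astype('float32') / 255.0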
In recent Keras, you can select either "Theano" or "TensorFlow" as the backend; this time, "TensorFlow" was used. You can specify the Keras backend by setting an environment variable.
export KERAS_BACKEND=tensorflow
First, the required numbers of training data (Train data) and test data (Test data) were randomly sampled from the hiragana set, and the model was trained. The progress of the calculation, i.e. the loss and accuracy, is shown in the figure below.
Fig. MLP model, Loss and Accuracy
After the predetermined number of epochs, the loss converges to near 0 and the accuracy to near 1.0. The accuracy in the subsequent classification of the Test data was also almost 1.0 (100%). Although the sampling is random, Train data and Test data are drawn from the same set and population, so there is no difference in accuracy between them.
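For reference, a sketch of how the training and evaluation might be invoked with the Keras 0.3 API (batch_size and nb_epoch are assumed values, not taken from the article):

model.fit(trainX, trainY, batch_size=128, nb_epoch=20,
          show_accuracy=True, verbose=1,
          validation_data=(testX, testY))   # old Keras 0.x fit() signature
score = model.evaluate(testX, testY, show_accuracy=True, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])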
Since merely fitting the training data is not interesting, I decided to prepare test data separate from the training data. Fortunately, "Kitamiji 222" provides both hiragana and katakana, so I decided to use the hiragana set as training data and the katakana set as test data.
Let's compare the images of each.
**Fig. Hiragana, from the "a" row**
**Fig. Katakana, from the "a" row**
It can be observed that characters with the same sound in hiragana and katakana are quite similar in shape. The difference between hiragana and katakana turns out to be expressed by whether the cat's mouth is "closed" or "open".
Therefore, Train data and Test data were prepared separately and the calculation was performed.
Fig. MLP model, Loss and Accuracy
Fig. MLP model, Validation Loss and Validation Accuracy
The learning behavior on hiragana (Train data) is almost the same as in the previous figure. As for the validation loss and accuracy on katakana (Test data), the loss decreases and the accuracy increases as intended. The final accuracy was 75%.
I varied the calculation parameters to improve the accuracy; the most effective ones were the dropout rates. By adjusting how closely the model fits the hiragana pictograms, the katakana classification accuracy can be improved. I was able to see a textbook effect of regularization realized in the shape of cat pictograms. (Incidentally, the listing above shows the values that gave good results over various trials: a dropout rate of 0.3 after the first hidden layer and 0.5 after the second.)
I then tried a CNN (convolutional neural network) model for better accuracy. The nice thing about a high-level class library like Keras is that you can change the model easily with a few code changes. The main part of the code is shown below.
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import Adam

nb_classes = trainY.shape[1]
img_rows, img_cols = 36, 36   # image dimensions
trainX = trainX.reshape(trainX.shape[0], 1, img_rows, img_cols)   # (n, channels, rows, cols)
testX = testX.reshape(testX.shape[0], 1, img_rows, img_cols)
nb_filters = 32   # number of convolutional filters to use
nb_pool = 2       # size of pooling area for max pooling
nb_conv = 3       # convolution kernel size

model = Sequential()                                    # instantiate a Sequential model
model.add(Convolution2D(nb_filters, nb_conv, nb_conv,   # convolution layer 1
                        border_mode='valid',
                        input_shape=(1, img_rows, img_cols)))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, nb_conv, nb_conv))  # convolution layer 2
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))   # pooling layer
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(128))                                   # fully connected layer
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))                            # output layer
model.add(Activation('softmax'))

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)   # define the optimizer
model.compile(loss='categorical_crossentropy', optimizer=optimizer)   # select the cost and compile the model
From the input side, it has a five-layer structure: convolution layer 1 → convolution layer 2 → pooling layer → fully connected layer → output layer. (As you might guess, this layer structure follows the Keras sample mnist_cnn.py.) The optimizer is Adam, as before.
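For clarity, here is a walk-through of the tensor shapes in the model above (channel-first layout; the arithmetic is mine, derived from the layer parameters):

# input:              (1, 36, 36)
# conv 3x3, 'valid':  (32, 34, 34)
# conv 3x3, 'valid':  (32, 32, 32)
# max-pool 2x2:       (32, 16, 16)
# flatten:            32 * 16 * 16 = 8192 values
# dense:              128 units -> 5-way softmax output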
The calculation result is as follows.
Fig. CNN model, Loss and Accuracy
Fig. CNN model, Validation Loss and Validation Accuracy
As intended, the classification accuracy improved. The final classification accuracy on the Test data (katakana) was 89%. I did my best by adjusting the parameters (mainly the dropout rates), but could not reach the 90% level.
Compared to the 98-99% accuracy reached on the handwritten digit classification task MNIST, 89% is a somewhat disappointing result, but the number of dataset samples differs decisively between MNIST and this experiment. (Here the sampling is random, but the variety is limited to the glyphs in the font set.) To improve the accuracy of this pictogram classification further, it would be necessary to go to the trouble of increasing the number of data samples, for example by processing the font images (deformation, noise addition, etc.). (Alternatively, changing the regularization method might be somewhat effective.)
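As an illustration of such augmentation, here is a hypothetical sketch of my own (not from the article) that applies a small random rotation and additive noise to a 36x36 font image matrix:

import numpy as np
from PIL import Image

def augment(img_mat, max_angle=10.0, noise_std=8.0):
    # rotate an inverted copy so the black padding introduced by rotation
    # becomes white background after inverting back
    inv = Image.fromarray(255 - img_mat)
    rotated = inv.rotate(np.random.uniform(-max_angle, max_angle))
    arr = 255.0 - np.asarray(rotated, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)   # additive Gaussian noise
    return np.clip(arr, 0, 255).astype(np.uint8)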
Finally, I would like to thank Mr. Kitamichi for publishing such an interesting subject for machine learning. (I know the font was not created with machine learning in mind, but I had great fun playing with it!)
- Cat pictogram font "Kitamiji 222", by the creator of the four-panel manga "Pu-Neko": http://www.forest.impress.co.jp/docs/review/20160303_746474.html