[PYTHON] I tried handwriting recognition of runes with CNN using Keras

Introduction

This article is the Day 33 (!!) (overtime) entry in the Fujitsu Systems Web Technology Limited Inobeko Summer Vacation Advent Calendar 2020. The content of this article is my own opinion and does not represent the organization to which I belong.

What I did

In the article I posted for our previous Advent Calendar, I used scikit-learn to implement handwriting recognition for "runes". I got it working for the time being, but when I went on to study deep learning afterwards, I learned that "**for image recognition, there is a better approach than a basic neural network: the convolutional neural network (CNN)!**" Now that I know that, I will share what I learned about how to implement one and how it works.

Why are they runes in the first place?

Runes are cool, aren't they?

images.png

Last time's work and its problems

What I did last time

Using the Python machine learning library **scikit-learn**, I classified handwritten runes with a classification model called MLPClassifier. For the data, I prepared images of handwritten characters myself, increased them with data augmentation, and used them for training.

Results and challenges

As a result, I was able to create a model that recognizes handwritten characters with an accuracy of about 80%. However, as shown below, what caught my attention was that the image data was flattened into a one-dimensional array for training.

--The code that reads the data


#Read the file in the directory and add it to the list of training data
for i, file in enumerate(files):
    image = Image.open(file)
    image = image.convert("L")
    image = image.resize((image_size, image_size))
    data = np.asarray(image).flatten() ##★ Here, the pixel information is arranged in a one-dimensional array.
    X.append(data)

#View the data (print the flattened pixels of the last image that was read)
np.set_printoptions(threshold=np.inf)
print(np.array(image).flatten())

--The handwritten character image that was read

image.png

Data output result


As shown above, the pixel information from the upper left to the lower right of the image is laid out flat in a one-dimensional array. In that form, the vertical and horizontal structure of the image is lost ...
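To make the problem a little more concrete, here is a minimal sketch in plain Python. The tiny 3x4 "image" and the helper `flat_index` are made up for illustration and are not part of the original code. It shows that two pixels that sit directly above and below each other end up a whole row width apart once the image is flattened.

```python
# A tiny 3x4 "image" (rows x columns); the values are arbitrary pixel intensities.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
height, width = len(image), len(image[0])

# Flatten row by row, from the upper left to the lower right.
flat = [pixel for row in image for pixel in row]


def flat_index(row, col, width):
    """Position of pixel (row, col) in the flattened list (hypothetical helper)."""
    return row * width + col


# Pixel (1, 1) and the pixel directly below it, (2, 1), are neighbours in 2D...
above = flat_index(1, 1, width)
below = flat_index(2, 1, width)
# ...but in the flat list they are `width` positions apart.
distance = below - above

print(flat)
print(above, below, distance)
```

A model that only sees `flat` has no built-in notion that positions `above` and `below` are adjacent; it would have to discover that from the data.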

Use of Convolutional Neural Network (CNN)

On the other hand, I learned that with a convolutional neural network (CNN) you can train on data that keeps its vertical and horizontal information! Below, I describe the outline and the implementation method as I have organized them.

Overview

A convolutional neural network (CNN) is a neural network used in the field of image processing. **The image can be used as input as it is, in two dimensions.**

CNNs are said to have grown out of the idea "Let's imitate the behavior of nerve cells in the human visual cortex."

A CNN uses a filter (kernel) to extract features from an image. The filter is smaller than the original image. It is overlaid on the image one position at a time, starting from the upper left, and at each position the image values and the filter values are multiplied element by element and summed.

The features extracted from the image change depending on the values in the filter, so training is what determines which values the filter should have.
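The filter operation described above can be sketched in plain Python as follows. This is only an illustration under my own assumptions: no padding, a stride of 1, and two hand-written filters (`vertical_edge` and `horizontal_edge`) instead of learned ones. The function name `convolve2d` is also just a name I chose here.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return,
    for every position, the sum of the element-wise products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for top in range(ih - kh + 1):
        row = []
        for left in range(iw - kw + 1):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# 5x5 image containing a single vertical stroke in the middle column.
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hand-written filters; in a real CNN these values are learned.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
horizontal_edge = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]

vertical_response = convolve2d(image, vertical_edge)
horizontal_response = convolve2d(image, horizontal_edge)
print(vertical_response)
print(horizontal_response)
```

With the same image, the first filter responds strongly on either side of the stroke while the second produces only zeros, which is the sense in which the filter values decide what feature gets picked up.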

[Reference] You can see the CNN character recognition process visually at the following site! https://www.cs.ryerson.ca/~aharley/vis/conv/

Implementation method

There seem to be the following two ways to implement it!

1. Hand-assembled with NumPy

This approach implements the CNN filter arithmetic by hand. The following article does it this way. https://qiita.com/ta-ka/items/1c588dd0559d1aad9921

2. Use Keras library

You can also implement CNNs using a library dedicated to deep learning called Keras.

https://keras.io/ja/

Keras is a high-level neural network library written in Python that can run on top of TensorFlow, CNTK, or Theano. Use Keras if you need a deep learning library that:

--Supports both CNNs and RNNs, as well as combinations of the two

This time, I will try option 2, the Keras library.

Using Keras

When using Keras for this purpose, there seem to be the following options.

--Call and use standalone Keras directly
--Use the Keras included with TensorFlow

However, as of May 2020, the official manual says:

"Keras comes in with TensorFlow 2.0 as tensorflow.keras. To get started with Keras, simply install TensorFlow 2.0."

So the trend seems to be to unify on the Keras bundled with TensorFlow. (Quoted from the following article)

[Reference] The end of multi-backend Keras, unified into tf.keras https://www.atmarkit.co.jp/ait/articles/2005/13/news017.html

Going with the times, I will use the Keras bundled with TensorFlow this time.

What was used

So, I used the following. For TensorFlow, I used version 2.0.0, in which the integration with Keras was strengthened.

--Anaconda (a distribution that includes Python itself and commonly used libraries)
  --TensorFlow: 2.0.0

Data used

Let's train on the image data I created last time (24 characters x 18 images each).

Implementation

Import packages to use


#Package for handling arrays
import numpy as np
#Package for handling image data and files
from PIL import Image
import os, glob
# tensorflow
import tensorflow as tf
#A convenient Keras package that preprocesses data
from tensorflow.keras.preprocessing.image import array_to_img, img_to_array, load_img
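#Use the utils bundled with tf.keras (provides to_categorical) instead of standalone Keras
from tensorflow.keras import utils as np_utils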
#Used to separate training data and test data
from sklearn.model_selection import train_test_split
#Used for image display of training data
import matplotlib.pyplot as plt
#Used to display a summary of learning results
import pandas as pd

Read file

Prepare a set of images and labels to use for training. It seems that the label of each data item has to be passed to the Keras CNN model as a number, so I created a correspondence table between each rune character and its label (a number) as a dictionary in advance.

Correspondence table of runes and labels (numerical values)
runeCharDict = { 0 : 'ᚠ',
               1 : 'ᚢ',
               2 : 'ᚦ',
               3 : 'ᚫ',
               4 : 'ᚱ',
               5 : 'ᚲ',
               6 : 'ᚷ',
               7 : 'ᚹ',
               8 : 'ᚺ',
               9 : 'ᚾ',
               10 : 'ᛁ',
               11 : 'ᛃ',
               12 : 'ᛇ',
               13 : 'ᛈ',
               14 : 'ᛉ',
               15 : 'ᛋ',
               16 : 'ᛏ',
               17 : 'ᛒ',
               18 : 'ᛖ',
               19 : 'ᛗ',
               20 : 'ᛚ',
               21 : 'ᛜ',
               22 : 'ᛞ',
               23 : 'ᛟ',
              }
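Since the table maps label to rune, turning a file name into a label needs the opposite direction. As a small sketch of one way to do that, the example below inverts the table once and then looks the rune up directly. The shortened table `rune_chars`, the helper `label_from_filename`, and the example file names are hypothetical; the only thing taken from the article's code is the assumption that the rune is the part of the file name before the first underscore.

```python
import os

# Shortened stand-in for the full correspondence table (hypothetical subset).
rune_chars = {0: 'ᚠ', 1: 'ᚢ', 2: 'ᚦ'}

# Invert the table once: rune character -> numeric label.
label_of = {char: label for label, char in rune_chars.items()}


def label_from_filename(path):
    """Take the rune before the first underscore of the file name
    and return its numeric label."""
    name = os.path.basename(path)
    rune = name.split("_")[0]
    return label_of[rune]


print(label_from_filename("ᚢ_001.png"))
```

Building the inverted dictionary up front avoids searching the list of values for every file, although with 24 characters either way is fast enough.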

Load the images.

#File reading
#Array to store the image data
X = []
#Array to store the label (answer) corresponding to each image
Y = []

#Directory containing the training data
data_dir = '[Directory where image data of handwritten characters is stored]'
files = glob.glob(os.path.join(data_dir, "*.png"))

#Vertical and horizontal size of the image (pixels)
image_size = 50

#Read the file in the directory and add it to the list of training data
for i, file in enumerate(files):
    
    temp_img = load_img(file, target_size=(image_size, image_size))
    temp_img_array  = img_to_array(temp_img)
    X.append(temp_img_array)
    
    #The rune character is the part of the file name before the first underscore
    moji = os.path.basename(file).split("_")[0]
    label = list(runeCharDict.keys())[list(runeCharDict.values()).index(moji)]
    Y.append(label)

X = np.asarray(X)
Y = np.asarray(Y)

#Scale pixel values to the range 0 to 1
X = X.astype('float32')
X = X / 255.0

#Convert the labels to one-hot (categorical) format
Y = np_utils.to_categorical(Y, 24)
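To show what the "class format" conversion produces, here is a plain-Python sketch of one-hot encoding. It only illustrates the idea behind to_categorical and is not Keras' actual implementation; the function name `one_hot` and the sample labels are my own.

```python
def one_hot(labels, num_classes):
    """Turn each integer label into a list with 1.0 at the label's index
    and 0.0 everywhere else."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Three samples with labels 0, 2 and 1, out of 4 possible classes.
encoded = one_hot([0, 2, 1], num_classes=4)
for row in encoded:
    print(row)
```

For the rune data there are 24 classes, so each label becomes a row of 24 values, which matches the 24 outputs of the model's final softmax layer.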

Create a model

Create the model to train. Here you set things such as the convolution layer settings (input data shape, filter settings) and the activation functions to use.

Each element is explained in great detail in the following article. Please refer to it!

https://qiita.com/mako0715/items/b6605a77467ac439955b
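As a rough picture of two of the elements that appear in the model, the ReLU activation and 2x2 max pooling, here is a plain-Python sketch. It is my own simplified illustration on a single small feature map, not how Keras implements these layers, and the function names are made up.

```python
def relu(feature_map):
    """Replace every negative value with 0 and keep the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value of each 2x2 block (stride 2).
    An odd trailing row or column is dropped."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, h - 1, 2):
        row = []
        for left in range(0, w - 1, 2):
            row.append(max(
                feature_map[top][left], feature_map[top][left + 1],
                feature_map[top + 1][left], feature_map[top + 1][left + 1],
            ))
        pooled.append(row)
    return pooled


feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 1, 1],
    [0, 0, -2, -2],
    [3, -1, -4, 6],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)
print(activated)
print(pooled)
```

Pooling halves the height and width, so the later layers see a smaller map that keeps only the strongest response in each neighbourhood.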

#Create a model for CNN
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(50, 50, 3)),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(24, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
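To get a feel for the model above, the following sketch traces the layer sizes and parameter counts by hand. It assumes the Keras defaults that the code does not override (no padding and a stride of 1 for Conv2D, and a pooling stride equal to the pool size); the helper names are hypothetical. If the assumptions hold, the numbers should agree with what model.summary() reports.

```python
def conv_output(size, kernel):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_output(size, pool):
    """Output width/height of pooling whose stride equals the pool size."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


size = 50                      # input is 50 x 50 x 3
size = conv_output(size, 3)    # Conv2D(64, (3,3))
size = pool_output(size, 2)    # MaxPooling2D(2,2)
size = conv_output(size, 3)    # Conv2D(32, (3,3))
size = pool_output(size, 2)    # MaxPooling2D(2,2)
flattened = size * size * 32   # Flatten()

params = [
    conv_params(3, 3, 64),         # first Conv2D
    conv_params(3, 64, 32),        # second Conv2D
    dense_params(flattened, 128),  # Dense(128)
    dense_params(128, 24),         # Dense(24)
]
total = sum(params)

print(size, flattened)
print(params, total)
```

Most of the parameters sit in the first Dense layer, because every one of the flattened values is connected to all 128 units.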

Learning

The training itself can be done simply by calling the fit() function.

#Split into training data and test data
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=111)
#Train
model.fit(x_train, y_train, epochs=5)

When executed, a summary of each epoch is displayed.

Epoch 1/5
246/246 [==============================] - 2s 9ms/sample - loss: 3.1595 - acc: 0.0935
Epoch 2/5
246/246 [==============================] - 2s 9ms/sample - loss: 2.8289 - acc: 0.2317
Epoch 3/5
246/246 [==============================] - 2s 8ms/sample - loss: 2.0306 - acc: 0.4593
Epoch 4/5
246/246 [==============================] - 2s 8ms/sample - loss: 1.0820 - acc: 0.7642
Epoch 5/5
246/246 [==============================] - 2s 9ms/sample - loss: 0.6330 - acc: 0.8333

By checking the items on the right, you can see how accurate the model is during training.

--loss: The value of the loss function (the lower the value, the better the predictions).
--acc: The accuracy of the predictions.

In the first epoch the accuracy was about 9%, but by the fifth epoch it reached 83%!
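As a rough sanity check on these numbers, here is a plain-Python sketch of softmax and categorical cross-entropy for a single sample. It is my own illustration, not the Keras implementation. A model that knows nothing and gives all 24 classes the same probability would have a loss of ln(24), roughly 3.18, and an accuracy of 1/24, roughly 4%, and the values reported for the first epoch are in that neighbourhood.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the correct class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, probabilities)
        if t > 0
    )


num_classes = 24

# A model with no preference: identical scores for every class.
uniform = softmax([0.0] * num_classes)

# One-hot target for an arbitrary class (17 is used only as an example).
target = [0.0] * num_classes
target[17] = 1.0

chance_loss = categorical_crossentropy(target, uniform)
chance_accuracy = 1 / num_classes

print(round(chance_loss, 4), round(chance_accuracy, 4))
```

As training proceeds, the probability assigned to the correct class rises toward 1, so its negative log, the loss, falls toward 0.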

Model test / result display

Have the model predict on the test data.

#Apply to the test data (take the class with the highest predicted probability)
predict_classes = np.argmax(model.predict(x_test), axis=1)

mg_df = pd.DataFrame({'predict': predict_classes, 'class': np.argmax(y_test, axis=1)})

#Output of the current maximum number of displayed columns
pd.get_option("display.max_columns")

#Specify the maximum number of displayed columns (50 columns are specified here)
pd.set_option('display.max_columns', 50)

# confusion matrix
pd.crosstab(mg_df['class'], mg_df['predict'])

Create a confusion matrix by tallying the correct and incorrect predictions on the test data.

A confusion matrix is a cross table of "actual values \ model predictions". The cells where a row and a column with the same number intersect hold the counts of correct predictions.
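The tallying behind such a table can be sketched with the standard library alone. This is my own illustration of the idea, not what pd.crosstab does internally, and the labels in `actual` and `predicted` are made-up sample values.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_accuracy(actual, predicted):
    """Share of samples of each actual class that were predicted correctly."""
    counts = confusion_counts(actual, predicted)
    totals = Counter(actual)
    return {
        label: counts[(label, label)] / totals[label]
        for label in sorted(totals)
    }


# Made-up labels: class 17 is often mistaken for other classes.
actual = [0, 0, 1, 1, 17, 17, 17]
predicted = [0, 0, 1, 0, 17, 2, 1]

counts = confusion_counts(actual, predicted)
accuracy = per_class_accuracy(actual, predicted)
print(counts[(17, 17)], counts[(17, 2)])
print(accuracy)
```

The diagonal entries, where the actual and predicted labels match, are the correct answers, and a class with a low per-class accuracy is a candidate for more or better training data.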

Result

image.png

The majority are correct! Beyond that, you can see that there are many wrong answers for the character "ᛒ", which is number 17 in the result table. If you really want to raise the accuracy, you may want to review or increase the data for "ᛒ".

Data augmentation

I feel that there is too little training data, so I will process the handwritten character images to increase it. This time, I was able to rotate the character data easily with a preprocessing package called keras_preprocessing, so I add the rotated images to the data as well.

#keras_preprocessing, which Keras uses behind the scenes
from keras_preprocessing.image import apply_affine_transform

#File reading
#Array to store the image data
X = []
#Array to store the label (answer) corresponding to each image
Y = []

#File reading (same as above)

#Read the file in the directory and add it to the list of training data
for i, file in enumerate(files):
    
    #Register the original data (same as above)
    
    #Augment the data
    image  = img_to_array(temp_img)
        
    #1. Rotate 10 degrees clockwise ("theta" specifies the rotation angle in degrees)
    image1 = apply_affine_transform(image, channel_axis=2, theta=10, fill_mode="nearest", cval=0.)
    X.append(image1)
    Y.append(label)

    #2. Rotate 10 degrees counterclockwise
    image2 = apply_affine_transform(image, channel_axis=2, theta=-10, fill_mode="nearest", cval=0.)
    X.append(image2)
    Y.append(label)
    
    #3. Rotate 20 degrees clockwise
    image3 = apply_affine_transform(image, channel_axis=2, theta=20, fill_mode="nearest", cval=0.)
    X.append(image3)
    Y.append(label)
    
    #4. Rotate 20 degrees counterclockwise
    image4 = apply_affine_transform(image, channel_axis=2, theta=-20, fill_mode="nearest", cval=0.)
    X.append(image4)
    Y.append(label)

It was so easy!! In particular, the blank areas produced by the rotation are filled in automatically, so the background does not turn black. Isn't that really convenient ...?
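To illustrate what "filling in the blank areas" means, here is a hand-rolled sketch of rotation on a small grayscale grid. It is not what keras_preprocessing does internally: I am assuming simple nearest-neighbour sampling and clamping out-of-range coordinates to the nearest edge pixel, which is only meant to mimic the idea of fill_mode="nearest". The function name `rotate_nearest` and the sample image are made up.

```python
import math


def rotate_nearest(image, degrees):
    """Rotate a 2D grid about its centre. For every output pixel, look up
    the source pixel it came from; coordinates that fall outside the image
    are clamped to the nearest edge pixel instead of being filled with 0."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping from output coordinates to source coordinates.
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            # Clamp to the image so the corners reuse the nearest edge value.
            sxi = min(max(int(round(sx)), 0), w - 1)
            syi = min(max(int(round(sy)), 0), h - 1)
            row.append(image[syi][sxi])
        out.append(row)
    return out


# White (255) background with a dark (0) vertical stroke, as a stand-in image.
image = [[255, 255, 0, 255, 255] for _ in range(5)]

# Original plus four rotated copies, like the angles used in the article.
augmented = [image] + [rotate_nearest(image, a) for a in (10, -10, 20, -20)]
print(len(augmented))
for row in augmented[3]:
    print(row)
```

Because the corners take their value from the nearest edge pixel, a white background stays white after rotation, and each original image yields five training samples in total.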

Learning results

Let's train again, adding the data augmented by rotating the original images.

Epoch 1/5
1232/1232 [==============================] - 7s 6ms/sample - loss: 23.2898 - accuracy: 0.1144
Epoch 2/5
1232/1232 [==============================] - 7s 6ms/sample - loss: 1.1991 - accuracy: 0.6396
Epoch 3/5
1232/1232 [==============================] - 7s 5ms/sample - loss: 0.3489 - accuracy: 0.8847
Epoch 4/5
1232/1232 [==============================] - 7s 5ms/sample - loss: 0.1527 - accuracy: 0.9456
Epoch 5/5
1232/1232 [==============================] - 6s 5ms/sample - loss: 0.0839 - accuracy: 0.9740

The model is now more accurate than when there was less data! (97%)

In closing

So far, I have described the flow of using a convolutional neural network with Keras in Python.

What I thought

--I could not compare on exactly the same data, but I found that using a CNN gave higher accuracy than the basic neural network from last time.
--Overall, by using the TensorFlow and Keras libraries, there were many places where the code for preprocessing and for displaying training / prediction results could be written more cleanly than last time!
--My understanding of the implementation is still fuzzy this time, so I would like to investigate it again and understand it properly.

I hope this article is helpful for you.

Finally, I'm sorry for being completely late! I'm glad I was able to take part in the summer Advent Calendar. Thank you.
