[PYTHON] I tried handwriting recognition of runes with scikit-learn

This article is the day-15 entry of the Fujitsu Systems Web Technology Advent Calendar. (The usual disclaimer:) The content of this article is my personal opinion and does not represent the organization I belong to.

Introduction

In this article, I summarize the procedure and results of **trying handwriting recognition of runes with scikit-learn, Python's machine learning library**. While the previous authors in this Advent Calendar have posted great features and know-how that look genuinely useful in real work, I just made something that is simply fun for me, and I can't help it. I hope you can take a quick look.

Also, since the author of this article is a beginner in machine learning, some of the content may be inaccurate. For those who are about to try Python and scikit-learn, I will do my best to provide useful information. Thank you.

Background

Although I don't touch it at all in my daily work, I am interested in machine learning and wanted to learn the basics of how training and inference work, so I tried scikit-learn. I could have used one of the datasets bundled with the library, but I wanted to know "what can I do with what kind of data I prepare myself?", so I decided to start from preparing the data.

**(Digression)** Runes are kind of cool, aren't they?

What was used

- Anaconda: A distribution that bundles Python itself with commonly used libraries.
- scikit-learn: An open-source library that makes it easy to use machine learning (including neural networks) in Python. This time, I use a model called MLPClassifier, which performs "classification".

- E-cutter: Free software for splitting images. Used to divide the handwritten character images.

Data preparation

This time, I focus on the "common Germanic runes" (24 characters) among the runic alphabets.
(Image: images.png)

There doesn't seem to be any convenient dataset like "rune character data for machine learning", so I prepared my own images by the following method.

(1) Create an image with handwritten characters lined up at regular intervals. (It is convenient later if each image file is named with the single character it contains, e.g. "ᚠ.png".)
(2) Divide the image into equal parts with E-cutter (free software).
(Image: 01_画像取得Ecutter.PNG)
The divided images are saved in the specified folder as "[original file name]_[sequence number].png", which was very convenient. In total, 18 images were created for each type of rune character.
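By the way, the same equal-parts split could also be done in a few lines of Python. Here is a minimal sketch with Pillow, under the assumption that the source image is a single row of evenly spaced characters; the file name "ᚠ.png" and the variable n_tiles are illustrative assumptions, not part of the original procedure.

from PIL import Image

# Minimal sketch: split one row of evenly spaced handwritten characters
# into n_tiles equal tiles, mimicking what E-cutter does.
src = Image.open("ᚠ.png")  # assumed input: one row of 18 handwritten "ᚠ"
n_tiles = 18
tile_w = src.width // n_tiles

for i in range(n_tiles):
    tile = src.crop((i * tile_w, 0, (i + 1) * tile_w, src.height))
    tile.save(f"ᚠ_{i}.png")  # same "[name]_[number].png" convention as E-cutter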

Loading images

From here, everything is done in Python. First, load the images.

import cv2  # library for image conversion
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import os, glob

# Array to store the image data
X = []
# Array to store the character (label) corresponding to each image
Y = []

# Directory containing the training data
dir = "[Image data storage directory]"
files = glob.glob(dir + "\\*.png")

# Vertical and horizontal size of the images (pixels)
image_size = 50

# Read the files in the directory and add them to the training data
for i, file in enumerate(files):
    image = Image.open(file)
    # Convert to 8-bit grayscale
    image = image.convert("L")
    image = image.resize((image_size, image_size))
    data = np.asarray(image).flatten()
    X.append(data)
    # The character is the part of the file name before "_"
    moji = file.split("\\")[-1].split("_")[0]
    Y.append(moji)

X = np.array(X)
Y = np.array(Y)

Let's display the loaded image.

# Visualize the first image
showimage = np.reshape(X[0], (50, 50))  # reshape the flat vector into a 50x50 2-D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.show()
(Image: 02_画像読み込み.PNG)

The image was read as 50 × 50 data.
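As a quick sanity check (my own addition, not in the original procedure), the shapes of the arrays can be confirmed like this:

print(X.shape)  # (number of images, 2500): each 50x50 image flattened to 2500 values
print(Y.shape)  # (number of images,): one character label per image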

Let's learn and classify

The amount of data is quite small, but let's try training and classifying with this data (24 characters × 18 images = 432 samples) for now!

・Split the data for training and testing

from sklearn import model_selection

# Split the data into training and test sets
x_train, x_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=0.1, random_state=0)
・Train and predict

from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Training
clf = MLPClassifier(hidden_layer_sizes=(200,))
clf.fit(x_train, y_train)

# Classify the test data
y_pred = clf.predict(x_test)

# View the results
print("---Expected answers---")
print(y_test)

print("---Answers given by the model---")
print(y_pred)

print("---Accuracy---")
print(accuracy_score(y_test, y_pred))

・Result

(Image: 12120023_最初の結果_全然ダメ.PNG)

**The accuracy is very low...!! (7.1%) orz**

Most of the test images are classified as "ᚱ", and the other predictions are also wrong... However, the images classified as something other than "ᚱ" do seem to be assigned to runes with similar shapes. I'm a little happy to get a glimpse of a sprout of intelligence (even though it got them wrong in the end).
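To see which characters get confused with which, a confusion matrix is handy. Here is a minimal sketch using scikit-learn's confusion_matrix; this step is my own addition, not part of the original procedure.

from sklearn.metrics import confusion_matrix

# Rows = true characters, columns = predicted characters.
# labels fixes the row/column order to the set of runes seen in the data.
labels = sorted(set(Y))
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(labels)
print(cm)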

The training data still seems to be insufficient, but adding more handwritten character data is a hassle. Let's try data augmentation instead.

Data Augmentation

With only the prepared [number of character types] × 18 images, there does not seem to be enough training data, so I transform the existing data to increase its amount.

The following article was very helpful for data augmentation: https://products.sint.co.jp/aisia/blog/vol1-7#toc-3

There seem to be methods such as adding noise, flipping, shifting, and distorting. This time, I apply "deformation" and "rotation".
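Of the methods not tried below, "adding noise" is particularly simple. A minimal sketch (my own illustration, assuming a 50x50 grayscale uint8 array like the image_array used in the deformation step below, with an arbitrary noise strength) could look like:

# Minimal sketch: add Gaussian noise to a grayscale image array.
# image_array is assumed to be a 50x50 uint8 NumPy array; 15 is an arbitrary strength.
noise = np.random.normal(0, 15, image_array.shape)
noisy = np.clip(image_array + noise, 0, 255).astype(np.uint8)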

Deformation

# X and Y were converted to NumPy arrays above; turn them back into lists so we can append
X = list(X)
Y = list(Y)

# Deform the images
for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]

    # Invert the bits of the data and turn it into an array
    image_array = cv2.bitwise_not(np.array(image))

    ## Deformation ①
    # Create the transformation map of the image
    pts1 = np.float32([[0,0],[0,100],[100,100],[100,0]])
    pts2 = np.float32([[0,0],[0, 98],[102,102],[100,0]])
    # Transform the image
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst1 = cv2.warpPerspective(image_array, M, (50, 50))

    X.append(dst1.flatten())
    Y.append(moji)

    ## Deformation ②
    # Create the transformation map of the image
    pts2 = np.float32([[0,0],[0, 102],[98, 98],[100,0]])
    # Transform the image
    M = cv2.getPerspectiveTransform(pts1, pts2)
    dst2 = cv2.warpPerspective(image_array, M, (50, 50))
    X.append(dst2.flatten())
    Y.append(moji)

# Display the last processed image
showimage = np.reshape(image_array, (50,50))  # reshape into a 50x50 2-D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(dst2), plt.title('Output')
plt.show()
Display of the deformed image:
(Image: 03_画像変形.PNG)

Compared with the original image, I was able to generate slightly deformed images. Two deformed images are generated for each original image and added to the training data.

Rotation

To increase the data further, create images with the original rotated by 15 degrees and add them as well.

# Rotate the images
for i, file in enumerate(files):
    image = Image.open(file)
    image = image.resize((image_size, image_size))
    image = image.convert("L")
    moji = file.split("\\")[-1].split("_")[0]

    # Invert the bits of the data and turn it into an array
    image = cv2.bitwise_not(np.array(image))

    # 1. Rotate 15 degrees clockwise
    # Specify the rotation angle
    angle = -15.0
    # Specify the scale
    scale = 1.0
    # Use the getRotationMatrix2D function (arguments: center position, rotation angle, scale)
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    # Affine transformation
    image1 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image1.flatten())
    Y.append(moji)

    # 2. Rotate 15 degrees counterclockwise
    # Specify the rotation angle
    angle = 15.0
    # Specify the scale
    scale = 1.0
    # Use the getRotationMatrix2D function (arguments: center position, rotation angle, scale)
    trans = cv2.getRotationMatrix2D((24, 24), angle, scale)
    # Affine transformation
    image2 = cv2.warpAffine(image, trans, (50, 50))
    X.append(image2.flatten())
    Y.append(moji)

# Display the last processed image
showimage = np.reshape(image, (50,50))  # reshape into a 50x50 2-D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image1), plt.title('Output')
plt.show()

showimage = np.reshape(image, (50,50))  # reshape into a 50x50 2-D array
plt.subplot(121), plt.imshow(showimage), plt.title('Input')
plt.subplot(122), plt.imshow(image2), plt.title('Output')
plt.show()
Display of the rotated images:
(Image: 03_画像回転.PNG)

Each original image was rotated 15 degrees to the left and to the right to generate one image each. These images are also added to the training data.

With this, the amount of training data is five times the original (original, deformation ①, deformation ②, clockwise rotation, counterclockwise rotation).

Learning and recognition (retry)

Let's split the data for training and testing again, and try to recognize (classify) the images once more!

・Result

(Image: 12152147_次の結果_うまくいった!.PNG)

**The accuracy has improved...! (86.9%)**

Looking back

Which data was actually effective?

The result so far was just "I increased the data and the accuracy improved! Yay!", but I was still wondering how much each kind of deformed image actually contributed. So, as a rough check, I changed the breakdown of the data used for training and compared the accuracy.
(Image: 雑まとめ.PNG)
I was able to confirm that the more variation I gave to the images, the higher the accuracy became.
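For reference, this rough comparison could be scripted like the following sketch. It is my own illustration, not the original code, and it assumes that the samples for each variation were kept in separate hypothetical lists (X_orig, X_deform, X_rotate and matching Y_* label lists).

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Minimal sketch: train on different breakdowns of the data and compare accuracy.
# X_orig / X_deform / X_rotate (and the Y_* lists) are assumed to hold
# the original, deformed, and rotated samples separately.
subsets = {
    "original only": (X_orig, Y_orig),
    "original + deformation": (X_orig + X_deform, Y_orig + Y_deform),
    "original + deformation + rotation": (X_orig + X_deform + X_rotate,
                                          Y_orig + Y_deform + Y_rotate),
}
for name, (Xs, Ys) in subsets.items():
    x_tr, x_te, y_tr, y_te = train_test_split(Xs, Ys, test_size=0.1, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(200,))
    clf.fit(x_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(x_te)))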

What I want to do in the future

I would like to keep verifying whether other methods ("trimming", "noise", ...) improve the accuracy further. I would also like to try varying the number of nodes in the hidden layer of the neural network, which was fixed at 200 this time.
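As a starting point for that experiment, a simple sweep over hidden_layer_sizes might look like the following minimal sketch; the candidate sizes are arbitrary assumptions on my part.

from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Minimal sketch: compare accuracy for several hidden layer sizes.
# The candidate sizes below are arbitrary choices for illustration.
for n_nodes in (50, 100, 200, 400):
    clf = MLPClassifier(hidden_layer_sizes=(n_nodes,))
    clf.fit(x_train, y_train)
    print(n_nodes, accuracy_score(y_test, clf.predict(x_test)))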

Thank you for reading!
