[Python] 99.78% accuracy on handwritten hiragana recognition with deep learning

Using a convolutional neural network (deep learning) built strictly along textbook lines, I obtained 99.78% accuracy on handwritten hiragana recognition. Precisely because the approach is by the book (nothing original), I thought it would be useful to readers, so I am writing it up on Qiita.

Source code

The source code is at https://github.com/yukoba/CnnJapaneseCharacter.

History

I was chatting with a friend on Facebook about handwritten hiragana recognition, and when I googled the topic, I found these two articles:

-"Since I touched Tensorflow for 2 months, I explained the convolutional neural network in an easy-to-understand manner with 95.04% of" handwritten hiragana "identification" http://qiita.com/tawago/items/931bea2ff6d56e32d693 --Stanford University Student Report "Recognizing Handwritten Japanese Characters Using Deep Convolutional Neural Networks" http://cs231n.stanford.edu/reports2016/262_Report.pdf

Both were written in March 2016, but the Stanford student's report came first.

The Stanford student's report gives the following accuracies:

- Hiragana: 96.50%
- Katakana: 98.19%
- Kanji: 99.64%

As another friend's analysis suggested, kanji is presumably easier because the characters offer more distinguishing clues. In this article, I describe how I raised the accuracy on hiragana, the least accurate of the three, to 99.78%.

Dataset

For data, everyone uses ETL8G from AIST's ETL Character Database, and I do too. If you would like to see samples of the handwritten characters, visit http://etlcdb.db.aist.go.jp/?page_id=2461.

The data consists of 128x127 px, 4-bit grayscale images written by 160 people, covering 72 types of hiragana.
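To make the preprocessing concrete, here is a minimal sketch (not the repo's actual code) that assumes the ETL8G records have already been unpacked into a NumPy array with integer labels; the scaling to [0, 1] and the reduction to 32x32 match what is described later in this article.

```python
# Minimal preprocessing sketch, assuming the ETL8G records have already
# been unpacked into `images` (uint8, shape (N, 127, 128), values 0..15)
# with integer `labels` in 0..71. The shapes and names are hypothetical.
import numpy as np
from skimage.transform import resize

def preprocess(images, labels, size=32, num_classes=72):
    # Scale 4-bit grayscale (0..15) to [0, 1] and shrink to 32x32,
    # which this article found sufficient (see "Other details" below).
    x = np.zeros((len(images), size, size, 1), dtype=np.float32)
    for i, img in enumerate(images):
        x[i, :, :, 0] = resize(img / 15.0, (size, size))
    # One-hot encode the 72 hiragana classes.
    y = np.eye(num_classes, dtype=np.float32)[labels]
    return x, y
```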

Convolutional neural network

For the basics of what a convolutional neural network (deep learning) is, see tawago's article at http://qiita.com/tawago/items/931bea2ff6d56e32d693. O'Reilly Japan's book "Deep Learning from Scratch: The Theory and Implementation of Deep Learning Learned with Python" is also a good introduction (sorry, I have only skimmed it).

Library

As neural network libraries, this time I used:

- High-level: Keras https://keras.io/ja/
- Low-level: Theano http://deeplearning.net/software/theano/

The code is also written to work with TensorFlow as the low-level backend.

The programming language is Python 3.

Improvement points

So, what changed from tawago's 95.04% and the Stanford student's 96.50%? I am only doing the basics.

Number of iterations

First of all, the Stanford student seems to have trained on a CPU, so the amount of computation was insufficient. I used a GPU on Amazon EC2 and increased the number of epochs (repetitions) from 40 to 400.

In machine learning, the data is split into training data and evaluation data. Training is done on the training data, and because stochastic gradient descent uses random numbers, the accuracy jitters up and down as training proceeds. Separately from that, when the learned model is applied to the evaluation data, accuracy typically improves up to a certain point and then starts to get worse. This is called overfitting.

Ideally, training should run just until overfitting starts (early stopping). This time, overfitting seemed to begin at around 300 to 400 epochs (I have not verified this rigorously), so I set the epoch count to 400.

The split between training data and evaluation data is 8:2, following the Stanford student's report.
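As a hedged illustration (not the repo's code), the 8:2 split might look like this with scikit-learn; `x` and `y` are the arrays from the preprocessing sketch above, and the actual training call appears later together with the data augmentation.

```python
# 8:2 split into training and evaluation data, following the Stanford
# report. The random_state value is an arbitrary illustrative choice.
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)
```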

Model

The model is what is commonly called "VGG style", announced by researchers at Oxford University in September 2014 under the title "Very Deep Convolutional Networks for Large-Scale Image Recognition". A Keras sample based on it is https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py, which I modified. The Stanford student's model is also VGG style. tawago's is unknown; perhaps it is VGG style too.

VGG style is an ordinary convolutional neural network that repeats "convolution -> convolution -> max pooling".

Briefly, the two operations work as follows (a toy example follows the list; for details, see "Deep Learning from Scratch"):

- Convolution: take the neighborhood of each point (e.g. 3x3), flatten it into a one-dimensional vector, and take its inner product with the parameters.
- Max pooling: take the neighborhood of each point (e.g. 2x2) and output the maximum value in it.
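As a toy numeric illustration of the two operations (my own example, not from the article's code):

```python
# Toy example of convolution and max pooling on a 4x4 image.
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)

# Convolution at one point: flatten a 3x3 neighborhood and take its
# inner product with a 3x3 filter (an arbitrary Laplacian-like filter).
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=float)
conv_value = np.dot(img[0:3, 0:3].ravel(), kernel.ravel())

# Max pooling: the maximum over each non-overlapping 2x2 neighborhood.
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(conv_value)  # 0.0
print(pooled)      # [[ 5.  7.] [13. 15.]]
```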

ETL8G contains data from only 160 people, so it is not a large dataset. In general, when data is scarce, a complex model with many parameters does not work well, so I used the simple Keras sample mostly as is: "convolution -> convolution -> max pooling" repeated twice.
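For concreteness, here is a hedged reconstruction of that model in recent Keras. It follows the structure of the cifar10_cnn.py sample, but the exact filter counts and the dense-layer size are illustrative assumptions, not copied from the repo; the p = 0.5 dropout and the AdaDelta optimizer follow the later sections of this article.

```python
# VGG-style sketch: "convolution -> convolution -> max pooling" twice,
# then a small classifier head for the 72 hiragana classes.
# Filter counts and the 256-unit dense layer are illustrative guesses.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    # Block 1: convolution -> convolution -> max pooling
    Conv2D(32, (3, 3), padding='same', activation='relu',
           input_shape=(32, 32, 1)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.5),  # p = 0.5 everywhere, as described below
    # Block 2: convolution -> convolution -> max pooling
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.5),
    # Classifier head
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(72, activation='softmax'),
])
model.compile(optimizer='adadelta',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```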

Noise functions

One way to improve generalization is to add noise only during training. I used the following two techniques this time:

- Dropout (multiply by 1/p with probability p, erase to 0 with probability 1-p)
- Random rotation (±15 degrees) and zoom (0.8 to 1.2x) of the training images

Dropout is also used in the Stanford student's report. I set p = 0.5 throughout, that is, a 50% chance of doubling a value and a 50% chance of zeroing it.

Rotation and zooming are used in the Keras sample code, and I used them as well: a character image remains the same character even when slightly rotated or zoomed, and training on such variants lets the network learn that invariance. This is also very effective. The Stanford student did not use it.
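In Keras, the rotation and zoom ranges above map directly onto ImageDataGenerator. This is a sketch assuming a recent Keras (where model.fit accepts a generator; older versions used fit_generator), and the batch size is an arbitrary illustrative choice.

```python
# Noise injection at training time only: random rotation (+/- 15 degrees)
# and zoom (0.8x to 1.2x) of the training images.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=15,  # +/- 15 degrees
                             zoom_range=0.2)     # 0.8x to 1.2x

model.fit(datagen.flow(x_train, y_train, batch_size=16),
          epochs=400,  # raised from 40 to 400, run on an EC2 GPU
          validation_data=(x_test, y_test))
```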

Other details

- The images were reduced to 32x32. This is sufficient; larger images only increase the amount of computation. The Stanford student used 64x64. Also, given the balance with the number of convolution layers, more pixels does not necessarily lead to better results.
- Keras's default initial weights did not work here (learning did not progress), so I initialized weights from a normal distribution with a standard deviation of 0.1 (see the sketch after this list).
- For stochastic gradient descent, an aggressive optimizer like Adam did not work well, so I first used the gentler AdaGrad. AdaDelta, a variant of AdaGrad that requires no learning rate, turned out to be even better, so that is what I used.
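The initialization tweak, sketched for a single layer (in the modern Keras spelling; the article's Keras 1.x code would differ):

```python
# Replace Keras's default initializer with a zero-mean normal
# distribution, standard deviation 0.1, as described above.
from keras.initializers import RandomNormal
from keras.layers import Conv2D

conv = Conv2D(32, (3, 3), activation='relu',
              kernel_initializer=RandomNormal(mean=0.0, stddev=0.1))
```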

Conclusion

So, by doing everything by the book, I reached 99.78%. For handwritten digits, 99.77% has been reported on the MNIST dataset (http://yann.lecun.com/exdb/mnist/), so this is about the same level. I have not tried kanji or the other character sets, but the Stanford student reports 99.64% for kanji, so this result is slightly better than that.

With deep learning, handwritten characters can be recognized almost perfectly!
