Hello, this is Licht. In Chapter 4 of this Deep Learning tutorial, continuing from the previous chapter, I will talk about improving recognition accuracy through data augmentation.
Last time (Chapter 3), the best score was a loss of 0.526 at epoch 16. That is not good enough, because the model still occasionally misrecognizes even printed characters.
If we simply keep training as-is, only the training loss will keep decreasing while the test loss keeps increasing; this is "overfitting", and recognition accuracy will not improve. To prevent overfitting and improve accuracy, let's increase the training data.
Ideally we would collect more original training data, but gathering data costs time and money, so instead we augment the data we already have.
Types of data augmentation

- Rotation (by a random angle around a random three-dimensional axis)
- Translation (shifting by a random number of pixels)
- Thinning (erosion), to remove the dependence of recognition on stroke thickness
- Inversion: an inverted image would normally never be given as input, so at first glance this looks like a harmful augmentation, but it is effective from the viewpoint of TTA (augmenting the data at test time as well)
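As a reference, here is a minimal sketch of these operations in Python. This is not the tutorial's actual augmentation script; it assumes a 2-D grayscale NumPy array with dark ink on a light background, and simplifies the rotation to an in-plane rotation.

```python
# Minimal augmentation sketch, NOT the tutorial's actual script.
# Assumes `img` is a 2-D uint8 NumPy array (dark ink on a light background).
import numpy as np
from scipy import ndimage

def augment_once(img, rng=np.random):
    """Return one randomly augmented copy of `img`."""
    out = img.astype(np.float32)

    # Rotation by a small random angle (in-plane only, for simplicity).
    out = ndimage.rotate(out, rng.uniform(-15, 15), reshape=False, mode='nearest')

    # Translation by a few random pixels in y and x.
    dy, dx = rng.randint(-3, 4, size=2)
    out = ndimage.shift(out, (dy, dx), mode='nearest')

    # Thinning: dilating the light background shrinks the dark strokes.
    if rng.rand() < 0.5:
        out = ndimage.grey_dilation(out, size=(2, 2))

    # Inversion, applied only occasionally (mainly useful for TTA).
    if rng.rand() < 0.1:
        out = 255.0 - out

    return np.clip(out, 0, 255).astype(np.uint8)
```

Because every parameter is drawn from a random number, calling `augment_once(img)` repeatedly yields a different image each time.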
Because the rotation angle and (three-dimensional) rotation axis, as well as the number of pixels to shift, are all drawn from random numbers, there are effectively infinitely many combinations; by combining the methods above, an unlimited number of images can be generated from a single original image. The number of augmented images and the resulting scores are as follows.
Number of augmented images | Best test loss |
---|---|
10 | 0.526 |
100 | 0.277 |
300 | 0.260 |
500 | 0.237 |
The gains come surprisingly easily, but it's looking good. You might wonder whether the data just ends up duplicated when augmenting to 500 images, but in the end it works out fine.
Incidentally, elastic distortion looks like an ideal augmentation, but it is actually hard to handle: it is slow to process and, in my experience, it causes overfitting.
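For reference only, here is a rough sketch of what elastic distortion does (in the style often attributed to Simard et al., 2003). As noted above, I do not use it in this chapter.

```python
# Elastic distortion sketch, for reference only (not used in this chapter).
# Assumes `img` is a 2-D NumPy array.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(img, alpha=34.0, sigma=4.0, rng=np.random):
    # Random displacement fields, smoothed with a Gaussian and scaled by alpha.
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    # Warp each pixel to its displaced position with bilinear interpolation.
    y, x = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing='ij')
    return map_coordinates(img, [y + dy, x + dx], order=1, mode='nearest')
```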
Even at 500 augmented images the accuracy is still improving steadily (the loss keeps decreasing), so next I tried expanding to **3,500 images**. (However, memory and processing time on my PC are limited, so this is restricted to just the five characters "A", "I", "U", "E", and "O".)
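As a rough idea of how such an enlarged training set might be assembled, here is a hypothetical sketch (`augment_once` is the sketch from earlier; this is not the tutorial's own script, and it assumes the 3,500 copies are generated per original image).

```python
# Hypothetical sketch of building the enlarged training set.
import numpy as np

def build_training_set(originals, labels, augment_once, per_image=3500):
    """originals: list of 2-D uint8 images; labels: matching class indices."""
    xs, ys = [], []
    for img, label in zip(originals, labels):
        for _ in range(per_image):
            xs.append(augment_once(img).astype(np.float32) / 255.0)  # scale to [0, 1]
            ys.append(label)
    x = np.array(xs)                  # final shape depends on what the model expects
    y = np.array(ys, dtype=np.int32)  # Chainer expects int32 class labels
    return x, y
```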
('epoch', 1)
train mean loss=0.167535232874, accuracy=0.937596153205
test mean loss=0.23016545952, accuracy=0.914285708447
('epoch', 2)
train mean loss=0.0582337708299, accuracy=0.979920332723
test mean loss=0.132406316127, accuracy=0.955102039843
('epoch', 3)
train mean loss=0.042050985039, accuracy=0.985620883214
test mean loss=0.0967423064653, accuracy=0.959183678335
('epoch', 4)
train mean loss=0.0344518882785, accuracy=0.98846154267
test mean loss=0.0579228501539, accuracy=0.983673472794
These are the results: the loss dropped to 0.057 at epoch 4. As mentioned in Chapter 3, I could more or less recognize my handwritten hiragana even with the loss-0.237 model, so this one looks promising. I therefore wrote 50 hiragana characters by hand and tested the accuracy on them.
This time, each test image is augmented into 30 images at test time (TTA), and the recognition result is evaluated over those. (There is no particular reason for choosing 30.)
$ python AIUEONN_predictor.py --model loss0057model --img ../testAIUEO/o0.png
init done
Candidate neuron number:4, Unicode:304a,Hiragana:O
.
.(Omitted)
.
Candidate neuron number:4, Unicode:304a,Hiragana:O
Candidate neuron number:4, Unicode:304a,Hiragana:O
Candidate neuron number:3, Unicode:3048,Hiragana:e
Candidate neuron number:4, Unicode:304a,Hiragana:O
**Final judgment Neuron number:4, Unicode:304a,Hiragana:O**
It's OK.
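The final judgment above is presumably a majority vote over the predictions for the 30 test-time augmented copies. Here is a minimal sketch of that idea; `predict_label` and `augment_once` are hypothetical stand-ins for the model call inside AIUEONN_predictor.py and the augmentation sketch from earlier.

```python
# Hypothetical sketch of test-time augmentation (TTA) with a majority vote.
from collections import Counter

def predict_with_tta(img, predict_label, augment_once, n_copies=30):
    votes = [predict_label(augment_once(img)) for _ in range(n_copies)]
    neuron, count = Counter(votes).most_common(1)[0]
    print('Final judgment Neuron number:%d (%d of %d votes)' % (neuron, count, n_copies))
    return neuron
```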
46 out of 50 correct: with 4 mistakes, the accuracy is 92%! By the way, these were the only 4 images that were missed; my handwriting for "A" is quite messy (sweat).
Some of the images that were recognized correctly:
Since the test set is of my own making, it is hard to make strong claims, but the accuracy is quite good. Given that the training data is mostly printed type and the model still carries over to handwritten characters, I feel the potential of Deep Learning. That's it for Chapter 4. In the next chapter, Chapter 5, I would like to go back and learn the basics of neural networks, referring to Hi-King's blog.
Chapter | Title |
---|---|
Chapter 1 | Building a Deep Learning environment based on chainer |
Chapter 2 | Creating a Deep Learning Predictive Model by Machine Learning |
Chapter 3 | Character recognition using a model |
Chapter 4 | Improvement of recognition accuracy by expanding data |
Chapter 5 | Introduction to neural networks and explanation of source code |
Chapter 6 | Improvement of learning efficiency by selecting Optimizer |
Chapter 7 | TTA,Improvement of learning efficiency by Batch Normalization |