Hello, this is Licht. In Chapter 4 of this Deep Learning tutorial, continuing from the previous chapter, I will talk about improving recognition accuracy through data augmentation.
Last time (Chapter 3), the best score was a loss of 0.526 at epoch 16. That is not good enough, because the model still occasionally misrecognizes even printed characters.
If we simply keep training as-is, only the training loss will keep decreasing while the test loss keeps increasing; this is "overfitting", and recognition accuracy will not improve. To prevent overfitting and improve accuracy, let's increase the training data.
Ideally we would collect more original training data, but gathering data costs time and money, so instead we augment the data we already have.
Types of data augmentation

- Rotation (by a random angle around a random three-dimensional axis)
- Translation (shifting by a random number of pixels)
- Thinning (erosion), to remove the dependence of recognition on stroke thickness
- Inversion: an inverted image would normally never be given as input, so at first glance this looks like a harmful augmentation, but it is effective from the viewpoint of TTA (augmenting the data at test time as well)
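As a reference, here is a minimal sketch of these operations in Python. This is not the tutorial's actual augmentation script; it assumes a 2-D grayscale NumPy array with dark ink on a light background, and simplifies the rotation to an in-plane rotation.

```python
# Minimal augmentation sketch, NOT the tutorial's actual script.
# Assumes `img` is a 2-D uint8 NumPy array (dark ink on a light background).
import numpy as np
from scipy import ndimage

def augment_once(img, rng=np.random):
    """Return one randomly augmented copy of `img`."""
    out = img.astype(np.float32)

    # Rotation by a small random angle (in-plane only, for simplicity).
    out = ndimage.rotate(out, rng.uniform(-15, 15), reshape=False, mode='nearest')

    # Translation by a few random pixels in y and x.
    dy, dx = rng.randint(-3, 4, size=2)
    out = ndimage.shift(out, (dy, dx), mode='nearest')

    # Thinning: dilating the light background shrinks the dark strokes.
    if rng.rand() < 0.5:
        out = ndimage.grey_dilation(out, size=(2, 2))

    # Inversion, applied only occasionally (mainly useful for TTA).
    if rng.rand() < 0.1:
        out = 255.0 - out

    return np.clip(out, 0, 255).astype(np.uint8)
```

Because every parameter is drawn from a random number, calling `augment_once(img)` repeatedly yields a different image each time.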
Because the rotation angle and (three-dimensional) rotation axis, as well as the number of pixels to shift, are all drawn from random numbers, there are effectively infinitely many combinations; by combining the methods above, an unlimited number of images can be generated from a single original image. The number of augmented images and the resulting scores are as follows.
Number of augmented images | Best test loss |
---|---|
10 | 0.526 |
100 | 0.277 |
300 | 0.260 |
500 | 0.237 |
The gains come surprisingly easily, but it's looking good. You might wonder whether the data just ends up duplicated when augmenting to 500 images, but in the end it works out fine.
Incidentally, elastic distortion looks like an ideal augmentation, but it is actually hard to handle: it is slow to process and, in my experience, it causes overfitting.
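For reference only, here is a rough sketch of what elastic distortion does (in the style often attributed to Simard et al., 2003). As noted above, I do not use it in this chapter.

```python
# Elastic distortion sketch, for reference only (not used in this chapter).
# Assumes `img` is a 2-D NumPy array.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(img, alpha=34.0, sigma=4.0, rng=np.random):
    # Random displacement fields, smoothed with a Gaussian and scaled by alpha.
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    # Warp each pixel to its displaced position with bilinear interpolation.
    y, x = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing='ij')
    return map_coordinates(img, [y + dy, x + dx], order=1, mode='nearest')
```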
Even at 500 augmented images the accuracy is still improving steadily (the loss keeps decreasing), so next I tried expanding to **3,500 images**. (However, memory and processing time on my PC are limited, so this is restricted to just the five characters "A", "I", "U", "E", and "O".)
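As a rough idea of how such an enlarged training set might be assembled, here is a hypothetical sketch (`augment_once` is the sketch from earlier; this is not the tutorial's own script, and it assumes the 3,500 copies are generated per original image).

```python
# Hypothetical sketch of building the enlarged training set.
import numpy as np

def build_training_set(originals, labels, augment_once, per_image=3500):
    """originals: list of 2-D uint8 images; labels: matching class indices."""
    xs, ys = [], []
    for img, label in zip(originals, labels):
        for _ in range(per_image):
            xs.append(augment_once(img).astype(np.float32) / 255.0)  # scale to [0, 1]
            ys.append(label)
    x = np.array(xs)                  # final shape depends on what the model expects
    y = np.array(ys, dtype=np.int32)  # Chainer expects int32 class labels
    return x, y
```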
('epoch', 1)
train mean loss=0.167535232874, accuracy=0.937596153205
test mean loss=0.23016545952, accuracy=0.914285708447
('epoch', 2)
train mean loss=0.0582337708299, accuracy=0.979920332723
test mean loss=0.132406316127, accuracy=0.955102039843
('epoch', 3)
train mean loss=0.042050985039, accuracy=0.985620883214
test mean loss=0.0967423064653, accuracy=0.959183678335
('epoch', 4)
train mean loss=0.0344518882785, accuracy=0.98846154267
test mean loss=0.0579228501539, accuracy=0.983673472794
These are the results: the loss dropped to 0.057 at epoch 4. As mentioned in Chapter 3, I could more or less recognize my handwritten hiragana even with the loss-0.237 model, so this one looks promising. I therefore wrote 50 hiragana characters by hand and tested the accuracy on them.
This time, each test image is augmented into 30 images at test time (TTA), and the recognition result is evaluated over those. (There is no particular reason for choosing 30.)
$ python AIUEONN_predictor.py --model loss0057model --img ../testAIUEO/o0.png
init done
Candidate neuron number:4, Unicode:304a,Hiragana:O
.
.(Omitted)
.
Candidate neuron number:4, Unicode:304a,Hiragana:O
Candidate neuron number:4, Unicode:304a,Hiragana:O
Candidate neuron number:3, Unicode:3048,Hiragana:e
Candidate neuron number:4, Unicode:304a,Hiragana:O
**Final judgment Neuron number:4, Unicode:304a,Hiragana:O**
It's OK.
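The final judgment above is presumably a majority vote over the predictions for the 30 test-time augmented copies. Here is a minimal sketch of that idea; `predict_label` and `augment_once` are hypothetical stand-ins for the model call inside AIUEONN_predictor.py and the augmentation sketch from earlier.

```python
# Hypothetical sketch of test-time augmentation (TTA) with a majority vote.
from collections import Counter

def predict_with_tta(img, predict_label, augment_once, n_copies=30):
    votes = [predict_label(augment_once(img)) for _ in range(n_copies)]
    neuron, count = Counter(votes).most_common(1)[0]
    print('Final judgment Neuron number:%d (%d of %d votes)' % (neuron, count, n_copies))
    return neuron
```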
46 out of 50 correct: with 4 mistakes, the accuracy is 92%! By the way, these were the only 4 images that were missed; my handwriting for "A" is quite messy (sweat).
Some of the images that were recognized correctly:
Since the test set is of my own making, it is hard to make strong claims, but the accuracy is quite good. Given that the training data is mostly printed type and the model still carries over to handwritten characters, I feel the potential of Deep Learning. That's it for Chapter 4. In the next chapter, Chapter 5, I would like to go back and learn the basics of neural networks, referring to Hi-King's blog.
Chapter | Title |
---|---|
Chapter 1 | Building a Deep Learning environment based on chainer |
Chapter 2 | Creating a Deep Learning Predictive Model by Machine Learning |
Chapter 3 | Character recognition using a model |
Chapter 4 | Improvement of recognition accuracy by expanding data |
Chapter 5 | Introduction to neural networks and explanation of source code |
Chapter 6 | Improvement of learning efficiency by selecting Optimizer |
Chapter 7 | TTA,Improvement of learning efficiency by Batch Normalization |