Hello, this is Licht. Continuing from the previous chapter, Deep Learning Tutorial Chapter 3 describes character recognition using a trained model.
We will use model16, the model produced by the training in Chapter 2. First, let's prepare an image of a hiragana character. Ideally it should use a font that does not appear in the training data, but such fonts are hard to find, so we prepare the image in a textbook typeface.
If you have trouble preparing one, please download and use the image below. (I wrote あ in OneNote and cut it out as an image.)
Place the image of あ (a.png) in the same directory as model16, then enter the following command in the terminal:

```
python hiraganaNN_predictor.py --img a.png --model model16
```
The output is then:

```
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:10, Unicode:304b,Hiragana:か
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:78, Unicode:308f,Hiragana:わ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
```

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
The prediction results are in. Although か and わ appear among the candidates, the final judgment is あ, so recognition succeeds. Each candidate line is the recognition result for one of the images produced by augmenting the input image into multiple variants (14 in total), and the final judgment averages those individual results.
This is the same idea as TTA (Test Time Augmentation): instead of judging from a single image, recognition accuracy is improved by judging the image from several different angles.
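Here is a minimal sketch of that TTA loop. The names `augment` and `predict` are hypothetical placeholders standing in for the augmentation and forward pass shown later in this chapter; they are not functions from hiraganaNN_predictor.py:

```python
import numpy as np

def tta_predict(img, predict, augment, n_aug=14):
    # Accumulate the raw network outputs over n_aug augmented copies.
    # Summing and averaging give the same argmax, so we simply sum.
    result = None
    for _ in xrange(n_aug):
        scores = predict(augment(img))  # one score per class
        result = scores if result is None else result + scores
    return np.argmax(result)  # final judgment: highest accumulated score
```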
Let's try some other recognitions as well. Recognition of a2.png:

```
python hiraganaNN_predictor.py --img a2.png --model model16
```

Result:

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
Recognition of a3.png:

```
python hiraganaNN_predictor.py --img a3.png --model model16
```

Result:

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
Both correct, no problem. Since the printed images were all recognized correctly, let's try something a little harder: recognizing a handwritten あ. Honestly, this is asking a lot of the model, but let's try it anyway. Start recognition with:

```
python hiraganaNN_predictor.py --img a_tegaki.png --model model16
```
```
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
```

**Final judgment Neuron number:32, Unicode:3061,Hiragana:ち**
It fails! (lol) A few あ appear among the candidates, but the final judgment is ち, so no good. Well, model16 has a loss of 0.526 and was trained only on printed fonts, so this is about what you would expect. However, if we try the same handwritten image with a model whose loss was reduced to 0.237 by the improvements introduced in the following chapters:

```
python hiraganaNN_predictor.py --img a_tegaki.png --model loss237model
```
```
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:10, Unicode:304b,Hiragana:か
```

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
It recognizes the character correctly (if only barely)! The training data was essentially printed fonts, yet the model can recognize handwriting to some extent. And since many of the individual candidates are misrecognized, you can see how useful TTA is here.
By the way, there are three あ and four ふ among the candidates, so why is the final judgment あ? Good question! We are not taking a simple majority vote here; we sum the raw network outputs, so more confident predictions carry more weight. In other words, all four ふ predictions were shaky, while the three あ predictions were confident, so the final judgment comes out as あ.
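A tiny illustration of the difference (the numbers below are made up for illustration, not taken from the actual run):

```python
import numpy as np

# Hypothetical per-class scores over 7 augmented images:
# four weak votes for ふ, three confident votes for あ.
#                    あ    ふ
scores = np.array([[0.1, 0.4],
                   [0.1, 0.4],
                   [0.1, 0.4],
                   [0.1, 0.4],
                   [0.9, 0.0],
                   [0.9, 0.0],
                   [0.9, 0.0]])

votes = np.argmax(scores, axis=1)                       # per-image winners
print 'majority vote:', np.argmax(np.bincount(votes))   # -> 1 (ふ)
print 'summed scores:', np.argmax(scores.sum(axis=0))   # -> 0 (あ)
```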
Finally, a quick look at the source code. It is mostly the same as hiraganaNN.py, which we used for training.
```python
def forward(x_data, train=False):
    x = chainer.Variable(x_data, volatile=not train)
    # three convolution blocks, each followed by 2x2 max pooling
    h = F.max_pooling_2d(F.relu(model.bn1(model.conv1(x))), 2)
    h = F.max_pooling_2d(F.relu(model.bn2(model.conv2(h))), 2)
    h = F.max_pooling_2d(F.relu(model.conv3(h)), 2)
    # fully connected layers; dropout is disabled at prediction time
    h = F.dropout(F.relu(model.fl4(h)), train=train)
    y = model.fl5(h)
    return y.data
```
This is the neural network's forward pass; it must have exactly the same structure as the forward function in hiraganaNN.py, or the trained weights will not match.
```python
src = cv2.imread(args.img, 0)  # read as grayscale
src = cv2.copyMakeBorder(
    src, 20, 20, 20, 20, cv2.BORDER_CONSTANT, value=255)  # white margin
src = cv2.resize(src, (IMGSIZE, IMGSIZE))
```
This reads the input image and resizes it to 64x64 (IMGSIZE). A 20-pixel white margin is also added around the image first; with the current model, recognition does not work well unless the margin is about the right size.
```python
for x in xrange(0, 14):
    # generate one augmented variant of the input image
    dst = dargs.argumentation([2, 3])
    # binarize: pixels above 23 become 255 (white)
    ret, dst = cv2.threshold(dst,
                             23,
                             255,
                             cv2.THRESH_BINARY)
    # for image confirmation
    # cv2.imshow('ARGUMENTATED', dst)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()
    # normalize pixel values to 0-1 and shape to (batch, channel, H, W)
    xtest = np.array(dst).astype(np.float32).reshape(
        (1, 1, IMGSIZE, IMGSIZE)) / 255
    # accumulate the network outputs over all augmented images
    if result is None:
        result = forward(xtest)
    else:
        result = result + forward(xtest)
```
Each augmented copy of the input image is binarized, its pixel values are normalized to 0-1, and it is passed through the forward function; the outputs are accumulated in result, which is what the final judgment is based on.
```python
tmp = np.argmax(forward(xtest))  # candidate class for this augmented image
for strunicode, number in unicode2number.iteritems():
    if number == tmp:
        hiragana = unichr(int(strunicode, 16))
        print 'Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}'.format(
            number, strunicode, hiragana.encode('utf_8'))
```
The recognition result of the neural network (a neuron number from 0 to 82) is converted to its Unicode code point and printed as a hiragana character.
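As a small aside, the loop above scans all of unicode2number for every candidate. The mapping could instead be inverted once up front. A sketch, assuming unicode2number maps Unicode strings such as '3042' to neuron numbers as in the code above:

```python
# Build the reverse mapping once: neuron number -> Unicode string.
number2unicode = dict(
    (number, strunicode)
    for strunicode, number in unicode2number.iteritems())

strunicode = number2unicode[tmp]
hiragana = unichr(int(strunicode, 16))
print 'Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}'.format(
    tmp, strunicode, hiragana.encode('utf_8'))
```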
Chapter 3 ends here. In Chapter 4, we will augment each image into 3,500 images and see how the accuracy improves.

| Chapter | Title |
|---|---|
| Chapter 1 | Building a Deep Learning environment based on Chainer |
| Chapter 2 | Creating a Deep Learning prediction model by machine learning |
| Chapter 3 | Character recognition using a model |
| Chapter 4 | Improving recognition accuracy by expanding the data |
| Chapter 5 | Introduction to neural networks and explanation of the source code |
| Chapter 6 | Improving learning efficiency by selecting an Optimizer |
| Chapter 7 | TTA, improving learning efficiency by Batch Normalization |