Hello, this is Licht. Continuing from the previous chapter, Deep Learning Tutorial Chapter 3 describes character recognition using a trained model.
We will use model16, the model produced by the training in Chapter 2. First, let's prepare an image of a hiragana character. Ideally it should use a font that does not appear in the training data, but such fonts are hard to find, so we prepare the image in a textbook typeface.
If you have trouble preparing one, please download and use the image below. (I wrote あ in OneNote and cut it out as an image.)
Place the image of あ (a.png) in the same directory as model16, then enter the following command in the terminal:

```
python hiraganaNN_predictor.py --img a.png --model model16
```
The output is then:

```
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:10, Unicode:304b,Hiragana:か
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:78, Unicode:308f,Hiragana:わ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
```

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
The prediction results are in. Although か and わ appear among the candidates, the final judgment is あ, so recognition succeeds. Each candidate line is the recognition result for one of the images produced by augmenting the input image into multiple variants (14 in total), and the final judgment averages those individual results.
This is the same idea as TTA (Test Time Augmentation): instead of judging from a single image, recognition accuracy is improved by judging the image from several different angles.
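Here is a minimal sketch of that TTA loop. The names `augment` and `predict` are hypothetical placeholders standing in for the augmentation and forward pass shown later in this chapter; they are not functions from hiraganaNN_predictor.py:

```python
import numpy as np

def tta_predict(img, predict, augment, n_aug=14):
    # Accumulate the raw network outputs over n_aug augmented copies.
    # Summing and averaging give the same argmax, so we simply sum.
    result = None
    for _ in xrange(n_aug):
        scores = predict(augment(img))  # one score per class
        result = scores if result is None else result + scores
    return np.argmax(result)  # final judgment: highest accumulated score
```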
Let's try some other recognitions as well. Recognition of a2.png:

```
python hiraganaNN_predictor.py --img a2.png --model model16
```

Result:

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
Recognition of a3.png:

```
python hiraganaNN_predictor.py --img a3.png --model model16
```

Result:

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
Both correct, no problem. Since the printed images were all recognized correctly, let's try something a little harder: recognizing a handwritten あ. Honestly, this is asking a lot of the model, but let's try it anyway. Start recognition with:

```
python hiraganaNN_predictor.py --img a_tegaki.png --model model16
```
```
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:24, Unicode:3059,Hiragana:す
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
```

**Final judgment Neuron number:32, Unicode:3061,Hiragana:ち**
It fails! (lol) A few あ appear among the candidates, but the final judgment is ち, so no good. Well, model16 has a loss of 0.526 and was trained only on printed fonts, so this is about what you would expect. However, if we try the same handwritten image with a model whose loss was reduced to 0.237 by the improvements introduced in the following chapters:

```
python hiraganaNN_predictor.py --img a_tegaki.png --model loss237model
```
```
Candidate neuron number:30, Unicode:305f,Hiragana:た
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:32, Unicode:3061,Hiragana:ち
Candidate neuron number:71, Unicode:3088,Hiragana:よ
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:52, Unicode:3075,Hiragana:ふ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:1, Unicode:3042,Hiragana:あ
Candidate neuron number:9, Unicode:304a,Hiragana:お
Candidate neuron number:10, Unicode:304b,Hiragana:か
```

**Final judgment Neuron number:1, Unicode:3042,Hiragana:あ**
It recognizes the character correctly (if only barely)! The training data was essentially printed fonts, yet the model can recognize handwriting to some extent. And since many of the individual candidates are misrecognized, you can see how useful TTA is here.
By the way, there are three あ and four ふ among the candidates, so why is the final judgment あ? Good question! We are not taking a simple majority vote here; we sum the raw network outputs, so more confident predictions carry more weight. In other words, all four ふ predictions were shaky, while the three あ predictions were confident, so the final judgment comes out as あ.
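A tiny illustration of the difference (the numbers below are made up for illustration, not taken from the actual run):

```python
import numpy as np

# Hypothetical per-class scores over 7 augmented images:
# four weak votes for ふ, three confident votes for あ.
#                    あ    ふ
scores = np.array([[0.1, 0.4],
                   [0.1, 0.4],
                   [0.1, 0.4],
                   [0.1, 0.4],
                   [0.9, 0.0],
                   [0.9, 0.0],
                   [0.9, 0.0]])

votes = np.argmax(scores, axis=1)                       # per-image winners
print 'majority vote:', np.argmax(np.bincount(votes))   # -> 1 (ふ)
print 'summed scores:', np.argmax(scores.sum(axis=0))   # -> 0 (あ)
```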
Finally, a quick look at the source code. It is mostly the same as hiraganaNN.py, which we used for training.
```python
def forward(x_data, train=False):
    x = chainer.Variable(x_data, volatile=not train)
    # three convolution blocks, each followed by 2x2 max pooling
    h = F.max_pooling_2d(F.relu(model.bn1(model.conv1(x))), 2)
    h = F.max_pooling_2d(F.relu(model.bn2(model.conv2(h))), 2)
    h = F.max_pooling_2d(F.relu(model.conv3(h)), 2)
    # fully connected layers; dropout is disabled at prediction time
    h = F.dropout(F.relu(model.fl4(h)), train=train)
    y = model.fl5(h)
    return y.data
```
This is the neural network's forward pass; it must have exactly the same structure as the forward function in hiraganaNN.py, or the trained weights will not match.
```python
src = cv2.imread(args.img, 0)  # read as grayscale
src = cv2.copyMakeBorder(
    src, 20, 20, 20, 20, cv2.BORDER_CONSTANT, value=255)  # white margin
src = cv2.resize(src, (IMGSIZE, IMGSIZE))
```
This reads the input image and resizes it to 64x64 (IMGSIZE). A 20-pixel white margin is also added around the image first; with the current model, recognition does not work well unless the margin is about the right size.
```python
for x in xrange(0, 14):
    # generate one augmented variant of the input image
    dst = dargs.argumentation([2, 3])
    # binarize: pixels above 23 become 255 (white)
    ret, dst = cv2.threshold(dst,
                             23,
                             255,
                             cv2.THRESH_BINARY)
    # for image confirmation
    # cv2.imshow('ARGUMENTATED', dst)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()
    # normalize pixel values to 0-1 and shape to (batch, channel, H, W)
    xtest = np.array(dst).astype(np.float32).reshape(
        (1, 1, IMGSIZE, IMGSIZE)) / 255
    # accumulate the network outputs over all augmented images
    if result is None:
        result = forward(xtest)
    else:
        result = result + forward(xtest)
```
Each augmented copy of the input image is binarized, its pixel values are normalized to 0-1, and it is passed through the forward function; the outputs are accumulated in result, which is what the final judgment is based on.
```python
tmp = np.argmax(forward(xtest))  # candidate class for this augmented image
for strunicode, number in unicode2number.iteritems():
    if number == tmp:
        hiragana = unichr(int(strunicode, 16))
        print 'Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}'.format(
            number, strunicode, hiragana.encode('utf_8'))
```
The recognition result of the neural network (a neuron number from 0 to 82) is converted to its Unicode code point and printed as a hiragana character.
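As a small aside, the loop above scans all of unicode2number for every candidate. The mapping could instead be inverted once up front. A sketch, assuming unicode2number maps Unicode strings such as '3042' to neuron numbers as in the code above:

```python
# Build the reverse mapping once: neuron number -> Unicode string.
number2unicode = dict(
    (number, strunicode)
    for strunicode, number in unicode2number.iteritems())

strunicode = number2unicode[tmp]
hiragana = unichr(int(strunicode, 16))
print 'Candidate neuron number:{0}, Unicode:{1},Hiragana:{2}'.format(
    tmp, strunicode, hiragana.encode('utf_8'))
```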
Chapter 3 ends here. In Chapter 4, we will augment each image into 3,500 images and see how the accuracy improves.

| Chapter | Title |
|---|---|
| Chapter 1 | Building a Deep Learning environment based on Chainer |
| Chapter 2 | Creating a Deep Learning prediction model by machine learning |
| Chapter 3 | Character recognition using a model |
| Chapter 4 | Improving recognition accuracy by expanding the data |
| Chapter 5 | Introduction to neural networks and explanation of the source code |
| Chapter 6 | Improving learning efficiency by selecting an Optimizer |
| Chapter 7 | TTA, improving learning efficiency by Batch Normalization |