[PYTHON] Recognition of handwritten digits by a multi-layer neural network

1. What is recognition by artificial intelligence?

In recent years, the field of artificial intelligence has made remarkable progress. When artificial intelligence recognizes and judges things, an algorithm called a neural network is generally used. It is an attempt to imitate the workings of the human brain with an engineering approach that learns automatically from data. As part of that, this time let's make a computer recognize handwritten digits, something humans recognize naturally.

2. Image recognition with a neural network

A neural network is a learning model that imitates the human nervous system. By applying a large number of non-linear transformations, it can produce very complex results from the data it is given. In the field of image recognition, you could say that recognition is realized by applying a neural network to a correlation process similar to template matching. (This may invite some misunderstanding, but viewed as a convolution over pixels and brightness values, it is just a correlation process.)
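As a rough illustration of this correlation view (a minimal sketch, not part of the implementation below; the 3x3 arrays are invented for this example), a template-matching score can be computed as a normalized correlation over pixel brightness values:

import numpy as np

def correlation_score(patch, template):
    # normalized cross-correlation between an image patch and a template
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    t = (template - template.mean()) / (template.std() + 1e-8)
    return float((p * t).mean())

# a made-up 3x3 "plus" template; an identical patch scores close to 1.0
template = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
patch = template.copy()
print(correlation_score(patch, template))

In that sense, the weights a neural network learns play a role similar to templates that are tuned automatically instead of being fixed by hand.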

The images to be recognized this time are the handwritten digits of MNIST.

Let's download it. Since the downloaded files are not image data that can be used as is, let's convert them to image data by referring to the following format description on the official website.


TRAINING SET LABEL FILE (train-labels-idx1-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000801(2049) magic number (MSB first)
0004     32 bit integer  60000            number of items
0008     unsigned byte   ??               label
0009     unsigned byte   ??               label
........
xxxx     unsigned byte   ??               label

The labels values are 0 to 9.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
0017     unsigned byte   ??               pixel
........
xxxx     unsigned byte   ??               pixel

Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).


You can see that the labels (the teacher signal) are stored from offset 8 onward and the image data from offset 16 onward. Therefore, if you open each file in binary mode and extract the byte string for each image (28 * 28 pixel values), you can recover the image. By binarizing appropriately, you can obtain image data for the 60,000 handwritten digits, such as the following.

image_0_1.png image_1_3.png image_2_5.png

This data, together with the test data, will be used for training the neural network.
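As a concrete sketch of this step (a minimal example, assuming the two training files have already been downloaded and decompressed into the current directory), the offsets above can be read with Python's struct module:

import struct

def load_labels(path):
    # labels: 8-byte header (magic, count), then one unsigned byte per label
    with open(path, "rb") as f:
        magic, n_items = struct.unpack(">II", f.read(8))
        return list(f.read(n_items))

def load_images(path):
    # images: 16-byte header (magic, count, rows, cols), then rows*cols bytes per image
    with open(path, "rb") as f:
        magic, n_images, n_rows, n_cols = struct.unpack(">IIII", f.read(16))
        images = []
        for _ in range(n_images):
            raw = f.read(n_rows * n_cols)
            images.append([1 if b > 0 else 0 for b in raw])  # simple binarization
        return images

labels = load_labels("train-labels-idx1-ubyte")
images = load_images("train-images-idx3-ubyte")
print(labels[0], len(images[0]))  # first label and its 28*28 = 784 pixel values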

3. Multi-layer neural network model

The neural network model used this time is defined by the following requirements.

- One input layer, two hidden layers, and one output layer
- All activation functions are sigmoid
- The learning rate is fixed at 0.1
- The number of training iterations is fixed at 30,000
- There are 10 classes of training data (the digits 0-9), with 100 images used for each class
- No other special techniques such as momentum are used

It's a simple neural network, but the point is that it uses two hidden layers. There is still plenty of room for improvement, but for now let's put it into practice.

The code implementation is shown below.

MultiLayerPerceptron_MNIST.py


# 1. Preprocess
import random
# 1.1. Make NeuralNetwork
# 1.1.1. Define Layers
n_hidden = 2
n_layer = n_hidden + 2

# 1.1.2. Define Units
n_unit_i = 7 * 7 + 1  # 7x7 downscaled input image plus one bias unit
n_unit_h = 20         # same number of units in both hidden layers
n_unit_o = 10         # one output unit per digit (0-9)

unit_i = [0 for u in range(n_unit_i)]
unit_h1 = [0 for u in range(n_unit_h)]
unit_h2 = [0 for u in range(n_unit_h)]
unit_o = [0 for u in range(n_unit_o)]

# 1.1.3. Initialize weight
w1 = [[random.uniform(-1, 1) for u_before in range(n_unit_i)] for u_after in range(n_unit_h)]
w2 = [[random.uniform(-1, 1) for u_before in range(n_unit_h)] for u_after in range(n_unit_h)]
w3 = [[random.uniform(-1, 1) for u_before in range(n_unit_h)] for u_after in range(n_unit_o)]

# 1.2. Define dataset

import mydatasets  # the author's own data-loading module

n_data = 100
ipt = mydatasets.inputdata("digit")
res = ipt.load_data2(size=(7, 7), num=n_data)  # n_data images per digit, downscaled to 7x7

train = res[0][0]
for k in train:
    for n in k:
        n.insert(0, 1)  # prepend the bias input to every training vector

def maketeach(kind):
    # build a one-hot teacher vector for the given digit class
    buf = []
    for o in range(n_unit_o):
        if o == kind:
            buf.append(1) 
        else:
            buf.append(0)
    return buf
    
# 1.3 Implement forward propagation
import math

# 1.3.1 Define activation function
def sigmoid(z):
    # clamp extreme inputs to avoid math.exp overflow and fully saturated outputs
    if z > 10: return 0.99999
    elif z < -10: return 0.00001
    else: return 1 / (1 + math.exp(-1 * z))
    
# 1.3.2 forward propagation

def forward(train_vec):
    
    for i in range(n_unit_i):
        unit_i[i] = train_vec[i]
    unit_i[0] = 1  # bias unit
        
    # 1.3.2.1 forward between input-hidden1
    for h1 in range(n_unit_h):
        buf = 0
        for i in range(n_unit_i):
            buf += unit_i[i] * w1[h1][i]
        unit_h1[h1] = sigmoid(buf)
    unit_h1[0] = 1  # bias unit
        
    # 1.3.2.2 forward between hidden1-hidden2
    for h2 in range(n_unit_h):
        buf = 0
        for h1 in range(n_unit_h):
            buf += unit_h1[h1] * w2[h2][h1]
        unit_h2[h2] = sigmoid(buf)
    unit_h2[0] = 1  # bias unit

    # 1.3.2.3 forward between hidden2-output
    for o in range(n_unit_o):
        buf = 0
        for h2 in range(n_unit_h):
            buf += unit_h2[h2] * w3[o][h2]
        unit_o[o] = sigmoid(buf)

# 1.3.3 back propagation

alpha = 0.1  # learning rate
def backpropagation(teach_vec):
    
    # 1.3.3.1 get cost
    buf = 0
    for o in range(n_unit_o):
        buf += (teach_vec[o] - unit_o[o]) ** 2
    cost = buf / 2
    
    # 1.3.3.2 get grad between hidden2-output
    for o in range(n_unit_o):
        for h2 in range(n_unit_h):
            delta = (unit_o[o] - teach_vec[o]) * unit_o[o] * (1 - unit_o[o]) * unit_h2[h2]
            w3[o][h2] -= alpha * delta
            
    # 1.3.3.3 get grad between hidden1-hidden2
    for o in range(n_unit_o):
        for h2 in range(n_unit_h):
            for h1 in range(n_unit_h):
                delta = ((unit_o[o] - teach_vec[o]) * unit_o[o] * (1 - unit_o[o])
                         * w3[o][h2] * unit_h2[h2] * (1 - unit_h2[h2]) * unit_h1[h1])
                w2[h2][h1] -= alpha * delta
                
    # 1.3.3.4 get grad between input-hidden1
    for o in range(n_unit_o):
        for h2 in range(n_unit_h):
            for h1 in range(n_unit_h):
                for i in range(n_unit_i):
                    delta = ((unit_o[o] - teach_vec[o]) * unit_o[o] * (1 - unit_o[o])
                             * w3[o][h2] * unit_h2[h2] * (1 - unit_h2[h2])
                             * w2[h2][h1] * unit_h1[h1] * (1 - unit_h1[h1]) * unit_i[i])
                    w1[h1][i] -= alpha * delta
                    
    return cost
    
import matplotlib.pyplot as plt
plt_x = []
plt_y = []

n_epoch = 30  # 30 epochs x 100 images x 10 digits = 30,000 training iterations
n_train = len(train)
n_kind = 10
n = 0
error_threshold = 0.001

print("Backpropagation training is started now.")

def training(n):
    for e in range(n_epoch):
        for d in range(n_data):
            for k in range(n_kind):
                try:
                    n += 1
                    forward(train[k][d])
                    c = backpropagation(maketeach(k))
                    plt_x.append(n)
                    plt_y.append(c)
                    if n % 100 == 0:
                        print("learn num: {0}".format(n))
                    if c < error_threshold and e > n_epoch // 2:
                        print("cost is least than error threshold. (n: {})".format(n))
                        return 1
                except Exception as err:  # use a name other than e so the epoch counter is not clobbered
                    print("n:{}, d:{}, k:{}, Error:{}".format(n, d, k, err.args))
    return 0

def forecast(train_data, dim):
    forward(train_data)
    res = unit_o
    n = 0
    # find the index of the output unit with the highest activation
    for r in res:
        if r == max(res):
            max_score = n
        n += 1
    print("max score : {}".format(max_score))
    print("scores is below : ")
    print(res)
    
    import numpy as np
    import cv2
    mat = []
    row = []
    cnt = 0
    n = 0
    for t in range(1, len(train_data)):
        row.append(train_data[t])
        cnt += 1
        n += 1
        if cnt == 7:
            #print("if statement is called at n:{}".format(n))
            mat.append(row)
            row = []
            cnt = 0
    cv2.imwrite('forecast_input.png', np.array(mat)*255)
    
    return max_score
    
def validation(valid_sets):
    n_dim = 7
    correct = 0
    incorrect = 0
    n_kind = 10
    for d in range(n_data):
        for k in range(n_kind):
            # evaluate on the validation set that was passed in, not on the training data
            if forecast(valid_sets[k][d], n_dim) == k:
                correct += 1
            else:
                incorrect += 1
    total = correct + incorrect
    print("validation result:: correct answer is {} / {}".format(correct, total))

training(n)
plt.plot(plt_x, plt_y)
plt.xlim(0, 30000)
plt.ylim(0.0, 1.51)
plt.show()

valid = res[1][0]
for k in valid:
    for n in k:
        n.insert(0, 1)  # prepend the bias input to every validation vector
        
validation(valid)

4. Execution result

Plotting the error against the number of training iterations gives the following result.

result_MLP_MNIST.png

At the beginning, the neural network cannot decide between 0 (correct) and 1 (incorrect) and produces a half-hearted judgment of around 0.5. As learning progresses it becomes able to discriminate more and more, producing values close to 0 when the result is correct and close to 1 when the result is incorrect.

This is a consequence of using the sigmoid as the output activation function. The sigmoid normalizes each value into the range 0 to 1, which lets the success or failure of recognition be read off as a probability-like score.
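For reference, the sigmoid used throughout the network is the standard logistic function

sigmoid(z) = 1 / (1 + exp(-z))

which maps any real-valued weighted sum into the open interval (0, 1), so each output unit's value can be read as a score between 0 and 1 for its digit.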

As a result of validation, the recognition rate for handwritten digits was 84.7%. Getting such a result with the primitive algorithm above means we can say that, for the time being, we have succeeded in recognizing handwritten digits.

4.5. Execution result postscript (16/06/06)

It was pointed out that the execution results are difficult to interpret, so I will add an explanation.

First, let's zoom in on a narrower range of the execution results.

ダウンロード (2).png

The figure above is a graph of the error over trials 29970 to 30000. You can see a periodicity in this graph: there is a period corresponding to each type of digit (0 to 9), with, for example, low error when the ones digit of the trial number is 0 or 1 and high error when it is 9.

This oscillation comes from the learning algorithm. With the algorithm used this time, the network is made to learn the next type of digit on every trial, for example learning 1 right after learning 0. Therefore, the error fluctuates greatly depending on the trial number. Successive trials are not directly related to each other; rather, trials 10 apart are. Recognition of 0 is learned on trials 0 → 10 → 20 → 30, and recognition of 1 on trials 1 → 11 → 21.
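To make this concrete, here is a small sketch (not in the original code) of the trial order implied by the epoch → image → digit loop nesting in the training code above, counting trials from zero as in the text:

for trial in range(22):
    digit = trial % 10  # the digit class trained on this trial
    if digit <= 1:
        print("trial {:2d} trains digit {}".format(trial, digit))
# trials 0, 10, 20 train digit 0; trials 1, 11, 21 train digit 1; and so on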

The average recognition rate also differs from digit to digit. At a given point in training, the model may recognize 0 well but not 9. That is why the recognition results vary between digits and appear to oscillate.

Also, as learning progresses, the model acquires what could be called an "ideal value" for each digit, and the error for inputs that deviate from it grows as well. This is because once the ideal value has solidified, the degree to which an input deviates from it shows up more clearly than in the earlier, neutral state. Let's look at one example of the ideal value and the degree of deviation.

output_and_error.jpg

The figure above shows the result of feeding in part of the validation set for the digit 0, together with each image's error. The error is the last underscore-delimited part of the file name, just before the extension, and the index number of the image comes before the error. In this figure, the 93rd input image shows the highest error (0.4708): the model has learned that this dull shape is hard to recognize as a 0. Also, No. 98 has the lowest error, so the model has learned that a 0 tilted to the right is the closest to its ideal form of 0.

5. Conclusion

This time I showed a simple implementation of a neural network. However, the current algorithm still leaves something to be desired in terms of the number of training iterations and the error. Next time, I will address these issues with several approaches.
