[PYTHON] I built my own 3-layer forward-propagation neural network and tried to understand the calculations deeply.

Introduction

Machine learning and deep learning have rich libraries, so you can make predictions with a simple copy and paste. I myself have run programs written by many predecessors and have reached the level where I can follow their outline. Deep learning (neural networks) in particular is applied to GANs and natural language processing; it is a field where new techniques appear at a dizzying pace, and its application to society and industry is advancing rapidly. Recognizing that it sits at the center of this transformation, I want to understand these areas deeply. That is my motivation.

I am currently learning the basics from this book, which is famous as a deep learning textbook: https://www.oreilly.co.jp/books/9784873117584/ This time, by building a neural network almost from scratch (although I do use NumPy), I would like to get a real feel for the calculations being performed.

The summary is below.

Understand the perceptron

A perceptron receives multiple signals as inputs and outputs a single signal. In the field of machine learning, not emitting a signal is treated as 0, and emitting a signal is treated as 1.

[Figure: a simple perceptron with inputs x1 and x2, weights w1 and w2, and output y]

The figure above is a simple diagram of this idea: x is an input signal, y is the output signal, and w is a weight. Each circle is called a neuron. The neuron receives the sum of the inputs multiplied by their weights, and it outputs 1 when that sum exceeds the threshold θ. As a formula:

y = 0  (w1·x1 + w2·x2 ≤ θ)
y = 1  (w1·x1 + w2·x2 > θ)
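For example, with w1 = w2 = 0.5 and θ = 0.7, the input (x1, x2) = (1, 1) gives 0.5 + 0.5 = 1.0 > 0.7, so y = 1, while (1, 0) gives 0.5 ≤ 0.7, so y = 0.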

Check with the AND circuit

Now, let's actually write a program. I reproduced the simple pattern shown in the figure above. To make the threshold easy to change later, it is passed as an argument; first, let's calculate with a threshold of 0.4.

NN.ipynb



def AND(x1, x2, theta=0.4):
    w1, w2 = 0.5, 0.5        # weights
    tmp = x1*w1 + x2*w2      # weighted sum of the inputs
    if tmp <= theta:         # at or below the threshold: no signal
        return 0
    else:                    # above the threshold: fire
        return 1



print(AND(0,0))
print(AND(1,0))
print(AND(0,1))
print(AND(1,1))

0
1
1
1

As a result, the output is 1 whenever either x1 or x2 is 1. On the other hand, with a threshold of 0.7 it looks like this:


print(AND(0,0, theta=0.7))
print(AND(1,0, theta=0.7))
print(AND(0,1, theta=0.7))
print(AND(1,1, theta=0.7))
0
0
0
1

If only one of x1 and x2 is 1, the output is no longer 1. You can see that the output you get changes depending on how the threshold is set.
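To see the threshold dependence at a glance, here is a small sketch that sweeps the truth table over both thresholds (it relies on theta being an argument of AND() above):

for theta in (0.4, 0.7):
    table = [AND(x1, x2, theta=theta) for x1, x2 in [(0,0),(1,0),(0,1),(1,1)]]
    print(theta, table)
# 0.4 [0, 1, 1, 1]
# 0.7 [0, 0, 0, 1]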

Expand to a neural network

A multi-layer perceptron is a network that has an intermediate (hidden) layer between the input layer and the output layer. The terminology differs from book to book, but in the case of the figure below, the input layer is called layer 0, the intermediate layer is layer 1, and the output layer is layer 2.

[Figure: layer numbering, with input = layer 0, intermediate = layer 1, output = layer 2]

How to count the layers of a neural network

[Figure: counting by weight layers vs. by neuron layers]

Here, there seem to be differing conventions for what to count when calling something an N-layer neural network: some count the layers of weights, others the layers of neurons. I don't have enough experience to say which is more common, but following the O'Reilly textbook, I will name networks based on the number of weight layers.

Understand bias

y = 0  (b + w1·x1 + w2·x2 ≤ 0)
y = 1  (b + w1·x1 + w2·x2 > 0)

Next, we introduce a value called the bias b. By rewriting the threshold θ in the earlier inequality as -b, the output y can be decided to be 0 or 1 relative to 0, as in the equations above. The bias is a correction value, what Japanese manufacturing slang calls "putting geta on" a number, and it shifts the overall value up or down along the y-axis.
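To make the role of b concrete, here is a minimal sketch of the gate rewritten with a bias in place of the threshold (the name AND_bias is my own; b = -0.7 corresponds to θ = 0.7 above):

import numpy as np

def AND_bias(x1, x2):
    x = np.array([x1, x2])
    w = np.array([0.5, 0.5])    # weights, as before
    b = -0.7                    # bias plays the role of -theta
    a = np.sum(w*x) + b         # now we compare against 0, not theta
    return 1 if a > 0 else 0

print([AND_bias(x1, x2) for x1, x2 in [(0,0),(1,0),(0,1),(1,1)]])  # [0, 0, 0, 1]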

Understand the activation function

a = b + w1·x1 + w2·x2
y = h(a)

The function h that determines whether y becomes 0 or 1 is called the activation function. Since an activation function maps its input to values near 0 or 1, it also serves to keep the calculation from diverging. There are several types of activation functions.
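As a minimal sketch of this two-stage view, splitting the weighted sum a from the activation h(a) (the helper names h and neuron are my own, not from the book):

def h(a):
    return 1 if a > 0 else 0        # a simple step used as the activation

def neuron(x1, x2, w1, w2, b):
    a = b + w1*x1 + w2*x2           # weighted sum plus bias
    return h(a)                     # the activation decides the output

print(neuron(1, 1, 0.5, 0.5, -0.7))  # 1, since a = 0.3 > 0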

The sigmoid function is one of the functions most often used as an activation function. As shown below, it is a fraction built from Napier's number e, the base of the natural logarithm. Its shape is hard to picture from the formula alone, but plotting it gives the curve shown further down.

h(x) = 1 / (1 + exp(-x))

NN.ipynb



import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1/(1+np.exp(-x))


xxx = np.arange(-5.0,5.0,0.1) # show the sigmoid function
yyy = sigmoid(xxx)
plt.plot(xxx,yyy)
plt.ylim(-0.1,1.1)
plt.show()

[Figure: plot of the sigmoid function]

It can be seen that, with x = 0 as the boundary, the output gradually approaches y = 1 for x > 0 and is asymptotic to y = 0 for x < 0. The fact that any input is mapped to an output between 0 and 1 is very convenient and is exactly what makes it work as an activation function.
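A quick numeric check of these asymptotes (output values are approximate):

print(sigmoid(np.array([-5.0, 0.0, 5.0])))
# approx [0.00669285 0.5        0.99330715]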

Next, there is the step function, which takes the sigmoid to its extreme and outputs only 0 or 1. It can be written as follows.

NN.ipynb


def step_function(x):
    return np.array(x > 0, dtype=int)  # True/False -> 1/0

x = np.arange(-5.0, 5.0, 0.1)
y = step_function(x)
plt.plot(xxx,yyy)  # sigmoid, for comparison
plt.plot(x, y)     # step function
plt.ylim(-0.1, 1.1)
plt.show()

[Figure: step function and sigmoid function plotted together]

Blue is the sigmoid function and orange is the step function (they are plotted in that order above). You can see that the step function's output is only 0 or 1. I have little sense yet of how to choose between these functions, so let me leave that as homework. **My impression: the sigmoid function can distinguish even slight differences in input because it takes values more finely. On the other hand, when there are many layers and the computational load is high, using the step function where appropriate might allow classification while reducing the load.**
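The first point can be checked numerically: a small difference in input gives distinct sigmoid outputs but identical step outputs (an illustration of my impression above, not a claim from the book):

print(sigmoid(np.array([0.1, 0.2])))        # approx [0.5249792 0.549834 ]
print(step_function(np.array([0.1, 0.2])))  # [1 1] -- the difference is lost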

Finally, I would like to mention the ReLU (Rectified Linear Unit) function, which I also have the impression is used very often. If x exceeds 0, the value is output as y unchanged; if x is 0 or less, 0 is output.

NN.ipynb


def relu(x):
    return np.maximum(0,x)  # elementwise maximum of 0 and x

xx = np.arange(-5.0,5.0,0.1)
yy = relu(xx)
plt.plot(xx,yy)
plt.ylim(-0.1,5)
plt.show()

[Figure: plot of the ReLU function]

What is forward propagation?

This time, we will create a forward-propagation neural network. "Forward propagation" means that the computation flows in one direction, from input to output. When training a model, calculations are also carried out from the output back toward the input; that is called backpropagation.

Implement a 3-layer neural network

Now, I would like to actually implement a three-layer neural network.

[Figure: the three-layer network to be implemented, with the first-layer calculation highlighted in bold]

Consider creating a three-layer neural network as shown in the figure above. First, let's extract only the calculation highlighted in bold in the figure.

NN.ipynb


def init_network():
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])  # weights, shape (2, 3)
    network['b1'] = np.array([0.1,0.2,0.3])                  # biases
    return network

def forword(network,x):
    W1 = network['W1']
    b1 = network['b1']

    a1 = np.dot(x,W1)+b1   # weighted sum of the first layer
    z1 = sigmoid(a1)       # activation

    return z1

network = init_network()
x = np.array([2,1])
z1 = forword(network,x)
print(z1)
[0.62245933 0.76852478 0.86989153]

I had the init_network() function define the weights and biases, and the forword() function define the formulas that actually perform the calculation. The functions are then called with the initial value x to produce the answer. This is easier to follow than writing everything in one long stream without defining functions.

**Also, pay attention to np.dot, which expresses the matrix product here. Be careful when describing a product of matrices, because the order of multiplication changes the dimensions of the resulting matrix.**
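As a quick illustration of that shape rule, here is a sketch using the same x and W1 as in the code below:

x = np.array([2, 1])                 # shape (2,)
W1 = np.array([[0.1, 0.3, 0.5],
               [0.2, 0.4, 0.6]])     # shape (2, 3)
print(np.dot(x, W1).shape)           # (3,): (2,) times (2, 3) gives (3,)
# np.dot(W1, x) raises ValueError: shapes (2, 3) and (2,) are not aligned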

[Figure: the full 3-layer network with weights W1, W2, W3 and biases b1, b2, b3]

3-layer neural network

NN.ipynb


def init_network():
    network = {}
    network['W1'] = np.array([[0.1,0.3,0.5],[0.2,0.4,0.6]])   # (2, 3)
    network['b1'] = np.array([0.1,0.2,0.3])
    network['W2'] = np.array([[0.1,0.4],[0.2,0.5],[0.3,0.6]]) # (3, 2)
    network['b2'] = np.array([0.1,0.2])
    network['W3'] = np.array([[0.1,0.3],[0.2,0.4]])           # (2, 2)
    network['b3'] = np.array([0.1,0.2])
    return network

def forword(network,x):
    W1,W2,W3 = network['W1'],network['W2'],network['W3']
    b1,b2,b3 = network['b1'],network['b2'],network['b3']

    a1 = np.dot(x,W1)+b1    # layer 1: weighted sum
    z1 = sigmoid(a1)        # layer 1: activation
    a2 = np.dot(z1,W2)+b2   # layer 2
    z2 = sigmoid(a2)
    a3 = np.dot(z2,W3)+b3   # layer 3
    y = softmax(a3)         # output layer (softmax is defined below)

    return y

Writing the two functions out to the very end gives the code above. As touched on earlier, there is a call written as softmax at the end. It is summarized below.

Identity function and softmax function

After this, we can see that extending the network just means adding layers to these two functions. Next, consider the value y that is output at the end. For classification problems, such as guessing which of the digits 0 to 9 an input belongs to, a probability is output for each class, and the class with the highest probability becomes the prediction. (For regression problems, the identity function, which returns its input unchanged, is used in the output layer instead.) A convenient function for expressing such probabilities is the softmax function:

y_k = exp(a_k) / Σ_i exp(a_i)
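As a small worked example with arbitrary values a = (0.3, 2.9, 4.0): the exponentials are roughly (1.35, 18.17, 54.60), their sum is about 74.12, so y ≈ (0.018, 0.245, 0.737), which sums to 1.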

The sum of the values over all classes serves as the denominator, and each individual value as the numerator, so each output can be read as a probability. By ending with this softmax function, the classification problem reduces to probabilities, and the class with the highest value becomes the prediction. Because multiplying the numerator and denominator by the same constant C does not change the value, and C·exp(a) = exp(a + log C), the formula can also be written as:

y_k = exp(a_k + C') / Σ_i exp(a_i + C')    (C' = log C)

In terms of implementation, exp is an exponential function, so its value overflows very easily. The rewriting above is the usual workaround: by multiplying a constant into the numerator and denominator and moving it into the exponent of exp, choosing C' to be minus the maximum of the inputs, every exponent becomes 0 or negative and the calculation becomes hard to overflow.

NN.ipynb



def softmax(a):
    c = np.max(a)            # subtract the max for numerical stability
    exp_a = np.exp(a-c)      # every exponent is now 0 or negative
    sum_exp_a = np.sum(exp_a)
    y = exp_a/sum_exp_a      # normalize so the outputs sum to 1
    return y
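The effect of subtracting the maximum can be checked with inputs that would overflow a naive exp (an illustrative sketch; the printed digits are approximate):

a = np.array([1010, 1000, 990])  # np.exp(1010) alone would overflow
print(softmax(a))
# approx [9.9995460e-01 4.5397869e-05 2.0610600e-09]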

Enter the initial conditions and output the answer

NN.ipynb


network = init_network()
x = np.array([2,1])
y = forword(network,x)
print(y)
[0.40442364 0.59557636]

As a test, I put arbitrary values into x, and the answer came back as above: a value indicating that y1 has a probability of about 40% and y2 about 60%. From here, I understand that more complicated classifications become possible as the input matrices grow larger and the layers become deeper (i.e., more numerous).
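As a quick sanity check, the softmax outputs really do behave like a probability distribution:

print(np.sum(y))     # 1.0 -- the outputs sum to one
print(np.argmax(y))  # 1  -- y2 (index 1) is the predicted class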

At the end

This time, I built a very basic neural network by hand. Simply moving my hands deepened my understanding. **I finally feel I understand the basics of the GAN algorithms I had previously just copied and run.** Adding ideas such as model training and convolution leads from here to convolutional neural networks, and further on to GANs. This may be only an introduction on the way to the latest technology, but I hope that by steadily deepening my understanding like this, I will surely improve my technical skills.

The full program is here. It is split into a file for experimenting with the individual functions and a file for the 3-layer neural network. https://github.com/Fumio-eisan/neuralnetwork_20200318
