This article is a continuation of Machine learning ④ Neural network implementation (NumPy only), and continues implementing a neural network using only NumPy. Some of the background is not repeated here, so please also read the earlier articles in the series:
Machine learning ① Basics of Perceptron
Machine learning ② Perceptron activation function
Machine learning ③ Introduction and implementation of activation function
Machine learning ④ Neural network implementation (NumPy only)
Reference: O'REILLY JAPAN, Deep Learning from scratch
The figure above shows the neural network constructed in this article. We will build it layer by layer, in order.
Let's express $a_1^{(1)}$ as a mathematical formula. It is the weighted sum of the inputs plus the bias.
$$
a_1^{(1)} = w_{11}^{(1)} x_1 + w_{12}^{(1)} x_2 + b_1^{(1)}
$$
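As a quick supplementary sketch (not part of the original article), this single-neuron formula can be evaluated directly. The values below assume the first-layer placeholders used later in this article, where the weights feeding the first neuron are 0.1 and 0.2 and its bias is 0.1.

w11, w12 = 0.1, 0.2  # assumed placeholder weights feeding the first neuron
b1 = 0.1             # assumed placeholder bias for the first neuron
x1, x2 = 1.0, 0.5    # input values used throughout this article

# a1 = w11 * x1 + w12 * x2 + b1
a1 = w11 * x1 + w12 * x2 + b1
print(a1)  # roughly 0.3, matching the first element of A1 computed below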
As in the previous article, the weighted sums of the first layer can be calculated all at once with the following formula.
$$
A^{(1)} = X W^{(1)} + B^{(1)}
$$
That is,

$$
A^{(1)} =
\begin{pmatrix}
a_1^{(1)} & a_2^{(1)} & a_3^{(1)}
\end{pmatrix}
,\quad
X =
\begin{pmatrix}
x_1 & x_2
\end{pmatrix}
,\quad
B^{(1)} =
\begin{pmatrix}
b_1^{(1)} & b_2^{(1)} & b_3^{(1)}
\end{pmatrix}
$$

$$
W^{(1)} =
\begin{pmatrix}
w_{11}^{(1)} & w_{21}^{(1)} & w_{31}^{(1)} \\
w_{12}^{(1)} & w_{22}^{(1)} & w_{32}^{(1)}
\end{pmatrix}
$$
Based on the above, let's evaluate this expression using NumPy arrays. Placeholder values are entered for the weights and biases.
5-1ThreeLayer_NeuralNetwork.py
import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])

A1 = np.dot(X, W1) + B1
print(A1)
Execution result
[0.3 0.7 1.1]
For the explanation of the program, please refer to Machine Learning ③ Introduction / Implementation of Activation Function.
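As a supplementary check (an addition, not part of the original program), inspecting the array shapes confirms that the matrix product in $A^{(1)} = X W^{(1)} + B^{(1)}$ is well defined: a (2,) input times a (2, 3) weight matrix yields a (3,) result.

import numpy as np

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])

# The inner dimensions must match: (2,) dot (2, 3) -> (3,)
print(X.shape)   # (2,)
print(W1.shape)  # (2, 3)
print((np.dot(X, W1) + B1).shape)  # (3,)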
Next, suppose the sigmoid function is adopted as the activation function. The implementation then becomes the following.
5-2ThreeLayer_NeuralNetwork_activation_function.py
import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

X = np.array([1.0, 0.5])
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
B1 = np.array([0.1, 0.2, 0.3])

A1 = np.dot(X, W1) + B1
# Apply the sigmoid function
Z1 = sigmoid_function(A1)
print(A1)
print(Z1)
Execution result
[0.3 0.7 1.1]
[0.57444252 0.66818777 0.75026011]
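As a side check (an addition, not from the original article), feeding a range of inputs, including fairly extreme ones, into sigmoid_function shows that the outputs always stay between 0 and 1.

import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

# The outputs approach 0 and 1 for extreme inputs but never leave that interval
print(sigmoid_function(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))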
It can be confirmed that the values fall within the range 0 to 1, as described in the previous article, Machine learning ③ Introduction and implementation of activation function. Continuing in the same way, we will now implement the pass from the first layer to the second layer.
5-3ThreeLayer_NeuralNetwork_cmp.py
import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

# Input values
X = np.array([1.0, 0.5])
# First-layer weights (values are arbitrary placeholders)
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
# Second-layer weights (values are arbitrary placeholders)
W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
# First-layer bias
B1 = np.array([0.1, 0.2, 0.3])
# Second-layer bias
B2 = np.array([0.1, 0.2])

A1 = np.dot(X, W1) + B1
# Apply the sigmoid function
Z1 = sigmoid_function(A1)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid_function(A2)
print(A1)
print(Z1)
print(A2)
print(Z2)
Execution result
[0.3 0.7 1.1]
[0.57444252 0.66818777 0.75026011]
[0.51615984 1.21402696]
[0.62624937 0.7710107 ]
I wrote this fairly quickly, so if you have any questions, please leave a comment.
The activation function of the output layer (output node) is generally chosen according to the kind of problem you want machine learning to solve. I am writing this article as a review, so I will dig into the design of the output layer in a later article. For now, it is enough to say that the output layer produces the result you actually want, and its activation function is usually chosen from a different point of view than the one used for the hidden layers.
To distinguish the activation function of the output layer from that of the hidden layers, the output-layer activation function is written as $σ()$ (the hidden-layer activation function is written as $h()$). Also, this time an identity function (a function that returns its input unchanged) is used for $σ()$, to make the distinction between $σ()$ and $h()$ explicit. The above content is shown in the figure.
This time, identity_function is defined and implemented as $σ()$.
5-4NeuralNetwork_identityf.py
import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    return x

# Input values
X = np.array([1.0, 0.5])
# First-layer weights (values are arbitrary placeholders)
W1 = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
# Second-layer weights (values are arbitrary placeholders)
W2 = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
# Third-layer weights (values are arbitrary placeholders)
W3 = np.array([[0.1, 0.3], [0.2, 0.4]])
# First-layer bias
B1 = np.array([0.1, 0.2, 0.3])
# Second-layer bias
B2 = np.array([0.1, 0.2])
# Third-layer bias
B3 = np.array([0.1, 0.2])

A1 = np.dot(X, W1) + B1
# Apply the sigmoid function
Z1 = sigmoid_function(A1)
A2 = np.dot(Z1, W2) + B2
Z2 = sigmoid_function(A2)
A3 = np.dot(Z2, W3) + B3
# Apply the identity function at the output layer
Y = identity_function(A3)
print(A1)
print(Z1)
print(A2)
print(Z2)
print(A3)
print(Y)
Execution result
[0.3 0.7 1.1]
[0.57444252 0.66818777 0.75026011]
[0.51615984 1.21402696]
[0.62624937 0.7710107 ]
[0.31682708 0.69627909]
[0.31682708 0.69627909]
You can see that the output is produced correctly. The last two lines are identical because the identity function is used for the output layer.
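As a hedged aside (not in the original article), the output-layer activation $σ()$ is simply the last function applied, so changing it is a one-line swap. The sketch below reuses the A3 values printed above and compares identity_function with sigmoid_function at the output; which one is appropriate depends on the problem, as discussed earlier.

import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    return x

# A3 values copied from the execution result above
A3 = np.array([0.31682708, 0.69627909])

print(identity_function(A3))  # passed through unchanged
print(sigmoid_function(A3))   # squashed into the range (0, 1)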
Since the program grew incrementally, here is a tidied-up version to finish. (The processing itself is unchanged.)
5-4NeuralNetwork_identityf.py
import numpy as np

def sigmoid_function(x):
    return 1 / (1 + np.exp(-x))

def identity_function(x):
    return x

def init_data():
    data = {}
    # First-layer weights (values are arbitrary placeholders)
    data['W1'] = np.array([[0.1, 0.3, 0.5], [0.2, 0.4, 0.6]])
    # Second-layer weights (values are arbitrary placeholders)
    data['W2'] = np.array([[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]])
    # Third-layer weights (values are arbitrary placeholders)
    data['W3'] = np.array([[0.1, 0.3], [0.2, 0.4]])
    # First-layer bias
    data['B1'] = np.array([0.1, 0.2, 0.3])
    # Second-layer bias
    data['B2'] = np.array([0.1, 0.2])
    # Third-layer bias
    data['B3'] = np.array([0.1, 0.2])
    return data

def run(data, x):
    W1, W2, W3 = data['W1'], data['W2'], data['W3']
    B1, B2, B3 = data['B1'], data['B2'], data['B3']
    A1 = np.dot(x, W1) + B1
    Z1 = sigmoid_function(A1)
    A2 = np.dot(Z1, W2) + B2
    Z2 = sigmoid_function(A2)
    A3 = np.dot(Z2, W3) + B3
    Y = identity_function(A3)
    return Y

NN_data = init_data()
# Input values
X = np.array([1.0, 0.5])
Y = run(NN_data, X)
print(Y)
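As a quick sanity check, print(Y) here should reproduce the same output as 5-4, [0.31682708 0.69627909], since the refactoring only reorganizes the weights and biases into init_data and the forward pass into run, without changing any calculation.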
We have now actually built a three-layer neural network. I would like to build on this construction in future articles on learning.