Neural network implementation in Python

Introduction

I implemented a 3-layer neural network in Python and trained it to classify XNOR. I have also included the mathematical formulas, so please read on if you are interested. I used ["Deep Learning" (Machine Learning Professional Series, Takayuki Okatani)](http://www.amazon.co.jp/dp/4061529021) as a textbook.

Structure of this article

- Neural network
- Weight update
- Error backpropagation
- XNOR
- Implementation in Python
- Result
- Conclusion

Neural network

A neural network is a model that imitates the neural circuits of the human brain. It is used for tasks such as image recognition and speech recognition. The network implemented here has a three-layer structure: an input layer, one intermediate (hidden) layer, and an output layer.
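As a rough sketch of this three-layer structure (placeholder random weights; the sigmoid activation $g$ used here is defined in the next section, and the full implementation appears later in `neuralnetwork.py`), a single forward pass might look like this:

```python
import numpy

def sigmoid(x):
	return 1.0 / (1.0 + numpy.exp(-x))

x = numpy.array([0.0, 1.0])                           # input layer (2 units)
hidden_weight = numpy.random.random_sample((2, 3))    # intermediate-layer weights, first column is the bias
output_weight = numpy.random.random_sample((2, 3))    # output-layer weights, first column is the bias

z = sigmoid(hidden_weight.dot(numpy.r_[1.0, x]))      # intermediate-layer output
y = sigmoid(output_weight.dot(numpy.r_[1.0, z]))      # output-layer output
print(y)
```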

Weight update

I will explain with the figure below.

neu.png

Let $w_{ji}^{(l)}$ denote the weight from the $i$-th unit in layer $l-1$ to the $j$-th unit in layer $l$, and let $u_{i}^{(l-1)}$ be the value held by the $i$-th unit in layer $l-1$. The sigmoid function $g(x)$ is used as the activation function. It is convenient because its derivative can be computed from the function value itself.

g(x) = \cfrac{1}{1 + \exp(-x)} \\
g'(x) = g(x)(1 - g(x))

The squared error $E_n$ is used as the objective function, and the weights are updated by the following formula to minimize it. $E_n$ is the error for a single sample, and updating the weights using this per-sample $E_n$ is called stochastic gradient descent. Because the objective changes with every sample, stochastic gradient descent is less likely to get trapped in a local minimum. $\epsilon$ is called the learning rate and is a parameter that determines the learning speed. The key point is how to compute the partial-derivative term.

{{w}_{ji}^{(l)}}_{new} = {w_{ji}^{(l)}}_{old} - \epsilon\cfrac{\partial E_n}{\partial {w_{ji}^{(l)}}_{old}}
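As a minimal sketch of the two formulas above (placeholder names `w` and `grad`; not the article's implementation, which appears later), the sigmoid, its derivative, and a single stochastic-gradient-descent step could look like this:

```python
import numpy

def sigmoid(x):
	# g(x) = 1 / (1 + exp(-x))
	return 1.0 / (1.0 + numpy.exp(-x))

def sigmoid_prime(x):
	# g'(x) = g(x) * (1 - g(x)): the derivative reuses the sigmoid value itself
	s = sigmoid(x)
	return s * (1.0 - s)

# one SGD step on a weight matrix (dummy gradient, just to illustrate the update rule)
epsilon = 0.1
w = numpy.random.random_sample((2, 3))
grad = numpy.ones_like(w)          # would be dE_n/dw for one sample
w = w - epsilon * grad
```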

Error backpropagation

Now I explain how to obtain $\cfrac{\partial E_n}{\partial w_{ji}^{(l)}}$. Since the calculation differs slightly between the output layer and the intermediate layer, the two cases are treated separately.

**Output layer** The teacher signal is denoted by $t_j$. Differentiating the squared error $E_n$ partially with respect to the output-layer weight $w_{ji}^{(l)}$ gives

\begin{align}
\cfrac{\partial E_n}{\partial w_{ji}^{(l)}} &= \cfrac{\partial}{\partial w_{ji}^{(l)}}\cfrac{1}{2}(t_j - g(u_{j}^{(l)}))^{2} \\
&= (t_j - g(u_{j}^{(l)}))\cdot\cfrac{\partial}{\partial w_{ji}^{(l)}}(t_j - g(u_{j}^{(l)})) \\
&= (t_j - g(u_{j}^{(l)}))\cdot\cfrac{\partial}{\partial u_{j}^{(l)}}(t_j - g(u_{j}^{(l)}))\cdot\cfrac{\partial u_{j}^{(l)}}{\partial w_{ji}^{(l)}} \\
\\
&= (g(u_{j}^{(l)}) - t_j)\cdot g'(u_{j}^{(l)})\cdot g(u_{i}^{(l-1)}) \\
\\
&= (g(u_{j}^{(l)}) - t_j)g(u_{j}^{(l)})(1 - g(u_{j}^{(l)}))g(u_{i}^{(l-1)})
\end{align}
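In NumPy, the final expression can be evaluated in vectorized form. Below is a small sketch with dummy values (`y` corresponds to $g(u^{(l)})$, `z` to $g(u^{(l-1)})$, `t` to the teacher vector), consistent with the implementation shown later:

```python
import numpy

# dummy values just to make the sketch runnable
y = numpy.array([0.8, 0.3])   # output-layer activations g(u^(l))
t = numpy.array([1.0, 0.0])   # teacher vector
z = numpy.array([0.6, 0.4])   # intermediate-layer activations g(u^(l-1))

# delta_j = (g(u_j) - t_j) * g(u_j) * (1 - g(u_j))
output_delta = (y - t) * y * (1.0 - y)

# dE_n/dw_ji = delta_j * g(u_i^(l-1)); a constant 1 is prepended for the bias unit
grad_output = output_delta.reshape((-1, 1)) * numpy.r_[numpy.array([1.0]), z]
```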

**Intermediate layer** Differentiating the squared error $E_n$ partially with respect to the intermediate-layer weight $w_{ji}^{(l)}$ gives

\begin{align}
\cfrac{\partial E_n}{\partial w_{ji}^{(l)}} &= \cfrac{\partial E_n}{\partial u_{j}^{(l)}}\cfrac{\partial u_{j}^{(l)}}{\partial w_{ji}^{(l)}} \\
\\
&= \delta_{j}^{(l)}\cfrac{\partial u_{j}^{(l)}}{\partial w_{ji}^{(l)}}
\end{align}

The first term on the right side is

\begin{align}
\delta_{j}^{(l)} &= \cfrac{\partial E_n}{\partial u_{j}^{(l)}} \\
&= \sum_{k}\cfrac{\partial E_n}{\partial u_{k}^{(l+1)}}\cfrac{\partial u_{k}^{(l+1)}}{\partial u_{j}^{(l)}} \\
\\
&= \sum_{k}\delta_{k}^{(l+1)}w_{kj}^{(l+1)}g'(u_{j}^{(l)}) \\
\\
&= \Bigl(\sum_{k}\delta_{k}^{(l+1)}w_{kj}^{(l+1)}\Bigr)g(u_{j}^{(l)})(1 - g(u_{j}^{(l)}))
\end{align}

The second term on the right side is

\begin{align}
\cfrac{\partial u_{j}^{(l)}}{\partial w_{ji}^{(l)}} &= g(u_{i}^{(l-1)})
\end{align}

From the above,

\begin{align}
\cfrac{\partial E_n}{\partial w_{ji}^{(l)}} &= \delta_{j}^{(l)}\cfrac{\partial u_{j}^{(l)}}{\partial w_{ji}^{(l)}} \\
\\
&= \Bigl(\sum_{k}\delta_{k}^{(l+1)}w_{kj}^{(l+1)}\Bigr)g(u_{j}^{(l)})(1 - g(u_{j}^{(l)}))g(u_{i}^{(l-1)})
\end{align}

To update the weights of intermediate layer $l$, the $\delta_{k}^{(l+1)}$ of the next layer $l+1$ is required. The $\delta$ of the output layer can be obtained from the difference between the output value and the teacher signal, and by propagating it in order from the output layer back toward the input layer, the weights of the intermediate layers are updated. This is why the method is called error backpropagation.
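Concretely, the intermediate-layer delta is obtained by multiplying the next layer's delta by the transposed weights (excluding the bias column, since no error flows back to the constant bias unit). A self-contained sketch with dummy values for a 2-2-2 network:

```python
import numpy

# dummy values just to make the sketch runnable
output_delta = numpy.array([-0.05, 0.04])            # delta^(l+1) from the output layer
output_weight = numpy.random.random_sample((2, 3))   # rows: output units, columns: [bias, hidden units]
z = numpy.array([0.6, 0.4])                          # intermediate activations g(u^(l))
x = numpy.array([0.0, 1.0])                          # inputs feeding the intermediate layer

# delta_j^(l) = (sum_k delta_k^(l+1) * w_kj^(l+1)) * g(u_j)(1 - g(u_j))
hidden_delta = output_weight[:, 1:].T.dot(output_delta) * z * (1.0 - z)

# dE_n/dw_ji^(l) = delta_j^(l) * (input to the layer); a constant 1 is prepended for the bias
grad_hidden = hidden_delta.reshape((-1, 1)) * numpy.r_[numpy.array([1.0]), x]
```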

XNOR

XNOR outputs $1$ when the two inputs are the same and $0$ when they are different. It is not linearly separable.

| $x_1$ | $x_2$ | $t$ |
|:---:|:---:|:---:|
| 0 | 0 | 1 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
xnor.png
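For reference, the truth table can also be generated directly in Python (a standalone check, not part of the network code):

```python
for x1 in (0, 1):
	for x2 in (0, 1):
		t = 1 - (x1 ^ x2)   # XNOR: 1 when the inputs agree
		print(x1, x2, t)
```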

Implementation in Python

I implemented a neural network that classifies XNOR. There is one intermediate layer, and it has two units. The learning rate is $\epsilon = 0.1$ and the momentum coefficient is $\mu = 0.9$. (Momentum is one of the techniques for improving convergence: the previous weight update, multiplied by a coefficient, is added to the current update.)
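The momentum update can be written as follows (a sketch with placeholder names `w`, `grad`, `momentum`; the actual version is in `__update_weight` below):

```python
import numpy

epsilon, mu = 0.1, 0.9
w = numpy.random.random_sample((2, 3))   # some weight matrix
grad = numpy.ones_like(w)                # dE_n/dw for the current sample (dummy value)
momentum = numpy.zeros_like(w)           # previous update amount, initially zero

# new update = gradient step plus mu times the previous update
delta_w = -epsilon * grad + mu * momentum
w = w + delta_w
momentum = delta_w                       # remembered for the next step
```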

**neuralnetwork.py**

```python
import numpy
import math
import random
from matplotlib import pyplot

class Neural:

	# constructor
	def __init__(self, n_input, n_hidden, n_output):
		self.hidden_weight = numpy.random.random_sample((n_hidden, n_input + 1))
		self.output_weight = numpy.random.random_sample((n_output, n_hidden + 1))
		self.hidden_momentum = numpy.zeros((n_hidden, n_input + 1))
		self.output_momentum = numpy.zeros((n_output, n_hidden + 1))


# public method
	def train(self, X, T, epsilon, mu, epoch):
		self.error = numpy.zeros(epoch)
		N = X.shape[0]
		for epo in range(epoch):
			for i in range(N):
				x = X[i, :]
				t = T[i, :]

				self.__update_weight(x, t, epsilon, mu)

			self.error[epo] = self.__calc_error(X, T)


	def predict(self, X):
		N = X.shape[0]
		C = numpy.zeros(N).astype('int')
		Y = numpy.zeros((N, X.shape[1]))
		for i in range(N):
			x = X[i, :]
			z, y = self.__forward(x)

			Y[i] = y
			C[i] = y.argmax()

		return (C, Y)


	def error_graph(self):
		pyplot.ylim(0.0, 2.0)
		pyplot.plot(numpy.arange(0, self.error.shape[0]), self.error)
		pyplot.show()


# private method
	def __sigmoid(self, arr):
		return numpy.vectorize(lambda x: 1.0 / (1.0 + math.exp(-x)))(arr)


	def __forward(self, x):
		# z: output in hidden layer, y: output in output layer
		z = self.__sigmoid(self.hidden_weight.dot(numpy.r_[numpy.array([1]), x]))
		y = self.__sigmoid(self.output_weight.dot(numpy.r_[numpy.array([1]), z]))

		return (z, y)

	def __update_weight(self, x, t, epsilon, mu):
		z, y = self.__forward(x)

		# update output_weight: gradient step plus momentum term
		output_delta = (y - t) * y * (1.0 - y)
		_output_weight = self.output_weight.copy()  # copy() is needed: -= below modifies the array in place
		self.output_weight -= epsilon * output_delta.reshape((-1, 1)) * numpy.r_[numpy.array([1]), z] - mu * self.output_momentum
		self.output_momentum = self.output_weight - _output_weight

		# update hidden_weight: backpropagate the delta through the pre-update output weights, skipping the bias column
		hidden_delta = (_output_weight[:, 1:].T.dot(output_delta)) * z * (1.0 - z)
		_hidden_weight = self.hidden_weight.copy()
		self.hidden_weight -= epsilon * hidden_delta.reshape((-1, 1)) * numpy.r_[numpy.array([1]), x] - mu * self.hidden_momentum
		self.hidden_momentum = self.hidden_weight - _hidden_weight


	def __calc_error(self, X, T):
		# sum of the squared errors E_n over all samples
		N = X.shape[0]
		err = 0.0
		for i in range(N):
			x = X[i, :]
			t = T[i, :]

			z, y = self.__forward(x)
			err += (y - t).dot(y - t) / 2.0   # dot of 1-D arrays gives a scalar

		return err
```

**main.py**

```python
from neuralnetwork import *

if __name__ == '__main__':

	X = numpy.array([[0, 0], [0, 1], [1, 0], [1, 1]])
	T = numpy.array([[1, 0], [0, 1], [0, 1], [1, 0]])
	N = X.shape[0] # number of data

	input_size = X.shape[1]
	hidden_size = 2
	output_size = 2
	epsilon = 0.1
	mu = 0.9
	epoch = 10000

	nn = Neural(input_size, hidden_size, output_size)
	nn.train(X, T, epsilon, mu, epoch)
	nn.error_graph()

	C, Y = nn.predict(X)

	for i in range(N):
		x = X[i, :]
		y = Y[i, :]
		c = C[i]

		print(x)
		print(y)
		print(c)
		print("")
```

Result

The network classifies all four input patterns correctly.

| $x_1$ | $x_2$ | $t$ | $y$ |
|:---:|:---:|:---:|:---|
| 0 | 0 | [1, 0] | [0.92598739, 0.07297757] |
| 0 | 1 | [0, 1] | [0.06824915, 0.93312514] |
| 1 | 0 | [0, 1] | [0.06828438, 0.93309010] |
| 1 | 1 | [1, 0] | [0.92610205, 0.07220633] |

The graph below shows how the error decreases during training. The horizontal axis is the epoch number and the vertical axis is the error.

error.png

Conclusion

I implemented a 3-layer neural network and was able to classify XNOR. There is still more to study, such as learning tricks (how to initialize the weights, how to set the learning rate, AdaGrad, momentum), but I will stop here.
