[PYTHON] Understanding and implementing a neural network with high school mathematics

Introduction

This article explains how neural networks work, from understanding to implementation, using only high-school-level mathematics. I will confirm how the forward and backward calculations are performed by actually implementing them, without explaining the theory in detail (there is no explanation of training topics such as gradient descent and optimization). Python + NumPy is used for the implementation; the code assumes that numpy has already been imported as np. Please note that this article focuses on intuitive understanding and may be imprecise about theory and theorems. Also, full generality tends to make the subscripts confusing, so we will basically work through concrete examples.

About mathematics used for explanation

To understand and implement neural networks, vector inner products, matrix products, and a few differentiation formulas are required, so I will explain them briefly. If you have a solid grasp of high school mathematics, you can skip this section.

Inner product of vectors

Consider the following two vectors.


\vec{x} = (x_1, x_2, x_3),\quad \vec{w} = (w_1, w_2, w_3)

The inner product of the vectors $\vec{x}$ and $\vec{w}$ is defined as follows.


\vec{w} \cdot \vec{x} = w_1x_1 + w_2x_2 + w_3x_3

The idea is simple: just multiply the corresponding elements and add them up. Using NumPy, the inner product can be written as follows.

x = np.random.randn(3)
w = np.random.randn(3)
np.dot(w,x)

Matrix product

Matrices are apparently no longer part of the current high school mathematics curriculum, but writing and implementing with matrices is much easier, so please do your best to follow. All you really need to know is that a matrix is an array of vectors and how multiplication and addition are defined, so be patient and remember that much. Consider the following two matrices.


X=\left(\begin{matrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23}
\end{matrix}\right),\:
W=\left(\begin{matrix}
w_{11} & w_{12} \\
w_{21} & w_{22} \\
w_{31} & w_{32}
\end{matrix}\right)

The product $XW$ of these two matrices $X$ and $W$ is defined as follows.


XW = \left(\begin{matrix}
w_{11}x_{11}+w_{21}x_{12}+w_{31}x_{13} & w_{12}x_{11}+w_{22}x_{12}+w_{32}x_{13} \\
w_{11}x_{21}+w_{21}x_{22}+w_{31}x_{23} & w_{12}x_{21}+w_{22}x_{22}+w_{32}x_{23}
\end{matrix}\right)

It is a little confusing, but the inner product of the first row of $X$, $(x_{11}, x_{12}, x_{13})$, with the first column of $W$, $(w_{11}, w_{21}, w_{31})$, goes into the first row, first column of the result; the inner product of the second row of $X$, $(x_{21}, x_{22}, x_{23})$, with the first column of $W$ goes into the second row, first column; and in general, the inner product of the $n$-th row of the front matrix with the $m$-th column of the back matrix becomes the value in the $n$-th row, $m$-th column. Note that the product is only defined when the number of columns of the front matrix equals the number of rows of the back matrix. The matrix product is written in NumPy as follows.

X = np.random.randn(2,3)
W = np.random.randn(3,2)
np.dot(X,W)

Also, if the number of columns of the front matrix does not match the number of rows of the back matrix, an error will occur.

X = np.random.randn(3,2)
W = np.random.randn(3,2)
np.dot(X,W) #error
X = np.random.randn(1,4)
W = np.random.randn(4,2)
np.dot(X, W) #Computable

Differentiation formula

Below are the differentiation formulas used in this article.

f(x)=x+4 \;\longrightarrow\; f'(x)=1\\
f(x)=1/x \;\longrightarrow\; f'(x)=-1/x^2\\
f(x)=4x \;\longrightarrow\; f'(x)=4\\
f(x)=\exp(x) \;\longrightarrow\; f'(x)=\exp(x)
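
As a quick sanity check (my own addition, not part of the article's main flow), these formulas can be verified numerically with a central difference; the file name and helper below are just for illustration.

derivative_check.py


import numpy as np

def numerical_derivative(f, x, h=1e-5):
  # central difference approximation of f'(x)
  return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
print(numerical_derivative(lambda t: t + 4, x))      # ~1
print(numerical_derivative(lambda t: 1 / t, x))      # ~-1/x^2 = -0.25
print(numerical_derivative(lambda t: 4 * t, x))      # ~4
print(numerical_derivative(lambda t: np.exp(t), x))  # ~exp(2)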

What is a neural network?

Setting the complications aside, a neural network produces an output by multiplying an input by a number called a weight and adding a number called a bias, as in the figure below (figure: a single neuron with input $x$, weight $w$, bias $b$, and output $o$). The formula for the figure is $o = wx + b$. A function of this form, the same as the linear function learned in junior high school mathematics, is called a linear function. In an actual neural network the inputs and outputs range from hundreds to thousands, and the formula looks like $o = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$. Furthermore, neural networks express complex functions by applying non-linear functions (curved functions such as quadratic functions) to this output. The weights and biases are learned so that the network produces appropriate output for the problem. You can see concretely what this makes possible by looking at Stanford's convnetjs; as an example, see the demo "toy 2d classification with 2-layer neural network". The figures below (screenshots from convnetjs) show the training results of neural networks trained to separate two-dimensional green data from red data.

In a neural network, the weight and bias values are determined by learning, and the function for the boundary that separates the red region from the green region is expressed by repeating a linear combination followed by a nonlinear function. The functions that can be expressed vary with the number of neurons and the choice of activation function (explained later). In order, the screenshots use 6 intermediate neurons with tanh, 2 intermediate neurons with tanh, and 6 intermediate neurons with relu as the activation function. Intuitively, as the number of neurons increases, the number of straight-line segments increases; with tanh the joints between the segments are rounded, while with relu they are sharp. The reason for this is not covered in this article, so please study it in a textbook.

Forward propagation calculation

First of all, let's understand the forward pass. I think the forward propagation calculation is much simpler and easier to understand than the backpropagation calculation. The forward propagation calculation refers to the $o = wx + b$ computation from the previous section. First, consider a 1-input, 1-output neural network; for ease of viewing and understanding, the earlier figure is redrawn without the bias, and the bias will be omitted from here on (figure: a single neuron computing $o = wx$). A 1-input, 1-output network hardly needs code, but a simple implementation looks like this.

one2one.py


x = np.random.randn(1)
w = np.random.randn(1)
w*x

Extend this to multiple inputs and one output (figure: three inputs $x_1, x_2, x_3$ feeding a single output). This calculation can be expressed as $o = w_1x_1 + w_2x_2 + w_3x_3$. Since this formula is exactly the inner product of two vectors, it can be implemented as follows.

many2one.py


x = np.random.randn(1,3)
w = np.random.randn(3,1)
np.dot(x,w)

Next, consider one input and multiple outputs (figure: a single input feeding three outputs). This can be expressed as $o_1 = w_1x$, $o_2 = w_2x$, $o_3 = w_3x$. Defining $\vec{o} = (o_1, o_2, o_3)$ and $\vec{w} = (w_1, w_2, w_3)$, it can be rewritten as $\vec{o} = x\vec{w}$. This can be implemented as follows:

one2many.py


x = np.random.randn(1,1)
w = np.random.randn(1,3)
np.dot(x,w)

Finally, consider multiple inputs and multiple outputs (figure: three inputs fully connected to three outputs). The formulas are $o_1 = w_{11}x_1 + w_{21}x_2 + w_{31}x_3$, $o_2 = w_{12}x_1 + w_{22}x_2 + w_{32}x_3$, $o_3 = w_{13}x_1 + w_{23}x_2 + w_{33}x_3$. Written with vectors, $\vec{x} = (x_1, x_2, x_3)$, $\vec{w_1} = (w_{11}, w_{21}, w_{31})$, $\vec{w_2} = (w_{12}, w_{22}, w_{32})$, $\vec{w_3} = (w_{13}, w_{23}, w_{33})$, and the calculation becomes $o_1 = \vec{w_1}\cdot\vec{x}$, $o_2 = \vec{w_2}\cdot\vec{x}$, $o_3 = \vec{w_3}\cdot\vec{x}$. This can be expressed with a matrix product, where each quantity is represented as a matrix as follows.

X=\left(\begin{matrix} x_1 & x_2 & x_3 \end{matrix}\right),\:
W=\left(\begin{matrix} w_{11} & w_{12} & w_{13} \\
w_{21} & w_{22} & w_{23} \\
w_{31} & w_{32} & w_{33} \end{matrix}\right),\:
O=\left(\begin{matrix} o_1 & o_2 & o_3\end{matrix}\right)

Then the whole calculation can be expressed simply as $O = XW$, which is quite neat. If you are not yet comfortable with matrices, please work it out by hand and check. The implementation is as follows.

many2many.py


X=np.random.randn(1,3)
W=np.random.randn(3,3)
np.dot(X,W)

This is the end of forward propagation, except that in reality biases are also added. Finally, if you connect this into multiple layers as shown below, you get a multi-layer neural network (figure: a network with an intermediate layer between input and output). Basically, the output $O$ obtained from one layer is treated as the input $X$ of the next layer, and the same calculation is repeated.

multi_layer.py


X=np.random.randn(1,3)
layer1_W=np.random.randn(3,2)
layer2_W=np.random.randn(2,1)
layer1_O=np.dot(X,layer1_W)
layer2_O=np.dot(layer1_O, layer2_W)
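
The code above omits the biases; as a small sketch of my own (not from the original figure), adding a bias vector per layer would look like this.

multi_layer_bias.py


import numpy as np

X = np.random.randn(1, 3)
layer1_W = np.random.randn(3, 2)
layer1_b = np.random.randn(2)   # one bias per neuron in layer 1
layer2_W = np.random.randn(2, 1)
layer2_b = np.random.randn(1)   # one bias for the single output neuron
layer1_O = np.dot(X, layer1_W) + layer1_b   # O = XW + b for layer 1
layer2_O = np.dot(layer1_O, layer2_W) + layer2_b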

Weight and bias updates

Having covered forward propagation, we will now look at how the weight and bias values are learned. First, regard the neural network as a single function such as $f(X) = w_1x_1 + w_2x_2 + \dots + w_nx_n$ (where $X = (x_1, x_2, \dots, x_n)$). The weights are then updated using the following formula.

w_i^{new}=w_i-lr\frac{\partial{f(x)}}{\partial{w_i}}

$lr$ is called the learning rate, and it controls how much the value is changed. If the learning rate is too large or too small, learning will not work. In recent neural network practice, values often range from 0.1 to 0.0001, but since it is hard to know in advance what value is appropriate, it basically comes down to experience and intuition. $\frac{\partial{f(x)}}{\partial{w_i}}$ is the derivative of $f(X)$ with respect to $w_i$; for the definition of $f(x)$ above, $\frac{\partial{f(x)}}{\partial{w_i}} = x_i$. For the bias, just replace $w$ with $b$ in the same calculation. I will not explain here why updating the weights in this way makes the output approach the target value. It is the most important part, but an intuitive explanation at the level of high school mathematics alone could be misleading, so I will explain it in another article if I get the chance. Returning to the main subject, the derivative above is very easy to compute for a single-layer neural network, but not for the multi-layer neural networks that are commonly used. As I wrote at the beginning, a neural network applies a transformation with a nonlinear function after the linear combination described in the forward propagation section. Using a non-linear function means calculating as follows.

o = w_1x_1+w_2x_2+w_3x_3,\:g(x)=\frac{1}{1+\exp(-x)},\:o'=g(o)=\frac{1}{1+\exp(-o)}=\frac{1}{1+\exp(-(w_1x_1+w_2x_2+w_3x_3))}

By the way, the $g(x)$ above is a non-linear function called the sigmoid function, which is often used in neural networks (these days it is rarely used except in the output layer). A nonlinear function that transforms the output of a neural network in this way is called an activation function. Differentiating this final output $o'$ with respect to a particular $w_1$ is a little tedious, but still manageable if you push through the calculation. However, consider the case where the network is multi-layered as in the figure below (figure: two hidden neurons feeding a single output). Written as formulas, it looks like this.


o_{1} = w_{11}x_1+w_{21}x_2+w_{31}x_3,\:o_{2} = w_{12}x_1+w_{22}x_2+w_{32}x_3,\\
o'_{1}=g(o_1),\:o'_{2}=g(o_2),\\
o=w'_1o'_1+w'_2o'_2,\:o'=g(o)

If you write out the final $o'$ all the way from the inputs, it becomes the following.

o'=\frac{1}{1+\exp\left(-\left(w'_1\frac{1}{1+\exp(-(w_{11}x_1+w_{21}x_2+w_{31}x_3))}+w'_2\frac{1}{1+\exp(-(w_{12}x_1+w_{22}x_2+w_{32}x_3))}\right)\right)}

Differentiating this with respect to, say, $w_{11}$ is already very painful, and when the network spans dozens of layers like a deep neural network, the amount of calculation becomes enormous. To compute these derivatives efficiently, the backpropagation calculation introduced below is used.
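
As a rough illustration (a sketch of my own, not from the original article), you could still estimate $\partial o'/\partial w_{11}$ with a finite difference, but it costs a pair of full forward passes per weight, which is exactly the waste that backpropagation avoids.

finite_difference_w11.py


import numpy as np

def sigmoid(z):
  return 1 / (1 + np.exp(-z))

def forward(w, w_out, x):
  # w: 3x2 first-layer weights, w_out: 2 output weights, x: 3 inputs
  o_hidden = sigmoid(np.dot(x, w))         # o'_1, o'_2
  return sigmoid(np.dot(o_hidden, w_out))  # final o'

x = np.random.randn(3)
w = np.random.randn(3, 2)
w_out = np.random.randn(2)

h = 1e-5
w_plus = w.copy(); w_plus[0, 0] += h
w_minus = w.copy(); w_minus[0, 0] -= h
grad_w11 = (forward(w_plus, w_out, x) - forward(w_minus, w_out, x)) / (2 * h)
print(grad_w11)  # estimate of do'/dw11, at the cost of two extra forward passes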

Backpropagation calculation

The concept of the Chain Rule is important for performing backpropagation calculations. The chain rule says that the following formula holds.


\frac{\partial{f(x)}}{\partial{w}}=\frac{\partial{f(x)}}{\partial{o}}\frac{\partial{o}}{\partial{w}}

That is, the derivative of $f(x)$ with respect to a variable $w$ can be expressed as the derivative of $f(x)$ with respect to $o$ multiplied by the derivative of $o$ with respect to $w$. As a mental image, it is as if the $\partial{o}$ terms cancel and you are left with the original derivative. For example, apply it to the following (which can also be solved without using the chain rule).

o=w+z,\:f(x)=(w+z)y=oy,\\
\frac{\partial{o}}{\partial{w}}=1,\:\frac{\partial{f(x)}}{\partial{o}}=y,\:\frac{\partial{f(x)}}{\partial{w}}=\frac{\partial{f(x)}}{\partial{o}}\frac{\partial{o}}{\partial{w}}=y
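
As a quick check (my own addition), a central difference on $f=(w+z)y$ at arbitrary values agrees with the chain-rule result $\frac{\partial{f(x)}}{\partial{w}} = y$.

chain_rule_check.py


z, y = 3.0, 5.0
f = lambda w: (w + z) * y
w, h = 2.0, 1e-5
print((f(w + h) - f(w - h)) / (2 * h))  # ~5.0, i.e. equal to y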

The calculation above could be done without the chain rule, but the chain rule becomes a very effective method when applied to multi-layer networks that use the sigmoid function. Next, let's perform the backpropagation calculation on an actual neural network. First, consider a simple 1-input, 1-output neural network (figure: a single neuron with input $x = 3$, weight $w$, and bias $b$).

This neural network can be expressed as $f(x) = wx + b$; let $A = wx$ be an intermediate output. Our goal is to compute $\frac{\partial{f(x)}}{\partial{w}}$, which is needed to update the weight, and $\frac{\partial{f(x)}}{\partial{b}}$, which is needed to update the bias. First, compute $\frac{\partial{f(x)}}{\partial{b}}$. Since the network can be written as $f(x) = A + b$, the derivative of $f(x)$ with respect to $b$ is $\frac{\partial{f(x)}}{\partial{b}} = 1$. Next, compute $\frac{\partial{f(x)}}{\partial{w}}$. Using the chain rule, it can be written as $\frac{\partial{f(x)}}{\partial{w}} = \frac{\partial{f(x)}}{\partial{A}}\frac{\partial{A}}{\partial{w}}$, and since $\frac{\partial{f(x)}}{\partial{A}} = 1$ and $\frac{\partial{A}}{\partial{w}} = x = 3$, we get $\frac{\partial{f(x)}}{\partial{w}} = 3$ (in the figure, the backpropagated values are shown in blue). This backpropagation calculation is very powerful. For example, the derivative of the sigmoid function mentioned earlier can be computed as follows (try deriving it).

f(x)=\frac{1}{1+\exp(-x)},\\
f'(x)=f(x)(1-f(x))=\frac{1}{1+\exp(-x)}(1-\frac{1}{1+\exp(-x)})
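
As a quick numerical check (my own addition), the formula agrees with a central difference of the sigmoid itself.

sigmoid_derivative_check.py


import numpy as np

f = lambda x: 1 / (1 + np.exp(-x))
x, h = 1.0, 1e-5
print(f(x) * (1 - f(x)))               # analytic: ~0.1966
print((f(x + h) - f(x - h)) / (2 * h)) # numerical: ~0.1966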

Now, without using the derivative of the sigmoid function itself, let's find the value of $\frac{\partial{f(x)}}{\partial{x}}$ by backpropagation. The sigmoid function can be drawn as a computation graph (figure: $x \rightarrow A = -x \rightarrow B = \exp(A) \rightarrow C = 1 + B \rightarrow D = 1/C$, evaluated at $x = 1$, so $A = -1$, $B = 0.37$, $C = 1.37$). Here, $D$ represents the sigmoid output ($D = f(x)$). The backpropagation proceeds as follows.

\frac{\partial{D}}{\partial{C}}=(\frac{1}{C})'=\frac{-1}{C^2}=\frac{-1}{(1.37)^2}=-0.53\\
\frac{\partial{D}}{\partial{B}}=\frac{\partial{D}}{\partial{C}}\frac{\partial{C}}{\partial{B}}=(-0.53)(1+B)'=(-0.53)(1)=-0.53\\
\frac{\partial{D}}{\partial{A}}=\frac{\partial{D}}{\partial{B}}\frac{\partial{B}}{\partial{A}}=(-0.53)(\exp(A))'=(-0.53)(\exp(A))=(-0.53)(0.37)=-0.2\\
\frac{\partial{D}}{\partial{x}}=\frac{\partial{D}}{\partial{A}}\frac{\partial{A}}{\partial{x}}=(-0.2)(-x)'=(-0.2)(-1)=0.2
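
The following small script (my own sketch of the same computation) reproduces these values.

sigmoid_backprop_steps.py


import numpy as np

x = 1.0
A = -x           # A = -x = -1
B = np.exp(A)    # B = exp(A) ~ 0.37
C = 1 + B        # C = 1 + B ~ 1.37
D = 1 / C        # D = f(x) ~ 0.73

dD_dC = -1 / C**2          # ~ -0.53
dD_dB = dD_dC * 1          # ~ -0.53
dD_dA = dD_dB * np.exp(A)  # ~ -0.2
dD_dx = dD_dA * (-1)       # ~ 0.2
print(dD_dC, dD_dB, dD_dA, dD_dx)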

If you actually substitute $x = 1$ into the derivative of the sigmoid function, $f'(x) = f(x)(1 - f(x))$, you should get the same result (note that the values above are rounded to two digits). Let's get back to the neural network and consider the backpropagation of the following multi-layer network (figure: a two-layer network with concrete input and weight values; a second figure rewrites it with the intermediate expressions spelled out, since the first is hard to read).

We now perform the backpropagation calculation for this neural network. Omitting the intermediate steps and showing only the results gives the figure below (figure: the backpropagated value written next to each edge); please work through the calculation yourself. The underlined values are the $\frac{\partial{f(x)}}{\partial{w_i}}$ that are actually used to update the parameters. Learning is performed by multiplying these values by the learning rate and subtracting them from the original values. Finally, let's implement the whole flow of forward propagation, backpropagation, and parameter updates. This time I implement the version without biases; to check your understanding, please try implementing the version with biases yourself. Define the following nn class.

nn.py


class nn():
  def __init__(self, n_i, n_o, lr):
    self.weight = np.random.randn(n_o, n_i)
    self.inputs = None
    self.grad = np.zeros((n_o, n_i))
    self.lr = lr

  def forward(self, x):
    self.inputs = x.reshape(-1, 1)  # hold the input value for the backward pass
    return np.dot(self.weight, self.inputs)

  def backward(self, dx):
    self.grad = np.dot(dx.reshape(-1, 1), self.inputs.reshape(1, -1))  # derivative with respect to w
    return np.dot(dx.reshape(1, -1), self.weight)  # derivative with respect to x

  def update(self):
    self.weight -= self.grad * self.lr

Regarding forward, it is the same as the forward propagation implemented earlier. The input value is retained because it is needed to calculate the derivative with respect to w during the backward pass. As you can see by redoing the earlier example by hand, the derivative with respect to w is the input x multiplied by the derivative value propagated from the layer in front. In backward, matrix multiplication is used to compute the derivatives: if you write out each element as we did for forward propagation, you can see that the derivatives with respect to all the w can be computed at once as a matrix product. Likewise, the derivative with respect to x can be computed as the matrix product of the propagated derivative and the weights. Checking by hand that the matrix products really give these values is a good test of your understanding (see the sketch below). Once the derivative (grad) with respect to w has been computed, the weight is updated by multiplying by the learning rate and subtracting from the original value.
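
The following sketch (my own check, not from the original article) compares the matrix-product gradient used in the nn class with an element-by-element computation.

grad_check.py


import numpy as np

n_i, n_o = 3, 2
x = np.random.randn(n_i, 1)
dx = np.random.randn(n_o)  # derivative value propagated from the layer in front

# gradient with respect to w as a matrix product (same as nn.backward)
grad_matrix = np.dot(dx.reshape(-1, 1), x.reshape(1, -1))

# the same gradient element by element: d(output_j)/d(w_ji) = x_i, scaled by dx_j
grad_loop = np.zeros((n_o, n_i))
for j in range(n_o):
  for i in range(n_i):
    grad_loop[j, i] = dx[j] * x[i, 0]

print(np.allclose(grad_matrix, grad_loop))  # True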

test.py


from nn import nn
fc = nn(10, 2, 0.1)
x = np.random.randn(10, 1)
fc.forward(x)
grad = np.random.randn(1,2) #Derivative value coming from the front
fc.backward(grad)
fc.update()

You can update the weights as above. Here the derivative value coming from the layer in front is generated with random numbers, but in reality it is computed with an error function that measures the difference from the target value. When actually building a neural network model, an activation function (non-linear function) is inserted between the layers. This time I will implement the sigmoid function. As written earlier, both forward and backward have closed-form expressions, and since backward reuses the result of the forward calculation, it can be implemented as follows.

sigmoid.py


class sigmoid():
  def __init__(self):
    self.output = None

  def forward(self, x):
    self.output = 1/(1+np.exp(-x))
    return self.output

  def backward(self, dx):
    # multiply the propagated derivative by the local derivative of the sigmoid
    return dx.reshape(self.output.shape) * self.output * (1 - self.output)

It can be implemented essentially by writing out the formula given earlier, multiplied by the derivative value propagated from the layer in front. Since the sigmoid function has no parameters to learn, there is no need for an update function or for holding the gradient. Building a multi-layer neural network model with this sigmoid function looks like the following.

multi_layer_perceptron.py


from nn import nn
from sigmoid import sigmoid
fc1 = nn(10, 5, 0.1)
sig1 = sigmoid()
fc2 = nn(5, 2, 0.1)
sig2 = sigmoid()
x = np.random.randn(10, 1)
sig2.forward(fc2.forward(sig1.forward(fc1.forward(x))))
grad = np.random.randn(1,2)
fc1.backward(sig1.backward(fc2.backward(sig2.backward(grad))))
fc1.update()
fc2.update()

Error function

Finally, I will explain the error function. What is still missing from forward propagation, backpropagation, and network training is how to update the weights so that the output of the neural network approaches the target value. So far we have represented the mapping from the input to the output of the neural network as a single function $f(x)$ and differentiated it with respect to $w$. However, updating the weights with those values alone just moves them to arbitrary values and is not useful. Therefore, at the end of the network we attach a function, called an error function, that expresses the difference from the target value, and we regard the whole mapping from the input through the error function as one function $f(x)$. By updating the weights with the derivatives of this $f(x)$, which now includes the error function, the weights are updated so that the output approaches the target value. In machine learning, the model is trained with an error function that matches the problem. This time, for ease of implementation and understanding, we will use the mean squared error. Mean squared error is one of the error functions used for regression and is defined as follows.

f(x)=\frac{1}{n}\sum_{i=1}^n(y_i-x_i)^2\\
\frac{\partial{f(x)}}{\partial{x_i}} = -\frac{2}{n}(y_i-x_i)

Here, $x_i$ is the i-th output of the network and $y_i$ is the target value corresponding to the i-th output. A neural network aims to minimize the value of the chosen error function. Therefore, with the mean squared error, the weights are updated so that the difference between $x_i$ and $y_i$ approaches zero, that is, so that they become exactly the same value. The following is an implementation example.

MSE.py


class MSE():
  def forward(self, x, y):
    return np.square(y.reshape(-1)-x.reshape(-1)).mean()

  def backward(self, x, y):
    # derivative with respect to each output x_i: -2/n * (y_i - x_i)
    return -2*(y.reshape(-1) - x.reshape(-1))/x.size
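
As a quick check (my own addition), the backward value agrees with a finite difference of the forward value.

mse_check.py


import numpy as np
from MSE import MSE

mse = MSE()
x = np.random.randn(3)
y = np.random.randn(3)
h = 1e-5

grad = mse.backward(x, y)            # analytic gradient with respect to x
x_plus, x_minus = x.copy(), x.copy()
x_plus[0] += h
x_minus[0] -= h
numerical = (mse.forward(x_plus, y) - mse.forward(x_minus, y)) / (2 * h)
print(grad[0], numerical)            # should be approximately equal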

After that, it is possible to solve the regression problem as long as the data is ready. An implementation example is shown below.

train.py


from nn import nn
from sigmoid import sigmoid
from MSE import MSE
fc1 = nn(10, 5, 0.1)
sig1 = sigmoid()
fc2 = nn(5, 2, 0.1)
sig2 = sigmoid()
mse = MSE()
x = np.random.randn(10) #Training data generation
t = np.random.randn(2) #Teacher data generation
for i in range(100):
  out = sig2.forward(fc2.forward(sig1.forward(fc1.forward(x))))
  loss = mse.forward(out, t)
  print(loss)
  grad = mse.backward(out, t)
  fc1.backward(sig1.backward(fc2.backward(sig2.backward(grad))))
  fc1.update()
  fc2.update()

When you run the code above, you can see that the printed loss value gets smaller. This time the data is generated with random numbers, but if you load real data, the network will learn to fit that data. There are various other error functions, so please look them up and implement them; by doing so you will come to understand the characteristics of each, and eventually you will be able to design your own error function suited to the problem you want to solve.

Summary

I have tried to explain the understanding and implementation of neural network training at the level of high school mathematics. To check your understanding, please try for yourself the case where the network has biases, and the part that prepares real data and trains on it (sorry, I skipped implementing the training part simply because it was a hassle). There are many more topics I could cover, such as why updating with derivative values optimizes the network, vanishing gradients, exploding gradients, parameter initialization, the importance of hyperparameters, activation functions other than the sigmoid, CNNs, RNNs, and so on, but I will stop here for now. This article was written in one sitting, so please be aware that there may be typos and implementation mistakes. After this, I recommend studying the proper theory with books.
