Chapter 7 [Neural Network / Deep Learning] P252 ~ 275 (first half) [Learn by running with Python! New machine learning textbook]

[Reference] [Learn by running with Python! New machine learning textbook]: https://www.amazon.co.jp/Python%E3%81%A7%E5%8B%95%E3%81%8B%E3%81%97%E3%81%A6%E5%AD%A6%E3%81%B6%EF%BC%81-%E3%81%82%E3%81%9F%E3%82%89%E3%81%97%E3%81%84%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%81%AE%E6%95%99%E7%A7%91%E6%9B%B8-%E4%BC%8A%E8%97%A4-%E7%9C%9F/dp/4798144983

What is explained in Chapter 7

- Using a neural network, create a classification model based on the cross-entropy error derived from the maximum likelihood estimation explained in Chapter 6. Build a model that can handle three or more input dimensions x. (Chapter 6 only covered 3-class classification of 2-dimensional input.)

First, as in the previous chapter, look at the total input a.

As in Chapter 6, consider the case where the number of input dimensions is 2 (D = 2): $ a=w_0x_0+w_1x_1+w_2 $

If you add a dummy input $ x_2 $, which always takes the value 1, to represent the intercept, this becomes $ a=w_0x_0+w_1x_1+w_2x_2 $

Since it can be expressed with a summation, $ a=\sum_{i=0}^{2}w_ix_i $. This is the same as last time, and the result can be used as a probability by passing it through the sigmoid function: $ y=\frac{1}{1+exp(-a)} $ **By passing through the sigmoid function, the value of a obtained from the input is mapped into the range 0 to 1. (Since a probability also lies in the range 0 to 1, this value can be used as a probability.)**
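The following is a minimal sketch (not from the book) of this computation in numpy; the input and weight values are made-up examples.

```python
import numpy as np

def sigmoid(a):
    # squashes any real value into the range (0, 1)
    return 1 / (1 + np.exp(-a))

# made-up example: two inputs plus the dummy input x_2 = 1 for the intercept
x = np.array([0.7, 1.2, 1.0])   # [x_0, x_1, x_2]
w = np.array([0.5, -0.3, 0.1])  # [w_0, w_1, w_2]

a = np.sum(w * x)               # a = w_0*x_0 + w_1*x_1 + w_2*x_2
y = sigmoid(a)
print(a, y)                     # y always lies between 0 and 1
```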

In Chapter 6, this value ranging from 0 to 1 was interpreted as a probability, but in the neural network of this chapter the value in the range 0 to 1 is interpreted as a **firing frequency**.

(P253) Here, the output value is interpreted as the number of pulses per unit time, that is, the **firing frequency**. The larger a is, the closer the output gets to the upper limit of the firing frequency; conversely, the more negative a is, the closer the firing frequency gets to 0, meaning the neuron hardly fires at all.

7.2 Neural network model

2-layer feedforward neural network

Two-layer neural network model [figure]

Two-dimensional inputs can be classified into three categories, and each output value represents the probability that the input belongs to the corresponding category. **For example, when there are two input values, weight ($ x_0 $) and height ($ x_1 $), entering the height and weight of persons A and B gives, for each person, values t indicating which of the three classes that person belongs to.**

ex)
Classes:
$t_0$ = Black
$t_1$ = Caucasian
$t_2$ = Asian

Mr. A:
Weight: 90 kg
Height: 189 cm
If you feed these values into the model and get
$t_0$ = 0.95
$t_1$ = 0.04
$t_2$ = 0.01
(the values of t sum to 1), then Mr. A is most likely in the Black class.

Mr. B:
Weight: 65 kg
Height: 168 cm
If you feed these values into the model and get
$t_0$ = 0.06
$t_1$ = 0.04
$t_2$ = 0.90
then Mr. B is most likely in the Asian class.
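As a tiny sketch (using the made-up numbers above, not the book's code), turning such an output vector into a class label just means picking the class with the largest value:

```python
import numpy as np

classes = ["Black", "Caucasian", "Asian"]

# hypothetical model outputs for Mr. A and Mr. B
t_A = np.array([0.95, 0.04, 0.01])
t_B = np.array([0.06, 0.04, 0.90])

for name, t in [("A", t_A), ("B", t_B)]:
    # each output vector sums to 1; argmax gives the most likely class
    print(name, classes[np.argmax(t)], t.sum())
```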

Mechanism of each probability:

It is easy to understand by looking at the figure on page 239 of Chapter 6.


・Maximum likelihood estimation is used to obtain the values of $ w_0, w_1, w_2 $ for each class from the training data inputs.
・From those weights, the total inputs $ a_0, a_1, a_2 $ can be obtained.
・Each output value y is expressed as a probability through the sigmoid function: $ y_0 $ is the probability that t = 0, $ y_1 $ the probability that t = 1, and $ y_2 $ the probability that t = 2.

Each output is determined by the input values and takes a value somewhere between 0 and 1, so classification is possible. **(At first I did not see why the outputs sum to 1; when the outputs go through the softmax function described below, dividing each exp(a_k) by the sum of exp(a_l) over all classes forces them to sum to 1.)**


- In the middle layer, each input value x is weighted by w and passed to the middle-layer units b.
- Each middle-layer unit b takes the sum of the weighted inputs plus the dummy (bias) variable, and that sum is passed through the sigmoid function so it can be used as a probability-like value z.
- Then, in the same way, each middle-layer output z is multiplied by the weight v, and the sums are taken again in the output layer.
- This time, the output is passed through the softmax function instead of the sigmoid function, so the output values can be used as probabilities. (A small sketch of this forward pass is shown after the formulas below.)

P258

Total input of middle layer: $ b_j=\sum_{i=0}^Dw_{ji}x_i $

Middle layer output: $ z_j=h(b_j) $, where $ h() $ is the sigmoid function

Total input of output layer: $ a_k=\sum_{j=0}^{M}v_{kj}z_j $

Output layer output: $ y_k=\frac{exp(a_k)}{\sum_{l=0}^{K-1}exp(a_l)}=\frac{exp(a_k)}{u} $
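Putting the four formulas above together, here is a minimal numpy sketch of the forward pass. This is not the book's exact code; the array shapes and the way the dummy (bias) inputs are appended are my own assumptions.

```python
import numpy as np

def sigmoid(b):
    return 1 / (1 + np.exp(-b))

def softmax(a):
    u = np.sum(np.exp(a))           # the denominator u in the formula for y_k
    return np.exp(a) / u

def forward(x, w, v):
    # x: input vector of length D
    # w: (M, D+1) weights from input to middle layer (last column = bias)
    # v: (K, M+1) weights from middle to output layer (last column = bias)
    x = np.append(x, 1)             # dummy input x_D = 1
    b = w @ x                       # b_j = sum_i w_ji * x_i
    z = sigmoid(b)                  # z_j = h(b_j)
    z = np.append(z, 1)             # dummy unit z_M = 1
    a = v @ z                       # a_k = sum_j v_kj * z_j
    y = softmax(a)                  # y_k = exp(a_k) / sum_l exp(a_l)
    return y

# made-up example: D = 2 inputs, M = 2 middle units, K = 3 classes
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 3))
v = rng.normal(size=(3, 3))
y = forward(np.array([0.7, 1.2]), w, v)
print(y, y.sum())                   # the K outputs sum to 1
```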

Numerical differential method

The cross-entropy error is the error, obtained from maximum likelihood estimation, between the probabilities that the model outputs for a given input (e.g. x = 5.8 g) and the target class (e.g. T = [1,0,0]). From the input values you can create a model that outputs class probabilities; by feeding a new input into that model, it outputs which class the input is classified into.

The cross entropy error of the two-layer feedforward network is: $ E(w,v)=-\frac{1}{N}\sum_{n=0}^{N-1}\sum_{k=0}^{K-1}t_{nk}log(y_{nk}) $
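A minimal numpy sketch of this error (not the book's exact code), assuming T holds the one-hot targets $t_{nk}$ and Y the corresponding network outputs $y_{nk}$:

```python
import numpy as np

def cross_entropy_error(T, Y):
    # T: (N, K) one-hot target vectors t_nk
    # Y: (N, K) network output probabilities y_nk
    N = T.shape[0]
    return -np.sum(T * np.log(Y)) / N

# made-up example with N = 2 samples and K = 3 classes
T = np.array([[1, 0, 0],
              [0, 0, 1]])
Y = np.array([[0.95, 0.04, 0.01],
              [0.06, 0.04, 0.90]])
print(cross_entropy_error(T, Y))  # small, because Y puts high probability on the correct classes
```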


This refers to the diagram on page 267, but what it says is the same idea as described on this page.

First, what we want to do: E(w) is the cross-entropy error obtained by taking the negative log-likelihood from maximum likelihood estimation, and we want to find the value of w (the weight) at the bottom of the valley, where the error is smallest.

At that valley bottom the slope is 0. We want the value of w at this valley bottom, because the likelihood has been multiplied by -1 and flipped, so the most plausible solution from maximum likelihood estimation corresponds exactly to this valley bottom.

**Therefore, $ w^* $ in the figure is the optimal weight w for creating a classification model.**

What is being said here?
・Computing the partial derivative analytically is difficult.
・If you evaluate the error at a point just a little ahead of w and at a point just a little before it, then even without computing the partial derivative you can take the straight line through those two points, and its slope gives a value close to the true slope.
・That is formula (7-19).

Formula (7-19) treats the case where there is only one parameter w, and formula (7-20) extends it to multiple parameters. **Therefore, even when there are multiple parameters, appropriate values of the ws can be obtained easily.** (A minimal sketch of this numerical gradient is shown below.)
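As a sketch of what (7-19) and (7-20) describe (under my own assumptions about the interface, not the book's exact code), the slope for each parameter can be approximated from the error values just before and just after that parameter:

```python
import numpy as np

def numerical_gradient(E, params, eps=1e-4):
    # approximate dE/dw_i for every parameter by a central difference
    grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params[i]
        params[i] = orig + eps
        e_plus = E(params)          # error a little "ahead" of w_i
        params[i] = orig - eps
        e_minus = E(params)         # error a little "before" w_i
        grad[i] = (e_plus - e_minus) / (2 * eps)
        params[i] = orig            # restore the original value
    return grad

# made-up example: E(w) = sum of squares, whose true gradient is 2w
w = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(lambda p: np.sum(p ** 2), w))  # close to [2, -4, 1]
```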


Finally, how to read the graph on page 269: (probably) the graph shows the partial derivative value for each of the weight parameters in w and v, and the closer this value is to 0, the smaller the slope. Therefore 4 can be taken as a good parameter for w and 8 as a good parameter for v.

That is how I interpret it.

End of the first half

[Figure: result of inputting the test data into the trained model]

(P272 and P273 are simply read as written, so they are omitted.)

In the figure above, a model is created from the weights w and v obtained on the previous page, and the plot shows the result when the test data are actually input.

Since w and v were obtained so that the error for each of classes 1, 2, and 3 is small, the regions where the probability is high for an actual input are shown as the range 0.5 to 0.9 (the parts labeled $ t_0 $ to $ t_2 $ in the image). By displaying contour lines only where each probability is high and dividing the plane with them, classification seems to be possible. (A rough sketch of drawing such contours is shown below.)
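As a rough sketch of how such a plot could be drawn (assuming the forward(x, w, v) function sketched earlier and already-trained weights w and v, which are my own assumptions rather than the book's code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_class_contours(forward, w, v, k, x0_range=(-3, 3), x1_range=(-3, 3)):
    # draw contour lines where the probability of class k is high (0.5 and 0.9)
    x0 = np.linspace(*x0_range, 60)
    x1 = np.linspace(*x1_range, 60)
    prob = np.zeros((len(x1), len(x0)))
    for i, xi in enumerate(x1):
        for j, xj in enumerate(x0):
            prob[i, j] = forward(np.array([xj, xi]), w, v)[k]  # y_k on a grid
    X0, X1 = np.meshgrid(x0, x1)
    plt.contour(X0, X1, prob, levels=[0.5, 0.9])

# usage sketch: one set of contours per class, then show the plot
# for k in range(3):
#     plot_class_contours(forward, w, v, k)
# plt.show()
```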

