[Python] "Deep Learning from scratch" 4.4.2 Gradient for neural networks: my question about the numerical_gradient function, solved

Contents of this article

This article covers section 4.4.2, "Gradient for neural networks" (p.110), of ["Deep Learning from scratch"](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585). A question I had about it has now been resolved, so I am writing it up here.

Question

What I was wondering about was the code at the bottom of p.111, shown below.

>>> def f(W):
... return net.loss(x, t)
...
>>> dW = numerical_gradient(f, net.W)
>>> print(dW)
[[ 0.21924763 0.14356247 -0.36281009]
[ 0.32887144 0.2153437 -0.54421514]]

I defined the function f and passed it to the numerical_gradient function defined a little earlier in the book. When I changed the second argument of numerical_gradient, the value of dW changed, as shown below.

# Pass net.W as the second argument (net.W is explained in the book from p.110).
>>> dW = numerical_gradient(f, net.W) 
>>> print(dW)
[[ 0.06281915  0.46086202 -0.52368118]
 [ 0.09422873  0.69129304 -0.78552177]]

# Store a NumPy array in a and pass it as the second argument.
>>> a = np.array([[0.2, 0.1, -0.3],
         [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
>>> print(dW)
[[0. 0. 0.]
 [0. 0. 0.]]

I did not understand why the value of dW changed. This article is the answer to that question.

Why I was puzzled

Let me explain why I found it strange that the value of dW changed.

Point 1

First of all, the return value of f has nothing to do with the value of its argument W: W never appears after return inside f. So no matter what value you pass as W, the return value does not change at all. (See below.)

# Pass 3 as the argument of f.
>>> f(3) 
2.0620146712373737


# Pass net.W to f.
>>> f(net.W) 
2.0620146712373737


# Define a NumPy array and assign it to a. Compare passing a to f with passing 3.
>>> a = np.array([[0.2, 0.1, -0.3], 
         [0.12, -0.17, 0.088]]) 
>>> f(a) == f(3) 
True

Point 2

As another point, here is the numerical_gradient function itself.

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val #Restore the value
        it.iternext()   
        
    return grad

This function returns the grad it builds internally. Tracing from the bottom of the code, you can see grad is filled in by grad[idx] = (fxh1 - fxh2) / (2*h). So what are fxh1 and fxh2? They come from fxh1 = f(x) and fxh2 = f(x).
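As a sanity check on how this function behaves, here it is run on a function that genuinely depends on its argument (the sum-of-squares check function is my own addition, not from the book): for f(x) = Σx², the gradient should come out close to 2x.

```python
import numpy as np

def numerical_gradient(f, x):
    # Central-difference gradient, as in the book: nudge each element of x
    # by ±h in place and record (f(x+h) - f(x-h)) / (2h).
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
        it.iternext()
    return grad

# f(x) = sum(x**2) has the analytic gradient 2x.
x = np.array([[1.0, 2.0], [3.0, -1.0]])
grad = numerical_gradient(lambda v: np.sum(v ** 2), x)
print(grad)  # close to [[2. 4.] [6. -2.]]
```

Note that the nudging happens *in place* on whatever array is passed as x; this detail becomes important below.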

Summary of points 1 and 2

From point 2, the return value grad of numerical_gradient is determined by the value of f(x). From point 1, f returns a constant regardless of its argument. Putting these together, it seemed strange that the return value of numerical_gradient could change at all, no matter what was passed as its second argument x.

Solution

First, let's take a closer look at the numerical_gradient function, and then at the function f.

Learn more about the numerical_gradient function

Let's tweak the numerical_gradient code slightly: insert print(fxh1) after fxh1 = f(x) and print(fxh2) after fxh2 = f(x). (As follows.)

def numerical_gradient(f, x):
    h = 1e-4 
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)
        print('fxh1:', fxh1) # inserted print(fxh1)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        print('fxh2:', fxh2) # inserted print(fxh2)
        
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val 
        it.iternext()   
        
    return grad

Now let's run the code while changing the second argument.

Passing net.W as the second argument

>>> dW = numerical_gradient(f, net.W)
fxh1: 2.062020953321506
fxh2: 2.0620083894906935
fxh1: 2.062060757760379
fxh2: 2.061968585355599
fxh1: 2.061962303319411
fxh2: 2.062067039554999
fxh1: 2.062024094490122
fxh2: 2.062005248743893
fxh1: 2.062083801262337
fxh2: 2.0619455426551796
fxh1: 2.061936119510309
fxh2: 2.06209322386368

Passing your own NumPy array `a` as the second argument

>>> a = np.array([[0.2, 0.1, -0.3],
         [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737

When you pass net.W as the second argument, the values of fxh1 and fxh2 differ slightly each time. On the other hand, when you pass your own NumPy array `a`, fxh1 and fxh2 are always the same value. Why? From here on, let's consider the case where net.W is passed as the second argument.

Let's take a closer look at the numerical_gradient function. The following code appears in the middle of it. It advances the index `idx`, takes out the element of `x` at that index, adds the small value `h` to that element, and then passes `x` to the function `f`.

it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index #Advance the index
        tmp_val = x[idx]     #Take out the element of x at that index
        x[idx] = tmp_val + h #Add the small h to it
        fxh1 = f(x) # f(x+h) #Pass x to f

So did the return value of f change because a small h was added to x? No: as shown in point 1 above, the return value of f does not change when its argument changes.

Actually, something did change when the small h was added to x. The x here is net.W, which was passed as the second argument of numerical_gradient. After the small h is added to net.W, x is passed to f. Here is the relevant part of numerical_gradient again.

x[idx] = tmp_val + h #Add the small h to the element of x (= net.W)
fxh1 = f(x)          #Pass x to f

The important thing here is the order: f is called after net.W has already changed. So how does the change in net.W affect f?

A closer look at the f function

Let's see how the change in net.W affects f. Here is f again.

def f(W):
    return net.loss(x, t)

The loss method that appears in f is defined in the simpleNet class on p.110 of the book. The simpleNet class is shown below.

import sys, os
sys.path.append(os.pardir)  
import numpy as np
from common.functions import softmax, cross_entropy_error
from common.gradient import numerical_gradient


class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2,3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)

        return loss

At the bottom of simpleNet you will find the loss method. Inside loss is the predict method, defined just above it. Look closely at predict and you will see the weight parameter W.

This answers the question left open at the end of the previous section: what effect does the change in net.W have on f? When net.W changes, the weight parameter W used by predict, which loss calls inside f, changes too. Then, of course, the return value of loss changes.
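The mechanism can be sketched in a few lines. The class `Net` and its `loss` method below are illustrative stand-ins of my own, not the book's simpleNet: f closes over the net object and ignores its argument, yet mutating net.W in place still changes what f returns.

```python
import numpy as np

class Net:
    def __init__(self):
        self.W = np.array([[1.0, 2.0]])

    def loss(self):
        # Depends on self.W, not on any argument.
        return float(np.sum(self.W))

net = Net()

def f(W):
    return net.loss()  # W is ignored; net.W is what matters

print(f(None))      # 3.0
net.W[0, 0] += 1.0  # mutate net.W in place, as numerical_gradient does
print(f(None))      # 4.0 — the "same" call now returns a different value
```

This is exactly why passing net.W (and not a copy of it, nor an unrelated array) as the second argument makes the gradient computation work.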

Summary

That completes the explanation.

Let's get back to the numerical_gradient function, shown again below.

def numerical_gradient(f, x):
    h = 1e-4 
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x) # f(x+h)

        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)

        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val 
        it.iternext()   

    return grad

As explained above, the return value of loss inside f changes when net.W changes. In this code, adding the small h to x (that is, to net.W) changes f's return value, which is what fxh1 captures; the same applies to fxh2. Those values then flow into grad[idx] = (fxh1 - fxh2) / (2*h), so numerical_gradient returns a nonzero gradient. This resolves the original question.

The important points to keep in mind are:
● The second argument x of the numerical_gradient function is net.W itself.
● The return value of the f function changes as net.W changes.
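Putting everything together, here is a self-contained sketch of the whole experiment. In the book, softmax and cross_entropy_error are imported from common.functions, so the minimal re-implementations below are my own approximations, and the input x and label t are illustrative values:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7))

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
        it.iternext()
    return grad

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        return cross_entropy_error(softmax(self.predict(x)), t)

net = simpleNet()
x = np.array([0.6, 0.9])
t = np.array([0.0, 0.0, 1.0])

def f(W):
    return net.loss(x, t)

# Passing net.W: numerical_gradient perturbs net.W itself, so f changes.
dW = numerical_gradient(f, net.W)
print(dW)  # nonzero entries

# Passing an unrelated array: f never sees it, so the gradient is all zeros.
a = np.random.randn(2, 3)
print(numerical_gradient(f, a))  # all zeros
```

Running this reproduces both behaviors from the question: a meaningful gradient when net.W is passed, and an all-zero array when any other array is passed.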

Let's continue reading "Deep Learning from scratch"!
