This article is about ["Deep Learning from scratch"](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585), section 4.4.2 "Gradient for a neural network" (p.110). My question about it has now been resolved, so I am writing it up as an article.
What puzzled me was the code at the bottom of p.111, shown below.
```python
>>> def f(W):
...     return net.loss(x, t)
...
>>> dW = numerical_gradient(f, net.W)
>>> print(dW)
[[ 0.21924763  0.14356247 -0.36281009]
 [ 0.32887144  0.2153437  -0.54421514]]
```
Here I defined the function `f` and passed it to the `numerical_gradient` function that the book defines shortly before this point. When I changed the second argument of `numerical_gradient` to something else, the value of `dW` changed. (See below.)
```python
# Specify net.W as the second argument (for net.W, see the commentary from p.110 of the book).
>>> dW = numerical_gradient(f, net.W)
>>> print(dW)
[[ 0.06281915  0.46086202 -0.52368118]
 [ 0.09422873  0.69129304 -0.78552177]]

# Store a NumPy array in a and specify it as the second argument.
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
>>> print(dW)
[[0. 0. 0.]
 [0. 0. 0.]]
```
I did not understand why the value of `dW` changed depending on the second argument. This article is the answer to that question.

## Why did I wonder?

Let me first explain why the change in `dW` puzzled me.
First, the return value of the `f` function has nothing to do with the value of its argument `W`, because `W` does not appear in the expression after `return` in `f`. So no matter what value you pass as the argument of `f`, the return value does not change at all. (See below.)
```python
# Pass 3 as the argument of f.
>>> f(3)
2.0620146712373737
# Pass net.W as the argument of f.
>>> f(net.W)
2.0620146712373737
# Define a NumPy array, assign it to a, and compare passing a with passing 3.
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> f(a) == f(3)
True
```
As a second point, here is the `numerical_gradient` function. (See below.)
```python
def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val  # restore the value
        it.iternext()

    return grad
```
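As a sanity check that nothing is wrong with `numerical_gradient` itself, here is a small sketch of my own (not from the book) applying it to a function that really does depend on its argument, f(x) = Σx², whose analytic gradient is 2x:

```python
import numpy as np

def numerical_gradient(f, x):
    # Central-difference gradient, as defined in the book.
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val  # restore the value
        it.iternext()
    return grad

def square_sum(x):
    # Unlike f in the book, this genuinely uses its argument.
    return np.sum(x ** 2)

x = np.array([[3.0, 4.0]])
grad = numerical_gradient(square_sum, x)
print(grad)  # approximately [[6. 8.]], i.e. 2*x
```

When `f` truly reads its argument, the result matches the analytic gradient, so the differences we see later come from `f`, not from `numerical_gradient`.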
This function returns the `grad` it builds up internally. Tracing upward from the bottom of the code to see how `grad` is derived, you find the line `grad[idx] = (fxh1 - fxh2) / (2*h)`. So what are `fxh1` and `fxh2`? A little higher up you find `fxh1 = f(x)` and `fxh2 = f(x)`. This is point 2: the return value `grad` of `numerical_gradient` is derived from the value of `f(x)`.
From point 1, the `f` function returns a constant value regardless of its argument. Putting points 1 and 2 together, it seemed strange to me that the return value of `numerical_gradient` could change no matter what value was passed as its second argument `x`.
To resolve this, let's first take a closer look at the `numerical_gradient` function, and then take a closer look at the function `f`.
## A closer look at the numerical_gradient function

Let's tweak the `numerical_gradient` code a little. Specifically, insert `print(fxh1)` under `fxh1 = f(x)` and `print(fxh2)` under `fxh2 = f(x)`. (See below.)
```python
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        print('fxh1:', fxh1)  # inserted print(fxh1)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        print('fxh2:', fxh2)  # inserted print(fxh2)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val
        it.iternext()

    return grad
```
Now let's run the code with different second arguments. First, pass `net.W` as the second argument:
```python
>>> dW = numerical_gradient(f, net.W)
fxh1: 2.062020953321506
fxh2: 2.0620083894906935
fxh1: 2.062060757760379
fxh2: 2.061968585355599
fxh1: 2.061962303319411
fxh2: 2.062067039554999
fxh1: 2.062024094490122
fxh2: 2.062005248743893
fxh1: 2.062083801262337
fxh2: 2.0619455426551796
fxh1: 2.061936119510309
fxh2: 2.06209322386368
```
Next, pass your own NumPy array `a` as the second argument:
```python
>>> a = np.array([[0.2, 0.1, -0.3],
...               [0.12, -0.17, 0.088]])
>>> dW = numerical_gradient(f, a)
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
fxh1: 2.0620146712373737
fxh2: 2.0620146712373737
```
When you pass `net.W` as the second argument, the values of `fxh1` and `fxh2` are slightly different at every step. On the other hand, when you pass your own NumPy array `a`, `fxh1` and `fxh2` are always exactly the same value. Why? From here on, I will explain by considering the case where `net.W` is passed as the second argument.
Let's look at the middle of the `numerical_gradient` function. The following code advances the index `idx`, takes out the element of `x` at that index, adds a small `h` to it, and then passes `x` to the `f` function.
```python
it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    idx = it.multi_index  # advance to the next index
    tmp_val = x[idx]      # take out the element of x at that index
    x[idx] = tmp_val + h  # add a small h to the extracted element
    fxh1 = f(x)           # f(x+h): pass x to the f function
```
Did the return value of `f` change because a small `h` was added to `x`? But point 1 of "Why did I wonder?" showed that the return value of `f` does not change when its argument changes. In fact, something else changed when the small `h` was added to `x`: the `x` here *is* the `net.W` that was passed as the second argument of `numerical_gradient`. So the small `h` is added to `net.W` itself, and only then is the array passed to the argument of `f`.
Here is the relevant part of the `numerical_gradient` function again.

```python
x[idx] = tmp_val + h  # add a small h to the extracted element of x (= net.W)
fxh1 = f(x)           # pass it to the f function
```
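This aliasing is easy to reproduce in isolation. In the sketch below (my own example, not from the book), `W` stands in for `net.W`; binding it to `x` does not copy the array, so the in-place write, just like `x[idx] = tmp_val + h`, is visible through `W` as well:

```python
import numpy as np

W = np.random.randn(2, 3)  # stand-in for net.W
x = W                      # calling numerical_gradient(f, net.W) binds x to this same array

x[0, 0] = x[0, 0] + 1e-4   # in-place write, like x[idx] = tmp_val + h

print(x is W)              # True: x and W are the same object
print(W[0, 0] == x[0, 0])  # True: the change is visible through W
```

Assignment in Python binds a name to an object; it never copies a NumPy array, so writing through one name is seen through every other name for the same array.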
The important thing here is the order: the `f` function is called *after* `net.W` has been changed. So how does the change in `net.W` affect the `f` function?
## A closer look at the f function

Let's see in a little more detail how the change in `net.W` affects the `f` function, shown below.
```python
def f(W):
    return net.loss(x, t)
```
The `loss` method that appears in the `f` function is defined in the `simpleNet` class on p.110 of the book. The `simpleNet` class is shown below.
```python
import sys, os
sys.path.append(os.pardir)
import numpy as np
from common.functions import softmax, cross_entropy_error
from common.gradient import numerical_gradient


class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2,3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)
        return loss
```
At the bottom of `simpleNet` you find the `loss` method. Inside `loss`, the `predict` method is called, and `predict` is defined just above `loss`. If you look closely at `predict`, you can see the weight parameter `W` (that is, `self.W`). This is the answer to the question left at the end of "A closer look at the numerical_gradient function": what effect does the change in `net.W` have on the `f` function?
When `net.W` changes, the weight parameter `W` used by the `predict` method, which is called from the `loss` method inside the `f` function, changes as well. Then, of course, the return value of the `loss` method changes too.
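This can be confirmed directly. The sketch below is my own, with simplified stand-ins for `softmax` and `cross_entropy_error` (the book imports them from `common.functions`): mutating `net.W` in place changes what `net.loss(x, t)` returns, even though `loss` takes no weight argument.

```python
import numpy as np

np.random.seed(0)  # for reproducibility

def softmax(z):
    # Simplified stand-in for common.functions.softmax (1-D input only).
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / np.sum(e)

def cross_entropy_error(y, t):
    # Simplified stand-in for common.functions.cross_entropy_error.
    return -np.sum(t * np.log(y + 1e-7))

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        return cross_entropy_error(softmax(self.predict(x)), t)

net = simpleNet()
x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])

before = net.loss(x, t)
net.W[0, 0] += 1.0      # mutate the weights in place
after = net.loss(x, t)

print(before != after)  # True: loss depends on the current contents of net.W
```

`loss` reads `self.W` at call time, so any in-place change to `net.W`, including the tiny `h` added by `numerical_gradient`, shows up in its return value.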
The explanation is almost over. Let's get back to the `numerical_gradient` function; the instrumented version is shown below again.
```python
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)  # f(x+h)
        print('fxh1:', fxh1)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        print('fxh2:', fxh2)
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp_val
        it.iternext()

    return grad
```
As explained above, the return value of the `loss` method inside the `f` function changes when `net.W` changes. In this code, adding a small `h` to `x` (that is, to `net.W`) changed the value of `f(x)`, and therefore the value of `fxh1`; the same applies to `fxh2` just below it. These values are then used by the rest of the code, and the `numerical_gradient` function returns the gradient computed from them.
This resolves the question I raised at the beginning. The important points are:

● The second argument `x` of the `numerical_gradient` function *is* `net.W` itself.
● The return value of the `f` function changed because `net.W` changed.
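To tie everything together, here is a self-contained reproduction of the puzzle (my own sketch, again using simplified `softmax`/`cross_entropy_error` stand-ins): the gradient is nonzero only when the array being perturbed is the very `net.W` that `f` reads through its closure.

```python
import numpy as np

np.random.seed(0)  # for reproducibility

def softmax(z):
    # Simplified stand-in (1-D input only).
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy_error(y, t):
    # Simplified stand-in.
    return -np.sum(t * np.log(y + 1e-7))

class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2, 3)

    def loss(self, x, t):
        return cross_entropy_error(softmax(np.dot(x, self.W)), t)

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        x[idx] = tmp_val
        it.iternext()
    return grad

net = simpleNet()
x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])

def f(W):
    return net.loss(x, t)  # ignores W; reads net.W via the closure

dW = numerical_gradient(f, net.W)         # perturbs the real net.W
dA = numerical_gradient(f, net.W.copy())  # perturbs an unrelated copy

print(np.any(dW != 0))  # True: the loss responds to changes in net.W
print(np.all(dA == 0))  # True: f never sees the copy, so fxh1 == fxh2
```

Passing a copy leaves `net.W` untouched, so `f` returns the same constant at every step and every gradient entry is exactly zero, which is precisely what happened with the array `a` at the start of this article.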
Let's continue reading "Deep Learning from scratch"!