Summary of explanation

These notes record my interpretation of the points I could not understand just by reading the book while studying deep learning, so that I can recall them easily when I look back later. I explain the contents of the code as carefully as I can, and I hope it is helpful.

4.2 Loss function

To improve performance when training a neural network, the weight parameters must be brought closer to their optimal values. The loss function serves as the clue for this search. First of all, the loss function outputs a numerical value such as

0.6094374124342252
0.4750000000000001

This value is small when performance is good and large when performance is bad. In this example the second value is smaller, so it indicates better performance. The output of the loss function is used as a clue for deciding the direction and magnitude of each update to the weight parameters.

Types of loss functions

There are various loss functions, but here we will explain the sum of squares error and the cross entropy error.

1. Mean Squared Error

The mean squared error, which these notes also call the sum of squares error, is calculated by the following formula.

E = \frac{1}{N}\sum_{i=1}^{N}(y_i -t_i)^2

In words, the difference between the output value (y_i) and the correct value (t_i) is squared and averaged over N.
Squaring makes each error term non-negative. If a non-negative value is all we need, we could take the absolute value instead:

E = \frac{1}{N}\sum_{i=1}^{N}|y_i -t_i|

I wondered whether this would work just as well, but apparently the square is easier to handle when calculating the derivative. That makes sense: the derivative of the absolute value has to be split into cases depending on the sign of its argument. Furthermore, differentiating the square brings a factor of 2 out in front, so the formula is sometimes written with an extra factor of 1/2:

E = \frac{1}{2} \cdot \frac{1}{N}\sum_{i=1}^{N}(y_i - t_i)^2
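As a quick check of why the 1/2 factor is convenient, the sketch below compares a numerical derivative of the halved squared error (with N = 1) against y - t. The helper names `half_squared_error` and `numerical_gradient`, and the step size `h`, are my own choices for illustration and do not come from the book.

```python
import numpy as np

def half_squared_error(y, t):
    # 1/2 * sum of squared differences (N = 1)
    return 0.5 * np.sum((y - t) ** 2)

def numerical_gradient(f, y, h=1e-4):
    # Central difference approximation of dE/dy_i for each element of y
    grad = np.zeros_like(y)
    for i in range(y.size):
        original = y[i]
        y[i] = original + h
        f_plus = f(y)
        y[i] = original - h
        f_minus = f(y)
        y[i] = original  # restore the element before moving on
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

t = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
y = np.array([0.01, 0.02, 0.9, 0.05, 0.02])

grad = numerical_gradient(lambda v: half_squared_error(v, t), y)

# With the 1/2 factor, the derivative is simply y - t (the 2 cancels)
print(np.allclose(grad, y - t))
```

Without the 1/2 factor the same check would give 2 * (y - t), which is why the factor is a convenience and not a change in where the minimum lies.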

Example using sum of squares error

This time, let's define the function with N = 1 in the formula above and look at the result. Here y is the output of the softmax function.

import numpy as np

# Correct answer data (one-hot label)
t = [0, 0, 1, 0, 0]

# Define the sum of squares error function (N = 1, with the 1/2 factor)
def mean_squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

# Pattern 1 (close to the correct data)
y1 = [0.01, 0.02, 0.9, 0.05, 0.02]
# Pattern 2 (far from the correct data)
y2 = [0.5, 0.1, 0.2, 0.2, 0.1]

# Convert the lists to NumPy arrays so that y - t works element-wise
out1 = mean_squared_error(np.array(y1), np.array(t))
out2 = mean_squared_error(np.array(y2), np.array(t))

The results are:

print(out1)
>>> 0.006699999999999998
print(out2)
>>> 0.4750000000000001

The error is small when the output is close to the correct answer data and large when it is far from it. In this case, therefore, the sum of squares error indicates that the output of pattern 1 fits the teacher data better.
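The example above fixes N = 1 and uses the 1/2 factor. As a rough sketch of the averaged form with 1/N, the function below treats each element as one (y_i, t_i) pair and takes the mean of the squared differences. The name `mean_squared_error_avg` is hypothetical and only used here for illustration.

```python
import numpy as np

def mean_squared_error_avg(y, t):
    # (1/N) * sum((y_i - t_i)^2), where N is the number of elements
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.mean((y - t) ** 2)

t = [0, 0, 1, 0, 0]
y1 = [0.01, 0.02, 0.9, 0.05, 0.02]  # close to the correct data
y2 = [0.5, 0.1, 0.2, 0.2, 0.1]      # far from the correct data

avg1 = mean_squared_error_avg(y1, t)
avg2 = mean_squared_error_avg(y2, t)
print(avg1)
print(avg2)
```

The constant in front changes the scale of the values but not their ordering, so pattern 1 still gives the smaller error.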

2. Cross Entropy Error

The cross entropy error is calculated by the following formula.

E = -\sum_{k} t_k \log_e y_k

The difference from the sum of squares error is that the output data and the correct answer data are multiplied together. To see the benefit of this, recall that the correct answer data is a one-hot representation: only the correct label is 1, and all the others are 0. So when it is substituted into the formula above, each term of E is

**-\log_e y_k for the correct label only**
0 otherwise

As a result, the cross entropy error is determined solely by the output for the correct label. If the output value corresponding to the correct label is **small**, the value of E becomes large, indicating that the error is large.
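A small sketch of this point: with a one-hot t, the full sum collapses to the single term for the correct label, and that term grows as the output for the correct label shrinks. The variable names and the list of probabilities in `probs` are illustrative choices, not values from the book.

```python
import numpy as np

t = np.array([0, 0, 1, 0, 0])
y = np.array([0.01, 0.02, 0.9, 0.05, 0.02])

# Full sum over all labels
full_sum = -np.sum(t * np.log(y))

# Only the term for the correct label (the position of the 1 in t)
correct_index = int(np.argmax(t))
single_term = -np.log(y[correct_index])

print(np.isclose(full_sum, single_term))

# The smaller the output for the correct label, the larger the error
probs = [0.9, 0.5, 0.1, 0.01]
losses = [float(-np.log(p)) for p in probs]
print(losses)
```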

Example using cross entropy error

We will define the function in the same way as for the sum of squares error, but first I will explain the delta defined in the code.

As the graph of y = log x shows, y tends to **negative ∞** as x → 0. If the output value corresponding to the correct label is **0**, the cross entropy error cannot be expressed as a finite number, and the calculation cannot proceed any further.

To avoid this, a tiny value delta (10^{-7} in the code) is added so that the argument of log never becomes 0.
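To make the problem concrete, here is a minimal sketch that evaluates the formula once without delta and once with it, for an output that assigns 0 to the correct label. The output vector `y` is a made-up example, and `np.errstate` is used only to silence the NumPy warnings that log(0) would otherwise print.

```python
import numpy as np

t = np.array([0, 0, 1, 0, 0])
# Hypothetical output where the correct label gets probability 0
y = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# Without delta: np.log(0) is -inf, and 0 * -inf is nan,
# so the sum is not a finite number
with np.errstate(divide="ignore", invalid="ignore"):
    without_delta = -np.sum(t * np.log(y))

# With delta: every argument of log is strictly positive
delta = 1e-7
with_delta = -np.sum(t * np.log(y + delta))

print(np.isfinite(without_delta))
print(np.isfinite(with_delta))
```

With delta the error is large but finite, so later calculations can continue.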

import numpy as np

# Correct answer data (one-hot label)
t = [0, 0, 1, 0, 0]

# Define the cross entropy error function
def cross_entropy_error(y, t):
    # delta is a tiny constant that keeps the argument of log above 0
    # (write the literal 1e-7 with no spaces inside it)
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

# Pattern 1 (close to the correct data)
y1 = [0.01, 0.02, 0.9, 0.05, 0.02]
# Pattern 2 (far from the correct data)
y2 = [0.5, 0.1, 0.2, 0.2, 0.1]

# Convert the lists to NumPy arrays so that the element-wise operations work
out1 = cross_entropy_error(np.array(y1), np.array(t))
out2 = cross_entropy_error(np.array(y2), np.array(t))

The results are:

print(out1)
>>> 0.1053604045467214
print(out2)
>>> 1.6094374124342252

As with the sum of squares error, the closer the output is to the correct answer data, the smaller the value.

Summary

- The loss function is an important index for updating the parameters (weights and biases).
- The smaller the output value of the loss function, the closer the parameters are to their optimum.

Reference book

[Deep Learning from Scratch: Theory and Implementation of Deep Learning Learned with Python (Japanese)](https://www.amazon.co.jp/%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeep-Learning-%E2%80%95Python%E3%81%A7%E5%AD%A6%E3%81%B6%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%A3%85-%E6%96%8E%E8%97%A4-%E5%BA%B7%E6%AF%85/dp/4873117585/ref=sr_1_1?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&crid=W6DVSLVW0BUS&dchild=1&keywords=%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8Bdeep+learning&qid=1597943190&sprefix=%E3%82%BC%E3%83%AD%E3%81%8B%E3%82%89%2Caps%2C285&sr=8-1)

[PYTHON] Summary Note on Deep Learning -4.2 Loss Function-