Notes on what I learned from "Deep Learning from Scratch".
The loss function is an indicator of how poorly the neural network performs. In neural network learning, parameters are updated using the gradient of the loss function as a clue.
$y_k$ is the output of the neural network, $t_k$ is the teacher data, and $k$ runs over the dimensions of the data. The output of the neural network can be interpreted as a probability $(0 \leq y_k \leq 1)$. Also, $t_k$ is a one-hot representation. The sum of squares error is:
$$
E = \frac{1}{2}\sum_{k}(y_k - t_k)^2
$$
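As a minimal sketch (the function name and NumPy usage are mine, not necessarily the book's exact listing), this can be computed as:

```python
import numpy as np

def sum_squared_error(y, t):
    """Sum of squares error: E = 1/2 * sum_k (y_k - t_k)^2."""
    return 0.5 * np.sum((y - t) ** 2)

# y: network output (probabilities), t: one-hot teacher data
y = np.array([0.1, 0.7, 0.2])
t = np.array([0, 1, 0])
print(sum_squared_error(y, t))  # small value because y is close to t
```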
$\log$ has base $e$ (the natural logarithm). $y_k$ and $t_k$ are the same as in the sum of squares error. The closer the output $y_k$ for the correct label is to 1, the smaller the cross-entropy error. The cross-entropy error is:
$$
E = -\sum_{k}t_k\log y_k
$$
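A similar sketch for the cross-entropy error; the small constant added inside the log is my own numerical-safety assumption to avoid $\log(0)$:

```python
import numpy as np

def cross_entropy_error(y, t):
    """Cross-entropy error: E = -sum_k t_k * log(y_k).
    delta keeps log() finite when an output is exactly 0."""
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

y = np.array([0.1, 0.7, 0.2])
t = np.array([0, 1, 0])
print(cross_entropy_error(y, t))  # ~0.357, i.e. -log(0.7)
```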
A chunk of a certain number of samples selected from the training data is called a mini-batch. Learning is performed on each mini-batch, as in the sketch below.
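For example, a mini-batch can be drawn with `np.random.choice`; the data shapes below are hypothetical MNIST-like placeholders, not real data:

```python
import numpy as np

# Hypothetical stand-ins: 60,000 samples of 784 features, 10-class one-hot labels
x_train = np.random.rand(60000, 784)
t_train = np.eye(10)[np.random.randint(0, 10, 60000)]

batch_size = 100
batch_mask = np.random.choice(x_train.shape[0], batch_size)  # random indices
x_batch = x_train[batch_mask]  # the mini-batch inputs
t_batch = t_train[batch_mask]  # the matching teacher data
```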
The minimum of the loss function is found using the gradient. The value of the function is decreased by repeating the process of moving a fixed distance in the gradient direction from the current location. Strictly speaking, this is called the gradient descent method.
$$
x_0 = x_0 - \eta \frac{\partial f}{\partial x_0}
$$
$\eta$ is the amount of the update and is called the learning rate. Parameters like this, which are adjusted by hand, are called hyperparameters.
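A minimal sketch of gradient descent using a central-difference numerical gradient; the function names and the example function are illustrative, not taken verbatim from the text:

```python
import numpy as np

def numerical_gradient(f, x):
    """Central-difference numerical gradient of f at x (1-D array)."""
    h = 1e-4
    grad = np.zeros_like(x)
    for i in range(x.size):
        tmp = x[i]
        x[i] = tmp + h
        fxh1 = f(x)
        x[i] = tmp - h
        fxh2 = f(x)
        grad[i] = (fxh1 - fxh2) / (2 * h)
        x[i] = tmp  # restore
    return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    """Repeat x <- x - lr * grad(f, x) for step_num steps."""
    x = init_x
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x

# Example: minimize f(x0, x1) = x0^2 + x1^2 starting from (-3, 4)
f = lambda x: x[0] ** 2 + x[1] ** 2
print(gradient_descent(f, np.array([-3.0, 4.0]), lr=0.1, step_num=100))
# -> values very close to (0, 0), the minimum
```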
Gradient descent performed on data randomly selected as a mini-batch is called stochastic gradient descent (SGD).
Premise: a neural network has adaptable weights and biases, and adjusting these weights and biases to fit the training data is called "learning". Learning proceeds in the following four steps (a toy end-to-end sketch follows the list):
1. Mini-batch: randomly extract part of the data from the training data. The goal is to decrease the value of the loss function on this mini-batch.
2. Compute the gradient: to decrease the loss function of the mini-batch, find the gradient of each weight parameter.
3. Update the parameters: update the weight parameters a small amount in the gradient direction.
4. Repeat steps 1 to 3.
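As a toy illustration of these four steps, here is a mini-batch SGD loop on a simple linear model with a mean squared error loss (a stand-in, since the two-layer neural network is implemented next time; all names and data below are hypothetical):

```python
import numpy as np

# Hypothetical synthetic data: targets come from a known linear rule
rng = np.random.default_rng(0)
x_train = rng.random((1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
t_train = x_train @ true_w

w = np.zeros(3)   # adaptable parameters (the premise above)
lr = 0.1          # learning rate (hyperparameter)
batch_size = 32

for step in range(1000):                         # step 4: repeat
    idx = rng.choice(len(x_train), batch_size)   # step 1: draw a mini-batch
    x, t = x_train[idx], t_train[idx]
    y = x @ w
    grad = 2 * x.T @ (y - t) / batch_size        # step 2: gradient of the MSE loss
    w -= lr * grad                               # step 3: update in gradient direction

print(w)  # approaches true_w = [2.0, -1.0, 0.5]
```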
Next time, we will implement a two-layer neural network.