In the previous article I managed to make sense of the loss function, but that alone only defines the function; it does not actually make the value (the loss) converge to anything. In other words, we still haven't found the answer.
So, this time I will talk about how to find the answer.
I'll skip re-explaining the loss function, but in the end we have to make the value converge: in this graph, the goal is to find the "m" that minimizes the loss.
You don't get the answer in one shot; instead, you adjust the parameter a little at a time, check the loss, and keep nudging it so that the loss gets smaller.
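To make that concrete, here is a minimal sketch (my own toy example, not the article's model) of that "adjust, check, adjust" loop as plain gradient descent on a made-up one-parameter loss L(m) = (m - 3)²:

```python
# Plain gradient descent on a made-up loss L(m) = (m - 3)^2, whose minimum is at m = 3.

def loss(m):
    return (m - 3.0) ** 2

def grad(m):
    # Derivative dL/dm = 2 * (m - 3)
    return 2.0 * (m - 3.0)

m = 0.0                 # arbitrary starting guess
learning_rate = 0.1

for step in range(100):
    m -= learning_rate * grad(m)   # nudge m in the direction that lowers the loss

print(m, loss(m))       # m ends up very close to 3, and the loss close to 0
```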
This "adjusting little by little" part is the miso, but I will use the wisdom that great people have thought about in various ways.
- Batch gradient descent
- Stochastic gradient descent
- Mini-batch gradient descent
These are the typical approaches, and there are all sorts of concrete algorithms depending on how they are implemented. I won't go into the details, but the key trade-off is between the accuracy of the result and the speed of convergence.
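To show how the three variants differ, here is a rough sketch of my own (not from the article): they all do the same kind of update, and only the amount of data seen per update changes. The data and the one-weight model are made up for illustration.

```python
import numpy as np

# Made-up data for a one-weight linear model y = w * x; the true w is 2.0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * X + rng.normal(scale=0.1, size=100)

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) with respect to w.
    return np.mean((w * xb - yb) * xb)

def train(batch_size, lr=0.5, epochs=100):
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                  # shuffle once per epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])         # one update per (mini-)batch
    return w

print(train(batch_size=len(X)))   # batch gradient descent: every sample in each update
print(train(batch_size=1))        # stochastic gradient descent: one sample per update
print(train(batch_size=10))       # mini-batch gradient descent: a small chunk per update
```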
TensorFlow ships with quite a few of these optimizers.
It's hard to get your head around all of them, so for now I think it's enough to know the two used in the tutorial: "AdamOptimizer" and "GradientDescentOptimizer". "AdamOptimizer" implements the Adam algorithm, and "GradientDescentOptimizer" is plain gradient descent (steepest descent).
Both are used the same way: pass the learning rate to the constructor, then hand the loss function to `minimize()` to get an op that minimizes its value. The code for each looks like this:
```python
train_step = tensorflow.train.AdamOptimizer(1e-4).minimize(cross_entropy)
```

```python
train_step = tensorflow.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```
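For reference, a complete (if toy) training run using such a train_step op might look roughly like the sketch below. It is written against the TF 1.x-style session API (tf.compat.v1 in TF 2), with a made-up one-weight model standing in for the tutorial's network:

```python
import numpy as np
import tensorflow as tf   # written against the TF 1.x API (tf.compat.v1 in TF 2)

# Made-up one-weight model and squared-error loss, standing in for the tutorial's network.
x = tf.placeholder(tf.float32, shape=[None])
y_ = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x - y_))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xs = np.random.uniform(-1.0, 1.0, size=100).astype(np.float32)
    ys = 2.0 * xs                                   # the true weight is 2.0
    for _ in range(200):
        sess.run(train_step, feed_dict={x: xs, y_: ys})
    print(sess.run(w))                              # ends up close to 2.0
```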
It's easy to swap out the algorithm, but deciding what the learning rate should be is still trial and error... ^^;
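One crude but honest way to do that trial and error (again my own sketch, not something from the article) is to run the same small problem with a few candidate learning rates and compare where the loss ends up:

```python
# Try a few learning rates on the same toy loss L(m) = (m - 3)^2 and compare.
def final_loss(learning_rate, steps=50):
    m = 0.0
    for _ in range(steps):
        m -= learning_rate * 2.0 * (m - 3.0)   # one gradient step
    return (m - 3.0) ** 2

for lr in (1e-3, 1e-2, 1e-1, 0.5, 1.1):
    print(lr, final_loss(lr))   # too small converges slowly, too large blows up
```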
When you run this train_step, the weight parameters are adjusted automatically so that the value of the loss function gets smaller. The mechanism that makes this work is "error backpropagation".
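As a side note, in the TF 1.x optimizer API `minimize()` is a convenience wrapper around two steps: computing the gradients (the backpropagation part) and applying them to the variables. Here is a sketch, with a stand-in loss so it runs on its own:

```python
import tensorflow as tf   # TF 1.x API (tf.compat.v1 in TF 2)

# Stand-in loss so this snippet is self-contained; in the tutorial this would be cross_entropy.
w = tf.Variable(0.0)
loss = tf.square(w - 3.0)

optimizer = tf.train.GradientDescentOptimizer(0.5)

# optimizer.minimize(loss) is shorthand for these two calls:
grads_and_vars = optimizer.compute_gradients(loss)      # backprop: d(loss)/d(variable) for each trainable variable
train_step = optimizer.apply_gradients(grads_and_vars)  # update each variable using its gradient
```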
For the harder, more mathematical side of the gradient descent algorithms, please read an article that covers them in depth.