In the previous article I managed to make sense of the loss function, but that alone only defines the function; it does not actually make the value (the loss) converge to anything. In other words, we still haven't found the answer.
So, this time I will talk about how to find the answer.
I'll skip re-explaining the loss function, but in the end we have to make the value converge: in this graph, the goal is to find the "m" that minimizes the loss.
You don't get the answer in one shot; instead, you adjust the parameter a little at a time, check the loss, and keep nudging it so that the loss gets smaller.
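To make that concrete, here is a minimal sketch (my own toy example, not the article's model) of that "adjust, check, adjust" loop as plain gradient descent on a made-up one-parameter loss L(m) = (m - 3)²:

```python
# Plain gradient descent on a made-up loss L(m) = (m - 3)^2, whose minimum is at m = 3.

def loss(m):
    return (m - 3.0) ** 2

def grad(m):
    # Derivative dL/dm = 2 * (m - 3)
    return 2.0 * (m - 3.0)

m = 0.0                 # arbitrary starting guess
learning_rate = 0.1

for step in range(100):
    m -= learning_rate * grad(m)   # nudge m in the direction that lowers the loss

print(m, loss(m))       # m ends up very close to 3, and the loss close to 0
```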
This "adjusting little by little" part is the miso, but I will use the wisdom that great people have thought about in various ways.
- Batch gradient descent
- Stochastic gradient descent
- Mini-batch gradient descent
These are the typical approaches, and there are all sorts of concrete algorithms depending on how they are implemented. I won't go into the details, but the key trade-off is between the accuracy of the result and the speed of convergence.
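To show how the three variants differ, here is a rough sketch of my own (not from the article): they all do the same kind of update, and only the amount of data seen per update changes. The data and the one-weight model are made up for illustration.

```python
import numpy as np

# Made-up data for a one-weight linear model y = w * x; the true w is 2.0.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * X + rng.normal(scale=0.1, size=100)

def grad(w, xb, yb):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) with respect to w.
    return np.mean((w * xb - yb) * xb)

def train(batch_size, lr=0.5, epochs=100):
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                  # shuffle once per epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])         # one update per (mini-)batch
    return w

print(train(batch_size=len(X)))   # batch gradient descent: every sample in each update
print(train(batch_size=1))        # stochastic gradient descent: one sample per update
print(train(batch_size=10))       # mini-batch gradient descent: a small chunk per update
```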
TensorFlow ships with quite a few of these optimizers.
It's hard to get your head around all of them, so for now I think it's enough to know the two used in the tutorial: "AdamOptimizer" and "GradientDescentOptimizer". "AdamOptimizer" implements the Adam algorithm, and "GradientDescentOptimizer" is plain gradient descent (steepest descent).
Both are used the same way: pass the learning rate to the constructor, then hand the loss function to `minimize()` to get an op that minimizes its value. The code for each looks like this:
```python
train_step = tensorflow.train.AdamOptimizer(1e-4).minimize(cross_entropy)
```

```python
train_step = tensorflow.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```
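For reference, a complete (if toy) training run using such a train_step op might look roughly like the sketch below. It is written against the TF 1.x-style session API (tf.compat.v1 in TF 2), with a made-up one-weight model standing in for the tutorial's network:

```python
import numpy as np
import tensorflow as tf   # written against the TF 1.x API (tf.compat.v1 in TF 2)

# Made-up one-weight model and squared-error loss, standing in for the tutorial's network.
x = tf.placeholder(tf.float32, shape=[None])
y_ = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x - y_))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xs = np.random.uniform(-1.0, 1.0, size=100).astype(np.float32)
    ys = 2.0 * xs                                   # the true weight is 2.0
    for _ in range(200):
        sess.run(train_step, feed_dict={x: xs, y_: ys})
    print(sess.run(w))                              # ends up close to 2.0
```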
It's easy to swap out the algorithm, but deciding what the learning rate should be is still trial and error... ^^;
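One crude but honest way to do that trial and error (again my own sketch, not something from the article) is to run the same small problem with a few candidate learning rates and compare where the loss ends up:

```python
# Try a few learning rates on the same toy loss L(m) = (m - 3)^2 and compare.
def final_loss(learning_rate, steps=50):
    m = 0.0
    for _ in range(steps):
        m -= learning_rate * 2.0 * (m - 3.0)   # one gradient step
    return (m - 3.0) ** 2

for lr in (1e-3, 1e-2, 1e-1, 0.5, 1.1):
    print(lr, final_loss(lr))   # too small converges slowly, too large blows up
```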
When you run this train_step, the weight parameters are adjusted automatically so that the value of the loss function gets smaller. The mechanism that makes this work is "error backpropagation".
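As a side note, in the TF 1.x optimizer API `minimize()` is a convenience wrapper around two steps: computing the gradients (the backpropagation part) and applying them to the variables. Here is a sketch, with a stand-in loss so it runs on its own:

```python
import tensorflow as tf   # TF 1.x API (tf.compat.v1 in TF 2)

# Stand-in loss so this snippet is self-contained; in the tutorial this would be cross_entropy.
w = tf.Variable(0.0)
loss = tf.square(w - 3.0)

optimizer = tf.train.GradientDescentOptimizer(0.5)

# optimizer.minimize(loss) is shorthand for these two calls:
grads_and_vars = optimizer.compute_gradients(loss)      # backprop: d(loss)/d(variable) for each trainable variable
train_step = optimizer.apply_gradients(grads_and_vars)  # update each variable using its gradient
```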
For the harder, more mathematical side of the gradient descent algorithms, please read an article that covers them in depth.