[PYTHON] Roughly think about the gradient descent method

In the previous article I managed to understand the loss function, but defining the function alone does not actually make the value (the loss) converge; it doesn't give us the answer.

So, this time I will talk about how to find the answer.

How to make the value converge

I'll skip the explanation of the loss function itself; in the end, what we have to do is make the value converge. (figure: a plot of the loss against the parameter m) In this graph, we want to find the "m" that minimizes the loss.

You don't get the answer all at once; instead, you adjust the value little by little, checking the loss each time and adjusting again so that it gets smaller. (figure: the value being nudged step by step toward the minimum)
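To make this "little by little" idea concrete, here is a minimal sketch (my own illustration, not code from the article): a one-variable loss is reduced by repeatedly nudging "m" in the direction opposite to its gradient. The quadratic loss and the learning rate 0.1 are assumptions made just for this example.

def loss(m):
    return (m - 3.0) ** 2          # example loss, smallest at m = 3

def gradient(m):
    return 2.0 * (m - 3.0)         # derivative of the loss above

m = 0.0                            # starting guess
learning_rate = 0.1
for step in range(50):
    m -= learning_rate * gradient(m)   # adjust "m" a little toward smaller loss

print(m, loss(m))                  # m approaches 3, the loss approaches 0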

This "adjusting little by little" part is the miso, but I will use the wisdom that great people have thought about in various ways.

- Batch gradient descent
- Stochastic gradient descent
- Mini-batch gradient descent

These are the typical methods, and there are various algorithms depending on how each one is implemented. I won't go into the details, but the key trade-offs are the accuracy of the result and the speed of convergence.
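The practical difference between the three is simply how much data is used for each adjustment: all of it, a single sample, or a small random subset. Here is a rough sketch of that difference (my own illustration with made-up data; the model y = m * x, the learning rate, and the batch size of 32 are assumptions for the demo):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + rng.normal(scale=0.1, size=1000)   # the "true" slope is 3

def grad(m, xb, yb):
    return np.mean(2 * (m * xb - yb) * xb)       # gradient of mean squared error for y = m * x

m, lr = 0.0, 0.1
for step in range(200):
    g_batch = grad(m, x, y)                      # batch: all 1000 samples every step
    i = rng.integers(len(x))
    g_sgd = grad(m, x[i:i + 1], y[i:i + 1])      # stochastic: one random sample
    idx = rng.integers(len(x), size=32)
    g_mini = grad(m, x[idx], y[idx])             # mini-batch: a small random subset

    m -= lr * g_mini                             # update with the mini-batch gradient here

print(m)                                         # converges toward 3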

Implementation in TensorFlow

TensorFlow provides quite a few of these optimizers as well.

It's hard to grasp all of them, so for now it's probably enough to know the two used in the tutorial: "AdamOptimizer" and "GradientDescentOptimizer". "AdamOptimizer" implements the Adam algorithm, and "GradientDescentOptimizer" implements plain (steepest) gradient descent.

Both are used the same way: pass the learning rate when constructing the optimizer, then pass the loss function to its minimize() method to get an operation that minimizes the value. The code looks like this:

train_step = tensorflow.train.AdamOptimizer(1e-4).minimize(cross_entropy)
train_step = tensorflow.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

It's easy to swap the algorithm, but deciding what the learning rate should be is still a matter of trial and error... ^^;

So, when you run this, the weight parameters are adjusted automatically so that the value of the loss function gets smaller. This is "backpropagation" (error backpropagation).
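Putting it all together, here is a minimal end-to-end sketch assuming TensorFlow 1.x (the tensorflow.train API the article uses; in TensorFlow 2 these optimizers were moved elsewhere). To keep it short I substitute a squared-error loss on made-up linear data for the tutorial's cross entropy; each run of train_step backpropagates the error and nudges the weights.

import numpy as np
import tensorflow

x  = tensorflow.placeholder(tensorflow.float32, [None, 2])
y_ = tensorflow.placeholder(tensorflow.float32, [None, 1])

W = tensorflow.Variable(tensorflow.zeros([2, 1]))
b = tensorflow.Variable(tensorflow.zeros([1]))
y = tensorflow.matmul(x, W) + b

loss = tensorflow.reduce_mean(tensorflow.square(y - y_))          # squared error instead of cross entropy
train_step = tensorflow.train.GradientDescentOptimizer(0.5).minimize(loss)

data_x = np.random.rand(100, 2).astype(np.float32)                # made-up data: y = x1 + 2 * x2
data_y = data_x[:, :1] + 2.0 * data_x[:, 1:]

with tensorflow.Session() as sess:
    sess.run(tensorflow.global_variables_initializer())
    for step in range(200):
        # each run backpropagates the error and adjusts W and b a little
        sess.run(train_step, feed_dict={x: data_x, y_: data_y})
    print(sess.run([W, b]))                                       # W approaches [[1], [2]], b approaches 0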

If you want to know more

For the more rigorous theory behind gradient descent algorithms, see the linked article.
