This article is my easy-to-understand summary of **Deep Learning from scratch, Chapter 7: Learning Techniques**. Even with my humanities background I was able to understand this material, so I hope you can read it comfortably. I would also be delighted if you refer to it while studying the book.
**SGD**
SGD updates parameters by computing the gradient, multiplying it by the learning rate, and subtracting the result from the current parameters, as before. This method is simple and easy to implement, but the direction indicated by the gradient is not necessarily the direction of the true minimum, so the search for the parameters that minimize the loss function follows a jagged, inefficient path. That is its weak point.
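As a concrete reference, here is a minimal sketch of the SGD update in the style of the book's optimizer classes. The assumption (mine, not stated in this article) is that `params` and `grads` are dictionaries of NumPy arrays keyed by layer name.

```python
import numpy as np

class SGD:
    """Minimal SGD: subtract the gradient scaled by the learning rate."""
    def __init__(self, lr=0.01):
        self.lr = lr  # learning rate (learning coefficient)

    def update(self, params, grads):
        # params and grads are assumed to be dicts of NumPy arrays
        for key in params.keys():
            params[key] -= self.lr * grads[key]
```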
**Momentum**
Momentum is a method that adds the concept of velocity. From the gradient at the current point, it computes the velocity at which a ball would roll down the slope toward the minimum of the loss function, and adds that velocity to the parameters. Like SGD, it still searches in a zigzag, but the zigzag is smaller and more rounded than SGD's, so the inefficiency is reduced.
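A sketch of the Momentum update under the same assumptions as above: the velocity `v` accumulates past gradients, so successive updates in the same direction build up speed.

```python
import numpy as np

class Momentum:
    """Momentum: keep a velocity that accumulates past gradients."""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None  # velocity, created on the first update

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            # new velocity = friction * old velocity - lr * gradient
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```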
**AdaGrad**
AdaGrad uses a technique called **learning-rate decay**: it starts with a large learning rate and gradually reduces it, and it does so for each parameter individually. As a result, the parameters are updated in large steps at first and in progressively smaller steps afterwards. This further reduces the zigzag and makes the search more efficient.
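A sketch of the AdaGrad update under the same assumptions: `h` accumulates the squared gradients, and dividing by its square root shrinks the effective learning rate for parameters that have already been updated a lot.

```python
import numpy as np

class AdaGrad:
    """AdaGrad: per-parameter learning-rate decay via accumulated squared gradients."""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # sum of squared gradients per parameter

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            # parameters that have moved a lot get smaller steps;
            # 1e-7 avoids division by zero
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```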
**Adam**
Adam is a newer method, proposed in 2015, that combines Momentum and AdaGrad. It is more complicated, so I will not explain it in detail here, but it can search very efficiently.
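For completeness, here is a rough sketch of a standard Adam update (this goes beyond what the article explains, so treat it only as a reference): it keeps a Momentum-style moving average of the gradients and an AdaGrad-style moving average of the squared gradients, with a bias-corrected step size.

```python
import numpy as np

class Adam:
    """Adam: Momentum-style first moment plus AdaGrad-style second moment."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None  # moving average of gradients (like Momentum's velocity)
        self.v = None  # moving average of squared gradients (like AdaGrad's h)

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(v) for k, v in params.items()}
            self.v = {k: np.zeros_like(v) for k, v in params.items()}
        self.iter += 1
        # bias-corrected step size
        lr_t = self.lr * np.sqrt(1.0 - self.beta2 ** self.iter) / (1.0 - self.beta1 ** self.iter)
        for key in params.keys():
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * (grads[key] ** 2)
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
```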
The methods mainly used today are the simple SGD and the very efficient but more complex Adam.