[Python] The significance of machine learning and mini-batch learning
Introduction
This is a note about mini-batch learning in machine learning. It also serves as an intuitive introduction to machine learning itself.
What is machine learning?
Machine learning is a computational procedure that automatically extracts rules assumed to be inherent in given data. That is, when the output $\boldsymbol{t}$ is already known for a set of inputs $\boldsymbol{x}$, we extract the rules present in that data and use them to predict the output $\boldsymbol{t}^{\prime}$ corresponding to a new input $\boldsymbol{x}^{\prime}$.
Let us call these **rules** the weights and represent them by a matrix $W$. The story above is then restated as follows: writing the output of machine learning as $\boldsymbol{y}$ (which ideally should match $\boldsymbol{t}$),

$$\boldsymbol{y} = W \boldsymbol{x}\tag{1}$$

learning amounts to finding the $W$ that makes the loss function obtained from $\boldsymbol{y}$ and $\boldsymbol{t}$,

$$L = \frac{1}{2}\|\boldsymbol{y}(\boldsymbol{x},W)-\boldsymbol{t}\|^{2}\tag{2}$$

as small as possible.
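As a minimal sketch of equations (1) and (2), assuming NumPy and arbitrarily chosen small dimensions (the concrete numbers here are illustrative, not from the text):

```python
import numpy as np

# Hypothetical small example: a 3-dimensional input mapped to a 2-dimensional output.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))    # weight matrix W
x = np.array([1.0, 2.0, 3.0])  # input vector x
t = np.array([0.5, -1.0])      # target output t

y = W @ x                       # eq. (1): y = W x
L = 0.5 * np.sum((y - t) ** 2)  # eq. (2): squared-error loss
```

Learning then means adjusting the entries of `W` so that `L` becomes as small as possible.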
Big data and mini-batch learning
Now, suppose we are given big data $(\boldsymbol{x}_{n}, \boldsymbol{t}_{n})$, $n = 1,2,\cdots,N$, where $N$ is large. We want to extract the weights $W$ from this data and predict an appropriate output $\boldsymbol{t}^{\prime}$ for an input $\boldsymbol{x}^{\prime}$ whose output is unknown. To do so, we seek the $W$ that minimizes the mean squared error over all $\boldsymbol{x}_{n}$:

$$\frac{1}{N} \sum_{n=1}^{N}\|\boldsymbol{y}_{n}(\boldsymbol{x}_{n},W)-\boldsymbol{t}_{n}\|^{2} \tag{3}$$

Gradient descent, for example, is one method for doing this.
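Full-batch gradient descent on (3) can be sketched as follows, assuming NumPy, synthetic noise-free data generated from a known weight matrix, and a hypothetical learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_out = 1000, 3, 2
W_true = rng.normal(size=(d_out, d_in))  # ground-truth rule (unknown in practice)
X = rng.normal(size=(N, d_in))           # inputs x_n, stacked as rows
T = X @ W_true.T                         # targets t_n (noise-free for simplicity)

W = np.zeros((d_out, d_in))              # initial guess for W
lr = 0.1                                 # learning rate (illustrative value)
for _ in range(200):
    Y = X @ W.T                          # y_n = W x_n for all n at once
    grad = 2.0 / N * (Y - T).T @ X       # gradient of eq. (3) with respect to W
    W -= lr * grad                       # gradient-descent update
```

Note that every single update touches all $N$ samples, which is precisely the cost that becomes prohibitive for big data.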
The big data we deal with today has a very large $N (\gg 1)$, so even on a computer, finding $W$ by honestly computing the gradient over all of $n = 1$ to $N$ is not a wise approach: the amount of computation is enormous. Instead, we randomly draw only $M (\ll N)$ data points out of the $N$ and minimize

$$\frac{1}{M} \sum_{m=1}^{M}\|\boldsymbol{y}_{m}(\boldsymbol{x}_{m},W)-\boldsymbol{t}_{m}\|^{2} \tag{4}$$

for this subset, which determines $W$ far more efficiently than minimizing (3) directly. The $W$ obtained this way is not computed from all of the given data, but it is often a good approximation of the rules underlying the original data. This learning method is called mini-batch learning.
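A minimal sketch of mini-batch learning on the same kind of synthetic data, where each update uses only $M$ randomly drawn samples (all names and constants here are illustrative choices, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d_in, d_out = 10000, 32, 3, 2      # mini-batch size M is much smaller than N
W_true = rng.normal(size=(d_out, d_in))  # ground-truth rule (unknown in practice)
X = rng.normal(size=(N, d_in))           # inputs x_n
T = X @ W_true.T                         # targets t_n (noise-free for simplicity)

W = np.zeros((d_out, d_in))
lr = 0.1                                 # learning rate (illustrative value)
for _ in range(500):
    idx = rng.choice(N, size=M, replace=False)  # randomly draw M of the N samples
    Xb, Tb = X[idx], T[idx]
    Yb = Xb @ W.T
    grad = 2.0 / M * (Yb - Tb).T @ Xb    # gradient of eq. (4) on the mini-batch
    W -= lr * grad
```

Each update now costs $O(M)$ instead of $O(N)$, yet over many iterations the whole data set is still sampled, so $W$ typically converges to a good approximation of the underlying rule.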
- Learning means determining $W$. A concrete implementation comes next.