This series is my personal learning memorandum, but I am posting it in the hope of sharing what I have learned. It mainly organizes terms that come up while studying machine learning and deep learning. This time, I will outline probabilistic models and maximum likelihood estimation as they appear in machine learning models.
A probabilistic model is a model that assumes the variable x is generated from a probability distribution `P(x|θ)` with parameter θ.
Probabilistic model
x \sim P(x|\theta)
When x is a continuous variable, a typical example is the normal distribution.
Normal distribution
N(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left[ - \frac{(x-\mu)^2}{2\sigma^2} \right]
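To make the formula concrete, here is a minimal Python sketch (the function name `normal_pdf` and the example values are my own choices) that evaluates this density directly from the definition above.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    # N(x | mu, sigma^2): normal density with mean mu and variance sigma^2
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Density of a standard normal (mu = 0, sigma^2 = 1) at x = 0
print(normal_pdf(0.0, mu=0.0, sigma2=1.0))  # about 0.3989
```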
A discrete variable that takes only the values 0 or 1, such as a coin toss, follows a Bernoulli distribution.
Bernoulli distribution
B(x|p) = p^x(1-p)^{1-x}
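Likewise, a minimal sketch of the Bernoulli probability mass (the name `bernoulli_pmf` is my own):

```python
def bernoulli_pmf(x, p):
    # B(x | p) = p^x * (1 - p)^(1 - x), defined for x in {0, 1}
    return p ** x * (1 - p) ** (1 - x)

# A fair coin: x = 1 and x = 0 each have probability 0.5
print(bernoulli_pmf(1, p=0.5), bernoulli_pmf(0, p=0.5))
```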
Given N mutually independent data points X = (x_1, x_2, ..., x_N), the product of the probability (density) values of each data point, viewed as a function of θ, is called the likelihood.
Likelihood
L(\theta) = \prod_{n}P(x_n|\theta)
The likelihood is the central quantity in a probabilistic model, and finding the parameter θ that maximizes it is called maximum likelihood estimation (MLE). In practice, the log-likelihood shown below is usually used instead, because it is easier to work with.
Log likelihood
\ln L(\theta) = \sum_n \ln P(x_n|\theta)
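As an illustration, the log-likelihood of a normal model can be computed for a toy dataset like this (the helper name `normal_log_likelihood` and the toy data are my own assumptions):

```python
import numpy as np

def normal_log_likelihood(xs, mu, sigma2):
    # ln L(theta) = sum_n ln N(x_n | mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (xs - mu) ** 2 / (2 * sigma2))

rng = np.random.default_rng(0)
xs = rng.normal(loc=1.0, scale=2.0, size=100)  # toy data with mu = 1, sigma = 2
print(normal_log_likelihood(xs, mu=1.0, sigma2=4.0))
```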
For the normal distribution, the maximum likelihood estimate of the mean parameter μ is obtained by partially differentiating the log-likelihood with respect to μ and solving for the value at which the derivative is zero (as a result, the maximum likelihood estimate of μ is simply the mean of all x).
Maximum likelihood estimation of the expected value parameter μ of the normal distribution
\ln L(\theta) = - \frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_n(x_n-\mu)^2 \\
\frac{\partial}{\partial \mu}\ln L(\theta) = \frac{1}{\sigma^2}\sum_n(x_n - \mu) = 0 \\
\mu = \frac{1}{N}\sum_n x_n = \bar{x}
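A quick numerical sanity check of this result, under the assumption that the variance is known (the toy data and values are my own): the log-likelihood evaluated at the sample mean should be no smaller than at nearby values of μ.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(loc=1.0, scale=2.0, size=1000)  # toy data
mu_hat = xs.mean()  # the derivation says this is the MLE of mu

def log_lik(mu, sigma2=4.0):
    # Normal log-likelihood with the variance held fixed
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (xs - mu) ** 2 / (2 * sigma2))

for mu in (mu_hat - 0.1, mu_hat, mu_hat + 0.1):
    print(f"mu = {mu:.4f}, log-likelihood = {log_lik(mu):.4f}")
```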
Similarly, for the Bernoulli distribution, the maximum likelihood estimate of p is derived as follows, where M denotes the number of data points with x = 1.
Maximum likelihood estimation of Bernoulli distribution
\sum_n x_n = M \\
\ln L(\theta) = \sum_n \left[ x_n \ln p + (1 - x_n)\ln(1 - p) \right] \\
= M\ln p + (N - M)\ln(1 - p) \\
\frac{\partial}{\partial p}\ln L(\theta) = \frac{M}{p} - \frac{N - M}{1 - p} = 0 \\
p = \frac{M}{N}
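And a matching sketch for the Bernoulli case (toy data with a true p of 0.3, chosen for illustration): counting the ones and dividing by N reproduces the closed-form MLE.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.binomial(n=1, p=0.3, size=1000)  # toy coin flips, true p = 0.3

M = xs.sum()    # number of data points with x = 1
N = xs.size
p_hat = M / N   # the MLE p = M / N derived above
print(p_hat)    # should be close to 0.3
```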
That is, the maximum likelihood estimate of p is the fraction of data points with x = 1.
In this series, I plan to keep each post compact like this, covering only the essential points. Next time, I will summarize stochastic gradient descent, so please take a look at that as well. Thank you for reading to the end.