This series is my personal learning memorandum, but I am posting it in the hope of sharing what I have learned. It mainly organizes terms that come up while studying machine learning and deep learning. This time, I will outline probabilistic models and maximum likelihood estimation as they appear in machine learning models.
A probabilistic model is a model that assumes the variable x is generated from a probability distribution `P(x|θ)` with parameter θ.
Probabilistic model
x \sim P(x|\theta)
When x is a continuous variable, a typical example is the normal distribution.
Normal distribution
N(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left[ - \frac{(x-\mu)^2}{2\sigma^2} \right]
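To make the formula concrete, here is a minimal Python sketch (the function name `normal_pdf` and the example values are my own choices) that evaluates this density directly from the definition above.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    # N(x | mu, sigma^2): normal density with mean mu and variance sigma^2
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Density of a standard normal (mu = 0, sigma^2 = 1) at x = 0
print(normal_pdf(0.0, mu=0.0, sigma2=1.0))  # about 0.3989
```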
A discrete variable that takes only the values 0 or 1, such as a coin toss, follows a Bernoulli distribution.
Bernoulli distribution
B(x|p) = p^x(1-p)^{1-x}
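Likewise, a minimal sketch of the Bernoulli probability mass (the name `bernoulli_pmf` is my own):

```python
def bernoulli_pmf(x, p):
    # B(x | p) = p^x * (1 - p)^(1 - x), defined for x in {0, 1}
    return p ** x * (1 - p) ** (1 - x)

# A fair coin: x = 1 and x = 0 each have probability 0.5
print(bernoulli_pmf(1, p=0.5), bernoulli_pmf(0, p=0.5))
```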
Given N mutually independent data points X = (x_1, x_2, ..., x_N), the product of the probability (density) values of each data point, viewed as a function of θ, is called the likelihood.
Likelihood
L(\theta) = \prod_{n}P(x_n|\theta)
The likelihood is the central quantity in a probabilistic model, and finding the parameter θ that maximizes it is called maximum likelihood estimation (MLE). In practice, the log-likelihood shown below is usually used instead, because it is easier to work with.
Log likelihood
\ln L(\theta) = \sum_n \ln P(x_n|\theta)
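As an illustration, the log-likelihood of a normal model can be computed for a toy dataset like this (the helper name `normal_log_likelihood` and the toy data are my own assumptions):

```python
import numpy as np

def normal_log_likelihood(xs, mu, sigma2):
    # ln L(theta) = sum_n ln N(x_n | mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (xs - mu) ** 2 / (2 * sigma2))

rng = np.random.default_rng(0)
xs = rng.normal(loc=1.0, scale=2.0, size=100)  # toy data with mu = 1, sigma = 2
print(normal_log_likelihood(xs, mu=1.0, sigma2=4.0))
```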
For the normal distribution, the maximum likelihood estimate of the mean parameter μ is obtained by partially differentiating the log-likelihood with respect to μ and solving for the value at which the derivative is zero (as a result, the maximum likelihood estimate of μ is simply the mean of all x).
Maximum likelihood estimation of the expected value parameter μ of the normal distribution
\ln L(\theta) = - \frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_n(x_n-\mu)^2 \\
\frac{\partial}{\partial \mu}\ln L(\theta) = \frac{1}{\sigma^2}\sum_n(x_n - \mu) = 0 \\
\mu = \frac{1}{N}\sum_n x_n = \bar{x}
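A quick numerical sanity check of this result, under the assumption that the variance is known (the toy data and values are my own): the log-likelihood evaluated at the sample mean should be no smaller than at nearby values of μ.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(loc=1.0, scale=2.0, size=1000)  # toy data
mu_hat = xs.mean()  # the derivation says this is the MLE of mu

def log_lik(mu, sigma2=4.0):
    # Normal log-likelihood with the variance held fixed
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (xs - mu) ** 2 / (2 * sigma2))

for mu in (mu_hat - 0.1, mu_hat, mu_hat + 0.1):
    print(f"mu = {mu:.4f}, log-likelihood = {log_lik(mu):.4f}")
```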
Similarly, for the Bernoulli distribution, the maximum likelihood estimate of p is derived as follows, where M denotes the number of data points with x = 1.
Maximum likelihood estimation of Bernoulli distribution
\sum_n x_n = M \\
\ln L(\theta) = \sum_n \left[ x_n \ln p + (1 - x_n)\ln(1 - p) \right] \\
= M\ln p + (N - M)\ln(1 - p) \\
\frac{\partial}{\partial p}\ln L(\theta) = \frac{M}{p} - \frac{N - M}{1 - p} = 0 \\
p = \frac{M}{N}
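And a matching sketch for the Bernoulli case (toy data with a true p of 0.3, chosen for illustration): counting the ones and dividing by N reproduces the closed-form MLE.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.binomial(n=1, p=0.3, size=1000)  # toy coin flips, true p = 0.3

M = xs.sum()    # number of data points with x = 1
N = xs.size
p_hat = M / N   # the MLE p = M / N derived above
print(p_hat)    # should be close to 0.3
```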
That is, the maximum likelihood estimate of p is the fraction of data points with x = 1.
In this series, I plan to keep each post compact like this, covering only the essential points. Next time, I will summarize stochastic gradient descent, so please take a look at that as well. Thank you for reading to the end.