This is my first post. I will summarize the points that interested me in "Pattern Recognition and Machine Learning" (PRML), which I am currently reading. This post covers Chapter 2, Section 2.1 (p. 66 onward).
Let's start with the definition. When a random variable $ x \in \{0, 1\} $ follows a Bernoulli distribution with mean $ u $, it satisfies
$$ P(x=1|u)=u, \quad P(x=0|u)=1-u $$
Combining the two, we can also write
$$ P(x|u)=u^x (1-u)^{1-x} $$
A simple example is a coin that lands heads ($ x = 1 $) with probability $ u $. The following sections also use a coin as the example.
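As a quick check of the compact form, here is a minimal Python sketch. The function name `bernoulli_pmf` and the value $ u = 0.3 $ are chosen only for illustration and do not come from PRML.

```python
def bernoulli_pmf(x: int, u: float) -> float:
    """Return P(x | u) = u**x * (1 - u)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return u**x * (1.0 - u) ** (1 - x)


# Hypothetical coin that lands heads (x = 1) with probability 0.3.
p_heads = bernoulli_pmf(1, 0.3)
p_tails = bernoulli_pmf(0, 0.3)
print(p_heads, p_tails)
```

The single expression reproduces both cases of the definition: the exponent selects $ u $ when $ x = 1 $ and $ 1 - u $ when $ x = 0 $.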
Next, how do we estimate the mean $ u $ from a given sample? In maximum likelihood estimation, given $ N $ samples
$$ x_1, x_2, \ldots, x_N $$
the value of $ u $ that maximizes the likelihood function $ L $ defined below
$$ L(u) = \prod_{i=1}^N u^{x_i}(1-u)^{1-x_i} $$
is taken as the maximum likelihood estimator $ u_{ML} $ of the true mean $ u $.
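The likelihood function can be evaluated directly for a few candidate values of $ u $. The sketch below is only an illustration: the name `likelihood` and the five sample values are hypothetical.

```python
import math
from typing import Sequence


def likelihood(u: float, samples: Sequence[int]) -> float:
    """L(u) = product over i of u**x_i * (1 - u)**(1 - x_i), with x_i in {0, 1}."""
    return math.prod(u**x * (1.0 - u) ** (1 - x) for x in samples)


# Hypothetical data: 2 ones and 3 zeros.
samples = [1, 0, 0, 1, 0]
for u in (0.2, 0.4, 0.6):
    print(u, likelihood(u, samples))
```

Among these three candidates, the likelihood is largest at $ u = 0.4 $, which is also the fraction of ones in the sample.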
Let's find the $ u $ that actually maximizes the likelihood function $ L $. First, to simplify the expression, we take the logarithm of the likelihood function. Because the logarithm is monotonically increasing, this does not change the location of the maximum.
$$ \log L(u) = \sum_{i=1}^N \left\{ x_i \log u + (1-x_i) \log(1-u) \right\} $$
Taking the $ u $ that maximizes $ \log L(u) $ as $ u_{ML} $, we get
$$ u_{ML} = \frac{1}{N} \sum_{i=1}^N x_i $$
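The intermediate step can be filled in by setting the derivative of the log-likelihood to zero. This is a supplementary sketch that assumes $ 0 < u < 1 $ so that both logarithms are defined:

$$
\begin{aligned}
\frac{d}{du} \log L(u) &= \sum_{i=1}^N \left( \frac{x_i}{u} - \frac{1-x_i}{1-u} \right) = 0 \\
\sum_{i=1}^N \left\{ x_i (1-u) - (1-x_i) u \right\} &= 0 \\
\sum_{i=1}^N x_i - N u &= 0
\end{aligned}
$$

The second line follows from multiplying both sides by $ u(1-u) $. Solving the last line for $ u $ gives the sample mean.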
That is, if $ x = 1 $ is observed $ m $ times in $ N $ trials, then
$$ u_{ML} = \frac{m}{N} $$
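One way to sanity-check this result numerically is to maximize the log-likelihood over a grid of candidate values and compare with the fraction of ones. The sketch below assumes hypothetical data with $ m = 2 $ ones in $ N = 8 $ trials. The grid search is used only for verification; it is not how the estimator is normally computed.

```python
import math


def log_likelihood(u, samples):
    """Sum of x_i * log(u) + (1 - x_i) * log(1 - u) over the samples."""
    return sum(x * math.log(u) + (1 - x) * math.log(1.0 - u) for x in samples)


# Hypothetical data: m = 2 ones in N = 8 trials.
samples = [1, 0, 0, 0, 1, 0, 0, 0]

# Candidate values 0.001, 0.002, ..., 0.999 (u = 0 and u = 1 are excluded
# because the logarithm diverges there).
grid = [k / 1000 for k in range(1, 1000)]
u_grid = max(grid, key=lambda u: log_likelihood(u, samples))

# Closed-form estimate m / N.
u_closed = sum(samples) / len(samples)
print(u_grid, u_closed)
```

Both values come out as 0.25, in line with the closed-form result.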
Let's try maximum likelihood estimation using the coin example. Suppose you want to know the probability that a coin lands heads. As a first attempt, I tossed it 10 times and obtained the following results.
Heads: 3 times
Tails: 7 times
Following the method above, with $ x_i = 1 $ for heads and $ x_i = 0 $ for tails, we find the $ u_{ML} $ that maximizes the likelihood function.
$$
\begin{aligned}
u_{ML} &= \frac{1}{N} \sum_{i=1}^N x_i \\
&= \frac{1}{10} \sum_{i=1}^{10} x_i \\
&= \frac{3}{10}
\end{aligned}
$$
Therefore, we can estimate that "the probability that this coin lands heads is $ \frac{3}{10} $".
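The same calculation can be written in a few lines of Python. The helper name `bernoulli_mle` is hypothetical, and heads is encoded as 1 and tails as 0.

```python
def bernoulli_mle(samples):
    """Maximum likelihood estimate of u: the fraction of ones in the samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(samples) / len(samples)


# 3 heads (1) and 7 tails (0), as in the coin experiment above.
heads, tails = 3, 7
tosses = [1] * heads + [0] * tails
u_ml = bernoulli_mle(tosses)
print(u_ml)  # 0.3
```

The order of the tosses does not matter here; only the counts of heads and tails enter the estimate.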
In the previous section, we found that the maximum likelihood estimate for the Bernoulli distribution is determined by the fraction of trials in which the event occurred. The drawback of maximum likelihood estimation is that, for example, when a coin is tossed three times and lands heads every time, it estimates that "the probability that this coin lands heads is 1". In other words, a small number of trials causes overfitting.
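The three-toss case can be reproduced with a short sketch. The helper name `bernoulli_mle` is hypothetical. The comparison with a fair coin is an added illustration and assumes independent tosses.

```python
def bernoulli_mle(samples):
    """Maximum likelihood estimate of u: the fraction of ones in the samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(samples) / len(samples)


# Three tosses, all heads (1).
tosses = [1, 1, 1]
u_ml = bernoulli_mle(tosses)

# Under the estimate, tails on the next toss is judged impossible.
p_tails_next = 1.0 - u_ml

# Yet a perfectly fair coin produces this outcome with probability 0.5 ** 3.
p_all_heads_fair = 0.5 ** len(tosses)
print(u_ml, p_tails_next, p_all_heads_fair)  # 1.0 0.0 0.125
```

So an outcome that a fair coin produces one time in eight already drives the estimate to its extreme value.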