The **exponential distribution** comes up constantly when studying statistics, but since the formula for its probability density didn't come to mind when I needed it, I decided to work through it carefully, starting from the derivation of the distribution. I also plot it in Python to get a feel for its shape.
In understanding the exponential distribution and drawing it, I referred to the following.

- [University Mathematics] Exponential distribution (concrete examples and their meanings, relationship with the Poisson distribution) [Probability and statistics]
- Introduction to Statistics (Basic Statistics I), Department of Statistics, Faculty of Liberal Arts, University of Tokyo
\begin{equation}
f(x)=
\left\{
\begin{aligned}
&\lambda \mathrm{e}^{-\lambda x} &(x\geq0) \\
&0 &(x<0)\\
\end{aligned}
\right.
\end{equation}
The exponential distribution is **the probability distribution of the interval $x$ (in unit time) between events that occur an average of $\lambda$ times per unit time**. Its probability density function is given above.
The exponential distribution is used in the following examples.
- The interval between occurrences of disasters
- The interval between accidental failures of a system with a constant failure rate
- The interval between one customer's arrival at a store and the next customer's arrival
It also has the property that the expected value is $\frac{1}{\lambda}$ and the variance is $\frac{1}{\lambda^2}$.
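As a quick numerical check of these properties, we can draw many samples and compare the empirical mean and variance against $\frac{1}{\lambda}$ and $\frac{1}{\lambda^2}$. A minimal sketch using NumPy (note that NumPy parameterizes the exponential by the scale $1/\lambda$, not by $\lambda$ itself; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0  # events per unit time

# numpy's exponential takes the scale 1/lambda, not lambda itself
samples = rng.exponential(scale=1.0 / lam, size=1_000_000)

print(samples.mean())  # close to 1/lambda = 0.2
print(samples.var())   # close to 1/lambda**2 = 0.04
```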
Now let's actually draw the distribution. Consider the following three examples and draw the "probability distribution of the time interval until the next customer visits the store".
- A store visited by an average of 5 people per hour ($\lambda = 5$)
- A store visited by an average of 10 people per hour ($\lambda = 10$)
- A store visited by an average of 15 people per hour ($\lambda = 15$)
```python
import numpy as np
import matplotlib.pyplot as plt

def exp_dist(lambda_, x):
    """Probability density function of the exponential distribution."""
    return lambda_ * np.exp(-lambda_ * x)

x = np.arange(0, 1, 0.01)
y1 = [exp_dist(5, i) for i in x]
y2 = [exp_dist(10, i) for i in x]
y3 = [exp_dist(15, i) for i in x]

plt.plot(x, y1, color="red", alpha=0.5, label="exp_dist λ= %d" % 5)
plt.plot(x, y2, color="green", alpha=0.5, label="exp_dist λ= %d" % 10)
plt.plot(x, y3, color="blue", alpha=0.5, label="exp_dist λ= %d" % 15)
plt.legend()
plt.show()
```
**The smaller the value of $\lambda$, the slower the decrease, but the key point is that the density decreases monotonically regardless of the value of $\lambda$.** In terms of intervals, this means that at a store averaging 15 customers an hour, the next customer tends to arrive sooner than at a store averaging 5 customers an hour.
What is even more important is that **for any value of $\lambda$, the probability density is highest near $x = 0$**. Doesn't it seem strange that the next customer is most likely to come right away? Some people may find this counterintuitive, but it follows from the **memorylessness** of the exponential distribution. **It is not that an event which has happened once is likely to happen again; rather, because the events are completely random, an occurrence soon is always more likely than a long stretch with no occurrence.** (It is "memoryless" in the sense that whether or not the previous event happened is completely forgotten.)
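Memorylessness can also be checked numerically: for an exponential $X$, $P(X > s + t \mid X > s) = P(X > t)$. A minimal simulation sketch (the values of $s$, $t$, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, s, t = 5.0, 0.1, 0.2
x = rng.exponential(scale=1.0 / lam, size=1_000_000)

# conditional survival P(X > s + t | X > s) should match P(X > t)
cond = (x > s + t).sum() / (x > s).sum()
uncond = (x > t).mean()
print(cond, uncond)  # both close to exp(-lambda * t) = exp(-1)
```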
In everyday terms, what we usually want to know is the probability that the next customer comes **within 10 minutes**, rather than the probability density at exactly 10 minutes. So consider the probability that the interval between events is at most $x$ unit time. We need to add up the probabilities, that is, integrate.
{\begin{eqnarray}
F(x) &=& \int_0^x f(t) dt \\
&=& \int_0^x \lambda \mathrm{e}^{-\lambda t} dt \\
&=& \lambda\int_0^x \mathrm{e}^{-\lambda t} dt \\
&=& \lambda\left[\frac{1}{-\lambda}\mathrm{e}^{-\lambda t}\right]^x_0 \\
&=& -\mathrm{e}^{-\lambda x} - (-1) \\
&=& 1 - \mathrm{e}^{-\lambda x} \\
\end{eqnarray}}
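As a sanity check on this result, we can integrate the density numerically (midpoint rule; the grid size and the values of $\lambda$ and $x$ are arbitrary choices) and compare against the closed form $1 - \mathrm{e}^{-\lambda x}$:

```python
import numpy as np

lam, x, n = 5.0, 0.5, 100_000

# midpoint-rule integral of the pdf over [0, x]
dt = x / n
t = (np.arange(n) + 0.5) * dt
numeric = (lam * np.exp(-lam * t)).sum() * dt

closed_form = 1 - np.exp(-lam * x)
print(numeric, closed_form)  # the two values agree
```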
This cumulative distribution function is also drawn for the same three examples.

- A store visited by an average of 5 people per hour ($\lambda = 5$)
- A store visited by an average of 10 people per hour ($\lambda = 10$)
- A store visited by an average of 15 people per hour ($\lambda = 15$)
```python
def cum_exp_dist(lambda_, x):
    """Cumulative distribution function of the exponential distribution."""
    return 1 - np.exp(-lambda_ * x)

x = np.arange(0, 1, 0.01)
y1 = [cum_exp_dist(5, i) for i in x]
y2 = [cum_exp_dist(10, i) for i in x]
y3 = [cum_exp_dist(15, i) for i in x]

plt.plot(x, y1, color="red", alpha=0.5, label="cum_exp_dist λ= %d" % 5)
plt.plot(x, y2, color="green", alpha=0.5, label="cum_exp_dist λ= %d" % 10)
plt.plot(x, y3, color="blue", alpha=0.5, label="cum_exp_dist λ= %d" % 15)
plt.legend()
plt.show()
```
You can see that each graph increases monotonically toward a maximum value of 1.
Let's calculate the probability for one concrete example: the probability that the next customer arrives within 5 minutes at a store visited by an average of 10 people per hour.
{\begin{eqnarray}
F\left(\frac{1}{12}\right) &=& 1 - \mathrm{e}^{-10\cdot\frac{1}{12}} \\
&\fallingdotseq& 0.565
\end{eqnarray}}
It corresponds to the following points on the graph.
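The same number can be obtained in a couple of lines of Python (5 minutes is $\frac{1}{12}$ of an hour):

```python
import math

lam = 10    # average of 10 customers per hour
x = 5 / 60  # 5 minutes expressed in hours (the unit time)

p = 1 - math.exp(-lam * x)
print(round(p, 3))  # 0.565
```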
Now let's consider the derivation of the exponential distribution.
Suppose that no event occurs up to time $x$, and that the event then occurs for the first time in the interval between $x$ and $x + \Delta x$. This situation can be written as follows.
f(x)\Delta x = (1 - F(x))\cdot\lambda\Delta x
From the definition of a probability density function, $f(x)\Delta x$ is the probability that the event occurs within the interval of width $\Delta x$. The right-hand side is the product of the probability that no event has occurred by time $x$, namely $(1 - F(x))$, and the probability that the event occurs between $x$ and $x + \Delta x$, namely $\lambda\Delta x$.
By memorylessness, whether the event has occurred by $x$ and whether it occurs between $x$ and $x + \Delta x$ are independent, so the probabilities can simply be multiplied.
Next, let me write down where $\lambda\Delta x$ comes from. Take the minute interval $\Delta x$ to divide one unit of time into $n$ equal parts (with $n$ sufficiently large), so the following holds.
\Delta x \cdot n = 1
Since we are considering events that occur an average of $\lambda$ times per unit time, the probability that an event occurs in each minute interval is, using the above formula, as follows.
p = \frac{\lambda}{n} = \lambda\Delta x
Now we know what $\lambda\Delta x$ means. Let's expand the formula given above.
{\begin{eqnarray}
f(x)\Delta x &=& (1 - F(x))\cdot\lambda\Delta x \\
f(x) &=& \lambda - \lambda F(x) \\
f'(x) &=& -\lambda f(x)
\end{eqnarray}}
In going from the second line to the third, both sides are differentiated with respect to $x$ (note that $F'(x) = f(x)$).
$f'(x) = -\lambda f(x)$ is a differential equation, but recalling that the exponential function is the function whose shape is unchanged by differentiation, it is easy to see that the solution is an exponential.
f(x) = C\cdot\mathrm{e}^{-\lambda x}
$C$ is a constant, but it cannot be arbitrary: since $f$ is a probability density function, its integral over the whole range must equal 1, that is, $\int_0^\infty f(x)dx = 1$. Use that constraint to determine the value of the constant $C$.
{\begin{eqnarray}
\int_0^\infty f(x)dx &=& \int_0^\infty C\cdot\mathrm{e}^{-\lambda x}dx \\
&=& C\int_0^\infty \mathrm{e}^{-\lambda x}dx \\
&=& C\left[\frac{1}{-\lambda}\mathrm{e}^{-\lambda x}\right]^\infty_0 \\
&=& -C\left(\frac{1}{-\lambda}\right)\\
&=& \frac{C}{\lambda}
\end{eqnarray}}
From the constraint $\int_0^\infty f(x)dx = 1$,
{\begin{eqnarray}
1 &=& \frac{C}{\lambda} \\
C &=& \lambda \\
\end{eqnarray}}
From the above, we have derived $f(x) = \lambda\mathrm{e}^{-\lambda x}$.
Next, let us consider maximum likelihood estimation of the exponential distribution's parameter. **When the probability density function with parameter $\theta$ is $f(x;\theta)$, the likelihood function is $L(\theta;x) = f(x;\theta)$, and the value $\theta = \hat\theta$ that maximizes it is called the maximum likelihood estimator.**
Suppose that independently generating random numbers following the exponential distribution with $\lambda = \theta$ yields $x_1, x_2, \cdots, x_n$. Consider the maximum likelihood estimator of $\theta$ in this case.
L(\theta ; x) = \theta \mathrm{e}^{-\theta x}
This time $x_1, x_2, \cdots, x_n$ and $n$ are given, and since the samples are all generated independently, the likelihood function can be expressed as follows.
\begin{eqnarray*}L(\theta ;x_1,x_2,\cdots,x_n)=
L(\theta ;x_1)\times L(\theta ;x_2)\times\cdots\times L(\theta ;x_n)
\end{eqnarray*}
Applying the above to the exponential distribution gives the following.
\begin{eqnarray*}L(\theta ;x_1,x_2,\cdots,x_n)=
\displaystyle\prod_{k=1}^{n}\theta\mathrm{e}^{-\theta x_k}
\end{eqnarray*}
We could find the $\theta$ that maximizes this likelihood function directly, but since the logarithm is monotonically increasing, maximizing the log-likelihood gives the same result and is easier to work with, so we convert to the log-likelihood function.
\begin{eqnarray*}
\log(L(\theta ;x_1,x_2,\cdots,x_n))&=&
\log(\theta^n) + \log( \mathrm{e}^{-\theta (x_1 + x_2 + \cdots + x_n)}) \\
&=& n\log(\theta) -\theta (x_1 + x_2 + \cdots + x_n) \\
\end{eqnarray*}
Put $l(\theta) = \log(L(\theta; x_1, x_2, \cdots, x_n))$ and differentiate $l(\theta)$ with respect to $\theta$.
{\begin{eqnarray*}
\frac{\partial l(\theta)}{\partial\theta}&=&\frac{\partial}{\partial\theta}(n\log(\theta) -\theta (x_1 + x_2 + \cdots + x_n)) \\
&=& n\cdot\frac{1}{\theta} - (x_1 + x_2 + \cdots + x_n)
\end{eqnarray*}}
From here, set $\frac{\partial l(\theta)}{\partial\theta} = 0$ and solve for $\theta$.
{\begin{eqnarray*}
n\cdot\frac{1}{\theta} - (x_1 + x_2 + \cdots + x_n) &=& 0 \\
n\cdot\frac{1}{\theta} &=& x_1 + x_2 + \cdots + x_n\\
\theta &=& \frac{n}{x_1 + x_2 + \cdots + x_n} \\
\end{eqnarray*}}
We found that the maximum likelihood estimator of the exponential distribution's parameter is $\frac{n}{x_1 + x_2 + \cdots + x_n}$, that is, the reciprocal of the sample mean.
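A quick simulation sketch (the true $\lambda$, seed, and sample size are arbitrary choices) confirms that this estimator recovers the parameter:

```python
import numpy as np

rng = np.random.default_rng(42)
true_lam = 3.0
x = rng.exponential(scale=1.0 / true_lam, size=100_000)

# maximum likelihood estimator: n / (x_1 + ... + x_n),
# i.e. the reciprocal of the sample mean
lam_hat = len(x) / x.sum()
print(lam_hat)  # close to true_lam = 3.0
```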
Let's apply this maximum likelihood estimator to a concrete example. Suppose you measured the interval between customer arrivals at your store five times.
- Time from opening to the first visit: 20 minutes
- Time from the first visit to the second: 15 minutes
- Time from the second visit to the third: 20 minutes
- Time from the third visit to the fourth: 15 minutes
- Time from the fourth visit to the fifth: 20 minutes
Assuming that the visit interval follows an exponential distribution, the maximum likelihood estimate of the parameter $\lambda$ obtained from the above data is as follows (with hours as the unit time, so 20 minutes $= \frac{1}{3}$ and 15 minutes $= \frac{1}{4}$).
{\begin{eqnarray*}
\frac{5}{\frac{1}{3} + \frac{1}{4} + \frac{1}{3} + \frac{1}{4} + \frac{1}{3}}
\fallingdotseq 3.33
\end{eqnarray*}}
Maximum likelihood estimation from the given data thus suggests that the customer visit interval follows an exponential distribution whose average number of visits per unit time (1 hour) is $3.33$.
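The same calculation in Python, with the observed intervals converted from minutes to hours:

```python
intervals_min = [20, 15, 20, 15, 20]            # observed waiting times in minutes
intervals_hr = [m / 60 for m in intervals_min]  # convert to hours (the unit time)

# MLE: number of observations divided by their sum
lam_hat = len(intervals_hr) / sum(intervals_hr)
print(round(lam_hat, 2))  # 3.33
```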
With this, I have a rough grasp of the exponential distribution. I would like to continue posting statistics-related articles in the future.