This series gives a brief explanation of "Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa, implementing its contents in Python. I used Google Colaboratory (hereinafter "Colab") for the implementation. If you have any suggestions, I would appreciate it if you could leave them in the comment section. Since I only cover the parts I felt needed explanation, this series may not suit readers who want to understand the entire book thoroughly. Also note that where formula numbers and proposition/definition indexes follow the book, some numbers may be skipped in this article.
First, we give a rigorous definition of the random variables we use so casually, and explain probability distributions in the discrete and continuous cases. The similar-sounding terms can be confusing at first, but once you grasp the content you will not get lost. Next, the expected value is defined, and the variance, standard deviation, and related quantities are explained. Probability generating functions, moment generating functions, and characteristic functions may be new to you, but they are important functions that deepen your knowledge of statistics. For the change of variables at the end, it is enough to know the idea and work it out each time you need it. Chapters 1 and 2 are preparation for Chapter 3 onward, so even if your understanding is not perfect at this point, it will solidify as you keep reading.
A random variable does not keep track of every event you might think of; it omits the parts that do not matter and makes the events easier to handle. For example, suppose you randomly select 100 people and ask whether they like guppies. If individuals are distinguished (recording 1 for "like" and 0 for "dislike" per person), the set of all events $\Omega$ consists of $2^{100}$ elements. But what we want to know now is simply how many of the 100 people like guppies. Let $X$ be the random variable that counts the number of "like" answers without distinguishing individuals. The set of all values (sample space) $\chi$ of $X$ is then $\chi = \{0, 1, 2, \ldots, 100\}$, which is far easier to handle.
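As a small illustration (my own sketch, not from the book), here is how this survey could be simulated in Python. The probability 0.5 that a person likes guppies is an assumed value purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# One survey: 100 individual 0/1 answers (individuals distinguished).
# The full event space of such answer vectors has 2**100 elements.
answers = rng.integers(0, 2, size=100)

# The random variable X ignores who answered what and just counts "likes",
# so its sample space is only {0, 1, ..., 100}.
X = answers.sum()
print(X)
```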
Definition:
For a random variable $X$, the cumulative distribution function $F_X(x)$ is defined by $F_X(x) = P(X \leq x)$.
Example: What is the probability of rolling a die once and getting 4 or less? Answer: $F_X(4) = P(X \leq 4) = 4/6 = 2/3$. The cumulative distribution function is also simply called the distribution function. A random variable $X$ is called a discrete random variable when it takes discrete values, like a die, and a continuous random variable when it takes continuous values, like temperature.
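A minimal sketch of this die example in Python (the faces and their probabilities are simply spelled out explicitly):

```python
import numpy as np

faces = np.arange(1, 7)        # possible values of X: 1..6
probs = np.full(6, 1 / 6)      # a fair die: each face has probability 1/6

def F_X(x):
    """Cumulative distribution function: P(X <= x)."""
    return probs[faces <= x].sum()

print(F_X(4))  # 0.666... = 2/3
```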
The cumulative distribution function $F_X(x)$ considers the cumulative probability $P(X \leq x)$; now consider the pinpoint probability that $X = x$.
**・Discrete type**

$$
f_X(x) = \left\{ \begin{array}{ll}
p(x_i) & (x = x_i) \\
0 & (x \notin \chi)
\end{array} \right.
$$
This is how it can be expressed. I have omitted the precise wording, but the symbols have the same meanings as before: $p(x_i)$ is the probability that $X$ takes the value $x_i$, and $\chi$ is the sample space.

**・Continuous type**

In the continuous case, the probability of a single value cannot be computed in the same way, because a continuous variable cannot be pinned down to a single point: even trying to write the real number 1 on the real line, the expansion continues forever as 1.0000000000.... So instead of a single point, let's consider the probability that the variable falls within a small interval.

Definition:
For a continuous random variable $X$, when there exists a function $f_X(x)$ such that

$$
F_X(x) = \int_{-\infty}^x f_X(t)\, dt, \quad -\infty < x < \infty \tag{1}
$$

the function $f_X(x)$ is called the **probability density function**.
For example, what is the probability that tomorrow's temperature $T\,[℃]$ satisfies $22 \leq T \leq 25$? That is the kind of question this framework answers: $P(22 \leq T \leq 25) = F_T(25) - F_T(22) = \int_{22}^{25} f_T(t)\,dt$, where $F_T(x)$ is the cumulative distribution function. I think you will soon get used to the term "density". Since $f_X(x)$ describes a probability, of course $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
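Here is a minimal numerical sketch of this idea (not from the book): assuming, purely for illustration, that tomorrow's temperature follows a normal distribution with mean 23 and standard deviation 2, we can compute the interval probability and check that the density integrates to 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical model: T ~ N(23, 2^2). These parameters are my own
# assumption for illustration, not values from the book.
T = norm(loc=23, scale=2)

# P(22 <= T <= 25) = F_T(25) - F_T(22)
print(T.cdf(25) - T.cdf(22))  # about 0.533

# A density must integrate to 1 over the whole real line.
total, _ = quad(T.pdf, -np.inf, np.inf)
print(total)  # about 1.0
```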
First, the definition of the expected value:

The expected value of a function $g(X)$ of the random variable $X$ is written $E[g(X)]$ and is given by

$$
E[g(X)] = \left\{ \begin{array}{ll}
\int_{-\infty}^{\infty} g(x) f_X(x)\, dx & (\text{when } X \text{ is a continuous random variable}) \\
\sum_{x_i \in \chi} g(x_i) f_X(x_i) & (\text{when } X \text{ is a discrete random variable})
\end{array} \right.
$$
$f_X(x)$ is the probability function (or probability density function) mentioned above. In other words, you are summing up the product of each value the variable $x$ can take and the probability (density) of that value. The reason the expected value is important is that the mean and the variance, the characteristic values (condensed information) of a probability distribution, are themselves expected values of suitable functions $g(X)$ of the random variable $X$.
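As a sketch of the two branches of the definition (my own example, not the book's: a fair die for the discrete case, the standard normal distribution for the continuous case, and the arbitrary choice $g(x) = x^2$):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

g = lambda x: x ** 2

# Discrete: a fair die. E[g(X)] = sum of g(x_i) * f_X(x_i) over the sample space.
faces = np.arange(1, 7)
E_discrete = np.sum(g(faces) * (1 / 6))
print(E_discrete)  # 91/6 = 15.166...

# Continuous: standard normal. E[g(X)] = integral of g(x) * f_X(x) dx.
E_continuous, _ = quad(lambda x: g(x) * norm.pdf(x), -np.inf, np.inf)
print(E_continuous)  # 1.0, since E[X^2] = 1 for N(0, 1)
```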
**・Mean**
When $g(X) = X$, the expected value $E[X]$ of $X$ is called the mean of $X$ and is written $E[X] = \mu$. For a translation and scale change,
$$E[aX+b]=aE[X]+b$$
holds.
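A quick numerical check of this property (a sketch with arbitrary values of $a$ and $b$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)  # samples of X
a, b = 3.0, -1.0                                   # arbitrary constants

# The two quantities agree up to simulation noise.
print(np.mean(a * x + b))      # approximately a * E[X] + b = 5.0
print(a * np.mean(x) + b)
```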
**・Variance**
When $g(X) = (X - E[X])^2$, the expected value $E[(X - \mu)^2]$ is called the variance of $X$ and is written $V(X)$ or $\sigma^2$. $\sigma = \sqrt{V(X)}$ is called the standard deviation of $X$. The variance represents how scattered the data are, and the standard deviation brings the quantity back down to the same dimension (units) as the data, which makes it easier to work with. I will omit the proof, but under a translation and scale change the variance satisfies
$$V[aX+b]=a^2V[X]$$
Since the variance is built from the square of the deviation (the difference between each data point and the mean), the factor $a^2$ makes sense. And you can see intuitively that translating the data does not change how scattered they are, which is why $b$ disappears.
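The same kind of numerical check works for the variance rule (again with arbitrary $a$ and $b$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)
a, b = 3.0, -1.0

# Translating by b has no effect; scaling by a multiplies the variance by a**2.
print(np.var(a * x + b))       # approximately a**2 * V[X] = 9 * 2.25 = 20.25
print(a ** 2 * np.var(x))
```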
* The probability generating function, moment generating function, and characteristic function would make this article long, so I will introduce them in a separate article. As the names suggest, they are functions from which the probability function and the moments can be obtained mechanically.
# Let's run Python
Now let's use Python to look at the probability density function and the cumulative distribution function of the standard normal distribution (which we will see in the next chapter).
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

fig, ax = plt.subplots()
x1 = np.arange(-5, 5, 0.1)
x2 = np.arange(-5, 5, 0.01)

# Probability density function of the standard normal distribution.
y = np.exp(-x2**2 / 2) / np.sqrt(2 * np.pi)
# Cumulative distribution function of the standard normal distribution.
Y = norm.cdf(x1, loc=0, scale=1)

c1, c2 = "red", "blue"
l1, l2 = "cdf", "pdf"  # legend labels
ax.set_xlabel("x")
ax.set_ylabel("probability")
plt.grid(True)
plt.plot(x1, Y, color=c1, label=l1)
plt.plot(x2, y, color=c2, label=l2)
plt.legend()
plt.show()
```
When you run this, you get the figure below. The blue graph is the probability density function $f_X(x)$ of the standard normal distribution, and the red graph is the cumulative distribution function $F_X(x)$. You can see that the cumulative distribution function increases from 0 to 1.
This is the end of Chapter 2. Thank you very much.
"Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa