Last time, I summarized [Introduction for Data Scientists] Descriptive Statistics and Simple Regression Analysis. This time, we summarize probability, random variables, and probability distributions as the foundations of probability and statistics, supplementing the explanations in the book. 【Caution】 I am reading ["Data Scientist Training Course at the University of Tokyo"](https://www.amazon.co.jp/dp/4839965250) and summarizing the parts I have doubts about or find useful. The summary is therefore selective; please read it as my own notes rather than as a faithful account of the book.
Import the library used in this chapter.
```python
import numpy as np
import scipy as sp
import pandas as pd
from pandas import Series, DataFrame
import matplotlib as mpl
import seaborn as sns
import matplotlib.pyplot as plt

sns.set()
np.random.seed(0)
```
Probability, trial, elementary event, sample space, event, conditional probability, Bayes' theorem, prior probability, posterior probability
Using dice as the running example, we learn the terms and concepts needed to study probability.
```python
# Possible values of the dice
dice_data = np.array([1, 2, 3, 4, 5, 6])
```
Each individual outcome is an elementary event.
```python
print('Only one randomly extracted', np.random.choice(dice_data, 1))
# Only one randomly extracted [3]
```
The set of all elementary events is the sample space Ω; any subset of the sample space is an event.
If we change the number of draws from 1 to 10, we get the following:
```python
print('10 randomly extracted', np.random.choice(dice_data, 10))
# 10 randomly extracted [2 3 4 2 2 1 6 2 2 5]
```
**Kolmogorov's axioms** Writing $P(E)$ for the probability that an event E occurs, the following are the axioms:

\begin{align}
&\text{Axiom 1: } 0 \leq P(A) \leq 1 \text{ for every event } A\\
&\text{Axiom 2: } P(\Omega) = 1\\
&\text{Axiom 3: for mutually exclusive events } A_1, A_2, A_3, \dots\\
&P(A_1 \cup A_2 \cup A_3 \cup \dots) = P(A_1) + P(A_2) + P(A_3) + \dots = \sum_i P(A_i)
\end{align}
This differs from the axioms as stated in this book; it is quoted from the reference below. Axiom 3 in this book covers only the case of $A_1$, $A_2$. 【reference】 [Basic Mathematics: Probability / Statistics I, p. 6, University of Tokyo Engineering Course](https://www.amazon.co.jp/dp/4621087150)
The empty event $\phi$ is the event with no elements. For example, when A and B are mutually exclusive, $A \cap B$ is the empty event $\phi$, and $P(\phi) = 0$.
The event consisting of the outcomes that do not belong to an event E is called the complementary event, or complement, of E, written with the c of "complement" as $E^c$. For example:
E=\{2,4,6\}\\
E^c=\{1,3,5\}
Now consider the two events
A=\{1,2,3\}\\
B=\{1,3,4,5\}
$A \cap B$; product event (intersection)
A\cap B = \{1,3\}
$A \cup B$; sum event (union)
A\cup B = \{1,2,3,4,5\}
Let us calculate the probabilities of the event X of rolling a 3, the empty event, the product event of A and B, and the sum event of A and B.
\begin{align}
P(X) &= \frac{1}{6}\\
P(\phi ) &= 0\\
P(A\cap B)&=\frac{1}{3}\\
P(A\cup B)&=\frac{5}{6}\\
\end{align}
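These probabilities can also be checked directly with Python sets; a minimal sketch (the event names match the text above):

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space
X = {3}                       # event of rolling a 3
A = {1, 2, 3}
B = {1, 3, 4, 5}

def prob(event):
    # classical probability: |event| / |sample space|
    return len(event) / len(omega)

print(prob(X))       # 1/6
print(prob(set()))   # empty event -> 0.0
print(prob(A & B))   # product event {1, 3} -> 1/3
print(prob(A | B))   # sum event {1, 2, 3, 4, 5} -> 5/6
```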
Relative frequency of each die face as the number of trials grows:
Number of trials | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
10 | 0.1 | 0.5 | 0.2 | 0.1 | 0.1 | 0.0 |
100 | 0.16 | 0.17 | 0.18 | 0.19 | 0.18 | 0.12 |
1000 | 0.176 | 0.152 | 0.18 | 0.158 | 0.164 | 0.17 |
10000 | 0.1695 | 0.1701 | 0.1642 | 0.166 | 0.1617 | 0.1685 |
100000 | 0.16776 | 0.16621 | 0.16661 | 0.16573 | 0.16724 | 0.16645 |
1000000 | 0.166517 | 0.166637 | 0.166131 | 0.16692 | 0.166876 | 0.166919 |
The following is a personal observation, but I find this result interesting. Even if six candidates are chosen evenly (randomly) by the voters, the observed shares differ depending on the number of trials (voters): with around 1,000 voters, a slight lead is unreliable. In other words, a lead cannot be trusted, and the true ranking cannot be seen, without statistically verifying its significance.
```python
calc_steps = 1000
print('calc_steps=', calc_steps)
dice_rolls = np.random.choice(dice_data, calc_steps)
for i in range(1, 7):
    p = len(dice_rolls[dice_rolls == i]) / calc_steps
    print(i, 'Probability of appearing;', p)
```
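The frequency table above can be reproduced by wrapping this in a loop over the trial counts; a sketch (exact values depend on the random seed):

```python
for calc_steps in [1, 10, 100, 1000, 10000, 100000, 1000000]:
    dice_rolls = np.random.choice(dice_data, calc_steps)
    freqs = [len(dice_rolls[dice_rolls == i]) / calc_steps for i in range(1, 7)]
    print(calc_steps, freqs)
```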
The probability that event B occurs given that event A has occurred is called the conditional probability of B given A, written
P(B|A)=\frac{P(A\cap B)}{P(A)}
Rearranging this formula gives the following, known as the **multiplication theorem**.
P(A\cap B)=P(B|A)P(A)
A concrete example:
The event of rolling an even number:
A=\{2,4,6\}
The event of rolling 4 or more:
B=\{4,5,6\}
The product event of A and B:
$A \cap B = \{4,6\}$
P(B|A)=\frac{P(A\cap B)}{P(A)}=\frac{\frac{2}{6}}{\frac{3}{6}}=\frac{2}{3}
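This conditional probability can be verified empirically; a sketch using the same dice setup (the event lists are the A and B above):

```python
rolls = np.random.choice(dice_data, 100000)
in_A = np.isin(rolls, [2, 4, 6])    # rolls that land in A (even)
in_AB = np.isin(rolls, [4, 6])      # rolls that land in both A and B
# P(B|A) estimated as (# in A and B) / (# in A); should approach 2/3
print(in_AB.sum() / in_A.sum())
```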
Events A and B are said to be mutually independent when
P(A|B)=P(A)
holds. In that case, the multiplication theorem gives
P(A\cap B)=P(A)P(B)
If this equation does not hold, events A and B are said to be dependent. For the dice example above:
\begin{align}
P(A\cap B)&=\frac{2}{6}=\frac{1}{3}\\
P(A)P(B)&=\frac{3}{6}\cdot\frac{3}{6}=\frac{1}{4}
\end{align}
Since these are not equal, events A and B are not independent but dependent.
What about C = \{1,3,5\}? The product event of A and C is
$A \cap C = \phi$
so $P(A \cap C) = 0$ while $P(A)P(C) = \frac{1}{4}$. Since these are not equal either, A and C are also dependent.
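The same comparisons can be written directly with sets; a sketch:

```python
A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 3, 5}

def prob(event):
    return len(event) / 6   # six equally likely faces

for other in (B, C):
    # joint probability vs. product of marginals
    print(prob(A & other), prob(A) * prob(other))
# A,B: 1/3 vs 1/4 -> dependent;  A,C: 0 vs 1/4 -> dependent
```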
Now for Bayes' theorem. Quoting from the reference: "Sometimes we want to know the cause B from the result A... In such a case, we need the conditional probability P(B|A) of B given A. What can actually be measured in experiments and surveys is P(A|B), so we need an equation to compute P(B|A) from it." Starting from the conditional probability above and regarding A as the result event and B as the cause event, we obtain Bayes' theorem below: it gives the probability that the cause was B when the result A is known. Note that $B^c$ is the complement of B.
P(B|A)=\frac{P(B\cap A)}{P(A)}\\
∵P(A\cap B)=P(B\cap A)\\
=\frac{P(A|B)P(B)}{P(A)}\\
=\frac{P(A|B)P(B)}{P(A|B)P(B)+P(A|B^c)P(B^c)}\\
- P(B): the probability of B before A is observed (prior probability)
- P(B|A): the probability of B after A is observed (posterior probability)
- P(A|B): the probability that A occurs given B (likelihood)
I refer to the following. 【reference】 [Basic Mathematics: Probability / Statistics I, p. 18, University of Tokyo Engineering Course](https://www.amazon.co.jp/dp/4621087150) "Suppose there are k cause events $B_1, B_2, ..., B_k$, that these events are mutually exclusive, and that no other events occur." Then:
P(A)=P[\cup _{i=1}^{k}(A\cap B_i)]\\
=\sum _{i=1}^{k}P(A\cap B_i)\\
=\sum _{i=1}^{k}P(A|B_i)P(B_i)
Substituting this into the equation above gives:
P(B_i|A)=\frac{P(A|B_i)P(B_i)}{\sum _{j=1}^{k}P(A|B_j)P(B_j)}
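As a worked example of this formula with hypothetical numbers (not from the book): suppose a test is positive (event A) with probability 0.9 when a condition is present ($B_1$, prior 0.01) and with probability 0.05 when absent ($B_2$, prior 0.99). A sketch computing the posteriors:

```python
# hypothetical illustration of Bayes' theorem with k = 2 cause events
p_B = [0.01, 0.99]           # priors P(B_1), P(B_2)
p_A_given_B = [0.9, 0.05]    # likelihoods P(A|B_1), P(A|B_2)

p_A = sum(l * p for l, p in zip(p_A_given_B, p_B))        # total probability
posterior = [l * p / p_A for l, p in zip(p_A_given_B, p_B)]
print(posterior)   # P(B_1|A) ≈ 0.154, P(B_2|A) ≈ 0.846
```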
Similarly, the relative frequencies of heads and tails in repeated coin tosses:

Number of trials | heads | tails |
---|---|---|
10 | 0.6 | 0.4 |
100 | 0.51 | 0.49 |
1000 | 0.479 | 0.521 |
10000 | 0.505 | 0.495 |
100000 | 0.49939 | 0.50061 |
1000000 | 0.500783 | 0.499217 |
10000000 | 0.500065 | 0.499935 |
```python
calc_steps = 1000
print('calc_steps=', calc_steps)
coin_data = np.array([0, 1])   # 0: heads, 1: tails
coin_tosses = np.random.choice(coin_data, calc_steps)
for i in range(0, 2):
    p = len(coin_tosses[coin_tosses == i]) / calc_steps
    print(i, 'Probability of appearing;', p)
```
Random variable, probability function, probability density function, expected value, uniform distribution, Bernoulli distribution, binomial distribution, normal distribution, Poisson distribution, lognormal distribution, kernel density estimation
A random variable is a variable whose possible values each have a probability assigned to them. For a die, the possible values are 1 to 6, each assigned the equal probability 1/6, as in the table below. The variable X denoting the possible values is called a random variable, and P(X) is the probability of each value occurring.
X | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
P(X) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
Uppercase X denotes the random variable and lowercase x its realization; likewise, P(X) is the probability for the random variable and p(x) the probability of the realization x. These probabilities sum to 1:
\sum _{i=1}^{6} p(x_i)= 1
The "" part below is quoted from the above reference p21. "When the random variable X is a discrete type, the following is called the probability distribution of X.
P(X=x_i) = f(x_i)
The probability at each point is a function of $ x $ (the subscript i is omitted below), and $ f (x) $ is called a probability function. "
The distribution function (cumulative distribution function) is defined by
F(x)=P(X\leq x) = \sum _{x_i \leq x}p(x_i)
Here, in the case of a continuous random variable, the derivative of the distribution function is called a density function (probability density function) and is defined as follows.
f(x)= \frac{dF(x)}{dx}
For a random variable X, the expected value E(X) is defined as
E(X)= \sum _x xf(x)
For example, for a die:
\begin{align}
E(X)&= 1\cdot\frac{1}{6}+2\cdot\frac{1}{6}+3\cdot\frac{1}{6}+4\cdot\frac{1}{6}+5\cdot\frac{1}{6}+6\cdot\frac{1}{6}\\
&=\frac{21}{6}\\
&=3.5
\end{align}
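A quick check of this value, both from the definition and by simulation (a sketch):

```python
# exact expected value from the definition E(X) = sum of x * f(x)
exact = sum(x * (1 / 6) for x in range(1, 7))
# Monte Carlo estimate from many simulated rolls
simulated = np.random.choice(dice_data, 100000).mean()
print(exact, simulated)   # 3.5 and a value close to 3.5
```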
A distribution in which all events occur with equal probability is called a uniform distribution. For a die:
```python
calc_steps = 1000
prob_data = []
print('calc_steps=', calc_steps)
dice_rolls = np.random.choice(dice_data, calc_steps)
for i in range(1, 7):
    p = len(dice_rolls[dice_rolls == i]) / calc_steps
    prob_data.append(p)
    print(i, 'Probability of appearing;', p)
plt.bar(dice_data, prob_data)
plt.grid()
plt.show()
```
```
>python dice_choice.py
calc_steps= 1000
1 Probability of appearing; 0.153
2 Probability of appearing; 0.17
3 Probability of appearing; 0.161
4 Probability of appearing; 0.17
5 Probability of appearing; 0.148
6 Probability of appearing; 0.198
```
Bernoulli trial: a trial with exactly two possible outcomes. Bernoulli distribution: the distribution of a single Bernoulli trial; for a coin toss, the probabilities of heads and tails.
The following shows the empirical Bernoulli distribution when eight coin tosses give the results [0,0,0,0,0,1,1,1].
```python
prob_be_data = []
coin_data = np.array([0, 0, 0, 0, 0, 1, 1, 1])
for i in np.unique(coin_data):
    p = len(coin_data[coin_data == i]) / len(coin_data)
    prob_be_data.append(p)
    print(i, 'Probability of appearing;', p)
plt.bar([0, 1], prob_be_data, align='center')
plt.xticks([0, 1], ['head', 'tail'])
plt.grid()
plt.show()
```
```
0 Probability of appearing; 0.625
1 Probability of appearing; 0.375
```
Next we create data following specific distributions. Below, we examine the characteristics of each distribution using numpy's random generators.
The binomial distribution is the distribution of the number of successes in n independent Bernoulli trials. It can be generated as follows (n = 30 trials, success probability 0.5, 1000 samples). The mean is around np = 15 and the variance is np(1-p) = 30/4 = 7.5.
```python
np.random.seed(0)
x = np.random.binomial(30, 0.5, 1000)
print(x.mean(), x.var())
# 14.98 7.6176
plt.hist(x)
plt.grid()
plt.show()
```
(Figure: histogram of 10,000 samples)
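To see how closely the samples follow theory, one can overlay the theoretical pmf from scipy.stats.binom; a sketch (the integer-centered binning is my choice, not from the book):

```python
from scipy.stats import binom

n, p_succ = 30, 0.5
x = np.random.binomial(n, p_succ, 1000)
k = np.arange(0, n + 1)
edges = np.arange(-0.5, n + 1.5)   # one bin per integer outcome
plt.hist(x, bins=edges, density=True, alpha=0.5)
plt.plot(k, binom.pmf(k, n, p_succ), 'ro-')   # theoretical pmf
plt.grid()
plt.show()
```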
The Poisson distribution models counts such as the number of raindrops per unit area or the number of trees per square meter. Setting the expected number of events per interval to 7 and drawing 1000 samples:
```python
np.random.seed(0)
x = np.random.poisson(7, 1000)
print(x.mean(), x.var())
# 6.885 6.457775
plt.hist(x)
plt.grid()
plt.show()
```
(Figure: histogram of 10,000 samples)
Normal distribution: mu is the mean and sigma the standard deviation. Here mu = 5 and sigma = 10:
```python
np.random.seed(0)
x = np.random.normal(5, 10, 1000)
print(x.mean(), x.var())
# 4.547432925098047 97.42344563121543
plt.hist(x)
plt.grid()
plt.show()
```
Lognormal distribution: a distribution whose logarithm follows a normal distribution.
```python
np.random.seed(0)
mu, sigma = 3., 0.4
x = np.random.lognormal(mu, sigma, 1000)
print(x.mean(), x.var())
# 21.329823289919993 76.9718785871209
plt.hist(x)
plt.grid()
plt.show()
```
Since the meaning of the mean and standard deviation here is hard to grasp, let us, following the reference, normalize the vertical axis of the histogram and overlay the density computed from the formula.
```python
s = np.random.lognormal(mu, sigma, 1000)
# density=True normalizes the histogram so it integrates to 1
count, bins, ignored = plt.hist(s, bins=12, range=(0, 60), density=True, align='mid')
x = np.linspace(min(bins), max(bins), 120)
# lognormal probability density function
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
```
We can also estimate the density function from given data. To estimate the distribution of the absences data by kernel density estimation, do the following:
```python
student_data_math = pd.read_csv('./chap3/student-mat.csv', sep=';')
student_data_math.absences.plot(kind='kde', style='k--')
student_data_math.absences.hist(density=True, align='mid')
plt.grid()
plt.show()
```
I was not sure what kind of function kde assumes, so from the reference: 【reference】 [Kernel density estimation @wikipedia](https://ja.wikipedia.org/wiki/%E3%82%AB%E3%83%BC%E3%83%8D%E3%83%AB%E5%AF%86%E5%BA%A6%E6%8E%A8%E5%AE%9A) "Let x1, x2, ..., xn be a sample drawn independently from an identical distribution with (unknown) probability density function f. The kernel density estimator with kernel function K and bandwidth (smoothing parameter) h is:"
\begin{align}
\hat{f}(x)&=\frac{1}{nh}\sum^n_{i=1}K\left(\frac{x-X_i}{h}\right)
\end{align}

Here, the following normal (Gaussian) kernel is commonly used as the kernel function:

\begin{align}
K(x)&=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}
\end{align}
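To make the definition concrete, here is a minimal hand-rolled Gaussian KDE following the formula above (the bandwidth h = 0.5 and the test data are arbitrary choices for illustration):

```python
def gaussian_kernel(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def kde(x, samples, h):
    # f(x) = (1 / nh) * sum_i K((x - X_i) / h)
    n = len(samples)
    return gaussian_kernel((x - samples[:, None]) / h).sum(axis=0) / (n * h)

samples = np.random.normal(0, 1, 200)
grid = np.linspace(-4, 4, 100)
plt.plot(grid, kde(grid, samples, h=0.5))
plt.grid()
plt.show()
```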
Joint probability distribution, marginal probability function, conditional probability function, conditional expectation, covariance matrix, multivariate normal distribution
Consider discrete random variables X taking values in $\{x_0, x_1, ...\}$ and Y taking values in $\{y_0, y_1, ...\}$. Writing the probability that $X = x_i$ and $Y = y_j$ as below, this is called the joint probability function.
P(X=x_i, Y=y_j) = p_{X,Y}(x_i,y_j)
The following is called the marginal probability function of X:
p_X(x_i) = \sum _{j=0}^{\infty }p_{X,Y}(x_i,y_j)
Y is defined in the same way.
The conditional probability function of $Y = y_j$ given $X = x_i$ can be defined as follows.
p_{Y|X}(y_j|x_i)=P(Y=y_j|X=x_i)=\frac{p_{X,Y}(x_i,y_j)}{p_X(x_i)}
Also, the conditional expected value of Y given $X = x_i$ is:
E[Y|X=x_i]=\sum _{j=1}^{\infty }y_jp_{Y|X}(y_j|x_i)=\frac{\sum _{j=1}^{\infty }y_jp_{X,Y}(x_i,y_j)}{p_X(x_i)}
Two variables X and Y are defined to be independent when the following holds for all $x_i$ and $y_j$:
p_{X,Y}(x_i,y_j)=p_X(x_i)p_Y(y_j)
For continuous distributions, the joint probability density function, marginal probability density function, conditional probability density function, and the definition of independence can be given analogously, but are omitted here.
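As a small concrete check of these definitions, a sketch computing marginals, a conditional distribution, and a conditional expectation from a made-up joint probability table:

```python
# made-up joint probability table p_{X,Y}(x_i, y_j); rows = x, columns = y
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)             # marginal of X: [0.3, 0.7]
p_y = p_xy.sum(axis=0)             # marginal of Y: [0.4, 0.6]
p_y_given_x0 = p_xy[0] / p_x[0]    # conditional p(y | x_0): [1/3, 2/3]
# conditional expectation E[Y | X = x_0], taking Y's values as 0 and 1
e_y_given_x0 = np.sum(np.arange(2) * p_y_given_x0)
print(p_x, p_y, p_y_given_x0, e_y_given_x0)
```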
The graph below was produced by the following program.
```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st
from scipy.stats import multivariate_normal
from mpl_toolkits.mplot3d import Axes3D

# grid of (x, y) points
x, y = np.mgrid[10:100:2, 10:100:2]
pos = np.empty(x.shape + (2,))
pos[:, :, 0] = x
pos[:, :, 1] = y
# bivariate normal with mean (50, 50) and diagonal covariance
rv = multivariate_normal([50, 50], [[100, 0], [0, 100]])
z = rv.pdf(pos)
fig = plt.figure(dpi=100)
ax = Axes3D(fig)
ax.plot_wireframe(x, y, z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x,y)')
ax.ticklabel_format(style='sci', axis='z', scilimits=(0, 0))
plt.show()
```
The program is a little tricky, so let me explain. The key is to understand the following two lines.
```python
rv = multivariate_normal([50, 50], [[100, 0], [0, 100]])
z = rv.pdf(pos)
```
The following references help here. 【reference】 ① [Multivariate normal distribution @wikipedia](https://ja.wikipedia.org/wiki/%E5%A4%9A%E5%A4%89%E9%87%8F%E6%AD%A3%E8%A6%8F%E5%88%86%E5%B8%83) ② Generate data according to a multidimensional normal distribution. Following Reference ②, the code above can be written as below, where mu is the mean vector and sigma the covariance matrix.
```python
mu = [50, 50]
sigma = [[100, 0], [0, 100]]
rv = multivariate_normal(mu, sigma)
```
The formula is as follows.
\mu=\begin{pmatrix}\mu_x \\ \mu_y \end{pmatrix}\\
\Sigma=
\begin{pmatrix} \sigma_x^2 & Cov(x,y) \\ Cov(y,x) & \sigma_y^2 \end{pmatrix}\\
=\begin{pmatrix} \sigma_x^2 & \rho \sigma_x\sigma_y \\ \rho\sigma_y\sigma_x & \sigma_y^2 \end{pmatrix}
That is, here it is as follows.
\mu=\begin{pmatrix} 50 \\ 50\end{pmatrix}\\
\Sigma=\begin{pmatrix}100 & 0 \\ 0 & 100\end{pmatrix}
Then z = rv.pdf(pos) evaluates the full density of multivariate_normal(mu, sigma), given below, over the grid of (x, y) points:
f(x,y)=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp(-\frac{1}{2(1-\rho^2)}[\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}-\frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}])
Almost the same code is posted in Reference ③ below. 【reference】 ③ Python plot 3d scatter and density ④ scipy.stats.multivariate_normal. To understand z = rv.pdf(pos): pdf is the probability density function, and it takes arguments as listed in the methods table at the end of this article.
【reference】 Python 3: How to write a 3D graph (matplotlib, pyplot, mplot3d, MPL)
The code written by hand from the density formula is as follows.
```python
def func(x, y):
    # bivariate normal density; rho is the correlation coefficient
    mux, muy, sigmax, sigmay, rho = 50., 50., 10., 10., 0.
    fxy = np.exp(-((x - mux)**2 / (sigmax**2) + (y - muy)**2 / (sigmay**2)
                   - 2 * rho * (x - mux) * (y - muy) / (sigmax * sigmay))
                 / (2 * (1 - rho**2))) / (2 * np.pi * sigmax * sigmay * np.sqrt(1 - rho**2))
    return fxy

x, y = np.mgrid[10:100:2, 10:100:2]
z = func(x, y)
fig = plt.figure(dpi=100)
ax = Axes3D(fig)
ax.plot_wireframe(x, y, z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x,y)')
ax.ticklabel_format(style='sci', axis='z', scilimits=(0, 0))
plt.show()
```
Since the formulas are equivalent, the same picture was obtained. When I set mux, muy, sigmax, sigmay, rho = 50., 50., 10., 10., 0.3 and correspondingly rv = multivariate_normal([50,50],[[100,30],[30,100]]), the two calculations again produced the same result.
The methods available on rv = multivariate_normal(mu, sigma) are:
Methods | Commentary |
---|---|
pdf(x, mean=None, cov=1, allow_singular=False) | Probability density function. |
logpdf(x, mean=None, cov=1, allow_singular=False) | Log of the probability density function. |
cdf(x, mean=None, cov=1, allow_singular=False, maxpts=1000000*dim, abseps=1e-5, releps=1e-5) | Cumulative distribution function. |
logcdf(x, mean=None, cov=1, allow_singular=False, maxpts=1000000*dim, abseps=1e-5, releps=1e-5) | Log of the cumulative distribution function. |
rvs(mean=None, cov=1, size=1, random_state=None) | Draw random samples from a multivariate normal distribution. |
entropy() | Compute the differential entropy of the multivariate normal. |
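A quick sketch exercising a few of these methods on the rv object defined above:

```python
rv = multivariate_normal([50, 50], [[100, 0], [0, 100]])
print(rv.pdf([50, 50]))   # density at the mean
print(rv.cdf([50, 50]))   # P(X <= 50, Y <= 50) = 0.25 here
print(rv.rvs(size=5, random_state=0))   # five random draws
print(rv.entropy())       # differential entropy
```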