[Python] Mathematical statistics from the basics: Random variables

Random variables and dice

First, consider a fair (unbiased) six-sided die with faces 1 to 6.

Since the faces 1 to 6 are equally likely (the die has no bias), the probability of each face is as follows.

$ P(\text{rolling a 1}) = \frac{1}{6} \qquad P(\text{rolling a 2}) = \frac{1}{6} \qquad P(\text{rolling a 3}) = \frac{1}{6} \\
P(\text{rolling a 4}) = \frac{1}{6} \qquad P(\text{rolling a 5}) = \frac{1}{6} \qquad P(\text{rolling a 6}) = \frac{1}{6} $

Now define the variable $X$ as follows.

$ X = \left\{ \begin{array}{ll} 1 & (\text{when a 1 is rolled}) \\
2 & (\text{when a 2 is rolled}) \\
3 & (\text{when a 3 is rolled}) \\
4 & (\text{when a 4 is rolled}) \\
5 & (\text{when a 5 is rolled}) \\
6 & (\text{when a 6 is rolled})
\end{array}\right. $

A variable whose value is determined by chance, such as $X$ here, is called a **random variable**. The value that the random variable actually takes in a given trial is called its **realization** (realized value). For the fair die, the probability that $X$ takes the value $x$ is:

$ P(X = x) = \frac{1}{6}, \qquad x = 1,2,3,4,5,6 $
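To make the distinction between a random variable and its realization concrete, here is a minimal sketch using only the Python standard library. The face labels, the dictionary names `X` and `pmf`, and the fixed seed are illustrative assumptions, not part of the original article.

```python
import random
from fractions import Fraction

# Sample space: the six faces of a fair die (the labels are illustrative).
faces = ["one", "two", "three", "four", "five", "six"]

# The random variable X maps each outcome to a number 1..6.
X = {face: value for value, face in enumerate(faces, start=1)}

# Probability function of X for a fair die: P(X = x) = 1/6 for every x.
pmf = {x: Fraction(1, 6) for x in X.values()}

rng = random.Random(0)        # fixed seed so the run is repeatable
outcome = rng.choice(faces)   # one trial of the experiment
realization = X[outcome]      # the value X actually took in this trial

print("outcome:", outcome, "-> realization x =", realization)
print("P(X = x) =", pmf[realization])
```

The mapping `X` itself never changes; only the realization differs from trial to trial.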

Let's actually roll the die with Python.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed()  # seed from OS entropy; pass an integer for reproducible results

dice = np.array([1, 2, 3, 4, 5, 6])
dice_times = 10000  # number of rolls (must be defined before sampling)
dice_data = np.random.choice(dice, dice_times)  # uniform sampling with replacement

prob_dice = np.array([])
for i in range(1, 7):
    # relative frequency of face i among all rolls
    p = len(dice_data[dice_data == i]) / dice_times
    print(i, "Probability of appearing", p)
    prob_dice = np.append(prob_dice, p)

plt.bar(dice, prob_dice)
plt.grid(True)
plt.show()

The result is shown below. Here the die is rolled 10,000 times, and the relative frequency of each face is used as an estimate of its probability. As the results show, each face comes up with a frequency close to $ \frac{1}{6} = 0.1666\ldots $.

[Figure: bar chart of the relative frequency of each face over 10,000 rolls]
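The claim that the frequencies are close to $\frac{1}{6}$ can be explored by varying the number of rolls. The following is a rough sketch using the standard-library `random` module rather than NumPy; the helper name `relative_frequencies`, the seed, and the chosen roll counts are assumptions made for illustration.

```python
import random

def relative_frequencies(n_rolls, rng):
    """Roll a fair die n_rolls times and return the relative frequency of each face."""
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(n_rolls):
        counts[rng.randint(1, 6)] += 1  # randint includes both endpoints
    return {face: count / n_rolls for face, count in counts.items()}

rng = random.Random(42)  # fixed seed so the run is repeatable
for n_rolls in (100, 10_000, 100_000):
    freqs = relative_frequencies(n_rolls, rng)
    max_error = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"n = {n_rolls:>7}: largest deviation from 1/6 = {max_error:.4f}")
```

The largest deviation typically shrinks as the number of rolls grows, although any single run can fluctuate.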

Probability function and cumulative distribution function

There are various kinds of random variables. $X$ is a **discrete random variable** when its possible values are finite or countably infinite (that is, separated values such as 1, 2, 3, 4, 5, ...), and $X$ is a **continuous random variable** when it has a density function. In the discrete case, a probability is assigned to each value $x$, as with the die above. Viewed as a function of $x$, this is called the **probability function**, and it is written as follows.

$ p(x) = P(X = x) $

The probability function has the following properties. Here $ \sum_{x} $ denotes the sum over all possible values $x$ of $X$.

$ p(x) \ge 0, \qquad \forall x \\
\sum_{x} p(x) = 1 $
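These two properties are easy to check mechanically for a finite probability function. The sketch below uses exact fractions to avoid floating-point rounding; the helper `is_probability_function` and the counterexample `not_normalized` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

def is_probability_function(p):
    """Check that p(x) >= 0 for all x and that the values sum to exactly 1."""
    non_negative = all(value >= 0 for value in p.values())
    sums_to_one = sum(p.values()) == 1
    return non_negative and sums_to_one

# Probability function of the fair die: p(x) = 1/6 for x = 1..6.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# A non-example: six values of 1/5 sum to 6/5, not 1.
not_normalized = {x: Fraction(1, 5) for x in range(1, 7)}

print(is_probability_function(fair_die))        # True
print(is_probability_function(not_normalized))  # False
```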

The cumulative sum of the probability function is called the **cumulative distribution function**, or simply the **distribution function**. The distribution function has the following properties, including monotonicity and right continuity.

$ F(x) = P(X \le x) = \sum_{y \le x} p(y) \\
(1) \quad \lim_{x \to -\infty} F(x) = 0 \\
(2) \quad \forall x, y \in \mathbb{R} \text{ with } x \ge y: \\
\qquad F(x) \ge F(y), \quad F(x) = \lim_{\varepsilon \to +0} F(x + \varepsilon) \\
(3) \quad \lim_{x \to +\infty} F(x) = 1 $

Here, $F$ is right-continuous at every $x$ (written $ F(x+) = F(x) $): for any sequence $x_n$ that decreases monotonically and converges to $x$, we have $ \lim_{n \to \infty} F(x_n) = F(x) $. The notation $x+$ means that $x$ is approached from above (from the right), and $x-$ likewise means that $x$ is approached from below (from the left). The probability function can then be obtained as the jump of the distribution function at $x$, that is, the difference between $F(x)$ and its left limit, as shown below.

$ p(x) = F(x) - \lim_{x_n \to x-} F(x_n) = F(x) - F(x-) $
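For the fair die, the relation between the probability function and the distribution function can be sketched directly. The function names `F` and `F_left` below are assumptions chosen to mirror the notation $F(x)$ and $F(x-)$; for a discrete variable, the left limit $F(x-)$ equals $P(X < x)$.

```python
from fractions import Fraction

support = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in support}

def F(x):
    """Distribution function F(x) = P(X <= x), defined for any real x."""
    return sum((pmf[y] for y in support if y <= x), Fraction(0))

def F_left(x):
    """Left limit F(x-) = P(X < x)."""
    return sum((pmf[y] for y in support if y < x), Fraction(0))

for x in support:
    jump = F(x) - F_left(x)  # p(x) = F(x) - F(x-)
    print(f"x = {x}: F(x) = {F(x)}, F(x) - F(x-) = {jump}")

# Between support points F is flat, so the jump is zero there.
print("jump at 2.5:", F(2.5) - F_left(2.5))
```

The jump is $\frac{1}{6}$ at each face value and zero elsewhere, which recovers the probability function from the distribution function.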

As an example, the following Python code plots the cumulative distribution function of a normal distribution with mean 1500 and standard deviation 500, using scipy.stats.norm.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Evaluate the normal CDF (mean 1500, standard deviation 500) at x = 0, 1, ..., 2999
x = np.arange(0, 3000)
y = norm.cdf(x, loc=1500, scale=500)

plt.plot(x, y)
plt.grid(True)
plt.xlabel("value")
plt.ylabel("cumulative probability")
plt.show()

[Figure: cumulative distribution function of the normal distribution with mean 1500 and standard deviation 500]
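The properties of the distribution function listed earlier can also be checked numerically for this normal distribution. The sketch below is one possible check, assuming the same mean 1500 and standard deviation 500; it uses `statistics.NormalDist` from the Python standard library in place of `scipy.stats.norm`, and the grid of evaluation points is an arbitrary choice.

```python
from statistics import NormalDist

# Same parameters as the plot above: mean 1500, standard deviation 500.
dist = NormalDist(mu=1500, sigma=500)

# Evaluate F on a coarse grid of points from 0 to 3000.
grid = list(range(0, 3001, 100))
values = [dist.cdf(x) for x in grid]

# Monotonicity: F never decreases as x increases.
is_monotone = all(a <= b for a, b in zip(values, values[1:]))

print("monotone on grid:", is_monotone)
print("F far to the left :", dist.cdf(-1e6))   # approaches 0
print("F at the mean     :", dist.cdf(1500))   # 0.5 by symmetry
print("F far to the right:", dist.cdf(1e6))    # approaches 1
```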
