Given the probability distribution of the population, calculate the sampling distribution. This time we will deal with discrete distributions.
--Use the statistical package scipy.stats of scipy, which is a collection of scientific and technological calculation packages of python, and numpy of the numerical calculation library. --Calculate in the following 3 steps.
scipy.stats.rv_discrete The following is an excerpt from the scipy documentation. You can find it in the Help Example section, which you can read as follows:
from scipy import stats
help(stats.rv_discrete)
Just pass the discrete random variable in $ xk $, the probability corresponding to $ pk $, and pass it to stats.rv_discrete.
This is an example of a random variable that takes 7 discrete values.
from scipy import stats
xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.2)
custm = stats.rv_discrete(name='custm', values=(xk, pk))
To visualize the probability mass function (pmf) of a discrete distribution, do the following:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
ax.plot(xk, custm.pmf(xk), 'ro', ms=12, mec='r')
ax.vlines(xk, 0, custm.pmf(xk), colors='r', lw=4)
plt.show()
Sampling is performed from the created discrete distribution. For example, if you want to randomly sample 1000 times, do as follows. (rvs: random variables) If you want to fix the seed of the random number to ensure reproducibility, uncomment the first line.
import numpy as np
# np.random.seed(0)
sampled_rvs = custm.rvs(size=1000)
Calculate using the histogram function of numpy. Be careful not to forget to add +1 to bin.
f = np.histogram(sampled_rvs, bins = np.arange(7 + 1), density=True)[0]
The above contents are summarized.
Up until now, we have dealt with the case of one-dimensional probability distributions, but so that we can handle multidimensional joint probability distributions as well.
First, crush it to one dimension with p.ravel (), then calculate the sample distribution, reshape it, and then return it.
Pass the argument p as an array of nupmy.
import numpy as np
from scipy import stats
def sampling_dist(p, size, seed = 0):
xk = np.arange(p.size)
pk = p.ravel()
true_dist = stats.rv_discrete(name='true_dist', values=(xk, pk))
np.random.seed(seed)
sampled_rvs = true_dist.rvs(size=size)
f = np.histogram(sampled_rvs, bins = np.arange(p.size + 1), density=True)[0]
f.reshape(p.shape)
return(f.reshape(p.shape))
p = np.array([[0.10,0.10],[0.10,0.15],[0.15,0.10],[0.10,0.20]])
p_est = sampling_dist(p, 1000)
print(p)
print(p_est)
Recommended Posts