Using Python's Bayesian inference library PyMC3, from a bimodal discrete probability distribution to the expected Poisson distribution $ \ lambda_1, \ lambda_2 $, and to each peak Estimate the ratio to which it belongs.
Operating environment
#OS
import platform
print(platform.platform())
#Windows-10-10.0.15063-SP0
#python version
print(platform.python_version())
#version 3.7.7
#Library
import numpy as np #version 1.19.1
import pandas as pd #version 1.1.0
from scipy.stats import poisson #version 1.5.0
import matplotlib.pyplot as plt #version 3.2.2
import seaborn as sns #version 0.10.1
import pymc3 as pm #version 3.8
import theano.tensor as T #1.0.4
Assuming that the number of Bernoulli trials is the same, consider a discrete probability distribution in which counties with expected success times $ \ lambda $ of 3 and 10 are mixed.
#Randomly generate the number of successes when the expected value 3 is 900 and the expected value 10 is 1000 times.
obs = np.hstack((poisson.rvs(3,size=900), poisson.rvs(10,size=1000)))
sns.countplot(obs)
Output result
(I want to estimate the expected value of the two peaks of this $ \ lambda $ and the ratio)
When the county with the expected value of 3 is A and the county with the expected value of 10 is B, Since it is not possible to know in advance which county the observed value $ n_x $ belongs to, The probability $ p (A) $ that the observed value $ n_x $ is A follows a uniform distribution from 0 to 1. Then, the probability $ p (B) $ that the observed value $ n_x $ is B is $ 1-p (A) $.
Furthermore, the expected values of A and B are estimated assuming that the true expected values of A and B are ** unknown **. Assuming that the prior distribution follows an exponential distribution, the mean value of the observed values is applied to $ \ lambda $ of the expected value $ \ frac {1} {\ lambda} $ of the exponential distribution. At this time, I think that there are two peaks. The posterior distribution is estimated using the value obtained from the prior distribution as the expected value mu and the observed value as the value randomly generated earlier.
with pm.Model() as model:
p1 = pm.Uniform('p1',0,1) #p(A)Probability of
p2 = 1 - p1 #p(B)Probability of
p = T.stack([p1, p2])
#p1=0, p2=Prepare as many categorical variables as 1 for the number of samples.
assignment = pm.Categorical('assignment', p, shape=obs.shape[0], testval = np.random.randint(0, 2, obs.shape[0]))
#Expected value of prior distribution= 1 /Set as an exponential distribution with the average of observed values
lambda_ = pm.Exponential('lambda_', 1./obs.mean(), shape=2)
lambda_i = pm.Deterministic('lambda_i', lambda_[assignment])
#mu to lambda_Set the generated value to observed
observations = pm.Poisson('obs', mu=lambda_i, observed=obs)
#search
step1 = pm.Metropolis(vars=[p, lambda_]) #Discrete values set to Metropolis
step2 = pm.ElemwiseCategorical(vars=[assignment])
trace = pm.sample(5000, step=[step1, step2])
#Plot the results
plt.figure(figsize=(12, 8))
plt.subplot(211)
plt.plot(trace['lambda_'][:,1], label='Group A', c='r')
plt.plot(trace['lambda_'][:,0], label='Group B', c='c')
plt.ylim(2, 11)
plt.xlabel('Number of searches')
plt.ylabel('Expected value')
plt.title('Estimated result')
leg = plt.legend(loc='upper right')
leg.get_frame().set_alpha(0.7)
plt.subplot(212)
plt.plot(1 - trace['p1'], label='Percentage of samples selected for County A', c='r')
plt.legend(loc='upper right')
plt.xlabel('Number of searches')
plt.ylabel('Percentage(%)')
plt.show()plt.figure(figsize=(12, 8))
#Outputs the average value of searches after the 1000th time
trace_lambda_1 = trace['lambda_'][:,1][1000:].mean()
trace_lambda_2 = trace['lambda_'][:,0][1000:].mean()
p1 = (1 - trace['p1'][1000:]).mean()
print('A count expected value estimation result%.2f' % trace_lambda_1)
print('Group B expected value estimation result%.2f' % trace_lambda_2)
print('Percentage of group A%.3f' % p1)
A county expected value search result 3.01
Group B expected value search result 9.84
Percentage of county A 0.461
True value | Estimated result | |
---|---|---|
Group A expected value | 3 | 3.01 |
Group B expected value | 10 | 9.84 |
A county ratio | 0.473 | 0.46 |
have become.
Poisson distribution expected values are 3.01 and 9.84 The discrete probability distribution in which the sample is composed of a ratio of 46:54 is as follows.
#0 in the expected value of the estimation result.001~0.Number of successes with a probability of between 999 k
a_obs = np.arange(poisson.ppf(0.001, trace_lambda_1),
poisson.ppf(0.999, trace_lambda_1))
b_obs = np.arange(poisson.ppf(0.001, trace_lambda_2),
poisson.ppf(0.999, trace_lambda_2))
#Expected value 3,0 at 10.001~0.Number of successes with a probability of between 999 k
a_true = np.arange(poisson.ppf(0.001, 3),
poisson.ppf(0.999, 3))
b_true = np.arange(poisson.ppf(0.001, 10),
poisson.ppf(0.999, 10))
#Probability of success number k in estimation result
A_obs = pd.DataFrame({'lambda': a_obs,
'percent': poisson.pmf(a_obs, trace_lambda_1) * (46 / 100)})
B_obs = pd.DataFrame({'lambda': b_obs,
'percent': poisson.pmf(b_obs, trace_lambda_2) * (54 / 100)} )
#Expected value 3,Probability of success k at 10
A_true = pd.DataFrame({'lambda': a_true,
'percent': poisson.pmf(a_true, 3) * (9 / 19)})
B_true = pd.DataFrame({'lambda': b_true,
'percent': poisson.pmf(b_true, 10) * (10 / 19)} )
#Compare with randomly generated ones
plt.figure(figsize=(20, 6) )
plt.subplot(121)
sns.barplot(data=pd.merge(A_obs,B_obs,how='outer', on=['lambda','percent']).groupby('lambda', as_index=False)['percent'].sum(),
x='lambda', y='percent')
plt.title('Estimated result')
plt.ylabel('probability')
plt.xlabel('Number of successes k')
plt.xticks(np.int8(np.arange(0,21)),np.int8(np.arange(0,21)))
plt.subplot(122)
sns.barplot(data=pd.merge(A_true,B_true,how='outer', on=['lambda','percent']).groupby('lambda', as_index=False)['percent'].sum(),
x='lambda', y='percent')
plt.title('Expected value 3,Expected value 10 ratio 9:Probability distribution of 10')
plt.ylabel('probability')
plt.xlabel('Number of successes k')
plt.xticks(np.int8(np.arange(0,21)),np.int8(np.arange(0,21)))
plt.show()
We were able to estimate the expected value and ratio of the bimodal distribution.
Recommended Posts