[PYTHON] Bayesian estimation of each expected value from a bimodal distribution

Introduction

Using Python's Bayesian inference library PyMC3, from a bimodal discrete probability distribution to the expected Poisson distribution $ \ lambda_1, \ lambda_2 $, and to each peak Estimate the ratio to which it belongs.

Operating environment & library used

Operating environment


#OS
import platform
print(platform.platform())
#Windows-10-10.0.15063-SP0

#python version
print(platform.python_version())
#version 3.7.7

#Library
import numpy as np #version 1.19.1
import pandas as pd #version 1.1.0
from scipy.stats import poisson #version 1.5.0
import matplotlib.pyplot as plt #version 3.2.2
import seaborn as sns #version 0.10.1
import pymc3 as pm #version 3.8
import theano.tensor as T #1.0.4

Try

Assuming that the number of Bernoulli trials is the same, consider a discrete probability distribution in which counties with expected success times $ \ lambda $ of 3 and 10 are mixed.

#Randomly generate the number of successes when the expected value 3 is 900 and the expected value 10 is 1000 times.
obs = np.hstack((poisson.rvs(3,size=900), poisson.rvs(10,size=1000)))

sns.countplot(obs)

Output result 出力結果.png (I want to estimate the expected value of the two peaks of this $ \ lambda $ and the ratio)

Consider the model of PyMC3

When the county with the expected value of 3 is A and the county with the expected value of 10 is B, Since it is not possible to know in advance which county the observed value $ n_x $ belongs to, The probability $ p (A) $ that the observed value $ n_x $ is A follows a uniform distribution from 0 to 1. Then, the probability $ p (B) $ that the observed value $ n_x $ is B is $ 1-p (A) $.

Furthermore, the expected values of A and B are estimated assuming that the true expected values of A and B are ** unknown **. Assuming that the prior distribution follows an exponential distribution, the mean value of the observed values is applied to $ \ lambda $ of the expected value $ \ frac {1} {\ lambda} $ of the exponential distribution. At this time, I think that there are two peaks. The posterior distribution is estimated using the value obtained from the prior distribution as the expected value mu and the observed value as the value randomly generated earlier.

with pm.Model() as model:
    p1 = pm.Uniform('p1',0,1) #p(A)Probability of
    p2 = 1 - p1 #p(B)Probability of
    p = T.stack([p1, p2])
    
    #p1=0, p2=Prepare as many categorical variables as 1 for the number of samples.
    assignment = pm.Categorical('assignment', p, shape=obs.shape[0], testval = np.random.randint(0, 2, obs.shape[0]))

    #Expected value of prior distribution= 1 /Set as an exponential distribution with the average of observed values
    lambda_ = pm.Exponential('lambda_', 1./obs.mean(), shape=2)
    lambda_i = pm.Deterministic('lambda_i', lambda_[assignment])
    #mu to lambda_Set the generated value to observed
    observations = pm.Poisson('obs', mu=lambda_i, observed=obs)

    #search
    step1 = pm.Metropolis(vars=[p, lambda_]) #Discrete values set to Metropolis
    step2 = pm.ElemwiseCategorical(vars=[assignment])
    trace = pm.sample(5000, step=[step1, step2])

#Plot the results
plt.figure(figsize=(12, 8))

plt.subplot(211)
plt.plot(trace['lambda_'][:,1], label='Group A', c='r')
plt.plot(trace['lambda_'][:,0], label='Group B', c='c')
plt.ylim(2, 11)
plt.xlabel('Number of searches')
plt.ylabel('Expected value')
plt.title('Estimated result')
leg = plt.legend(loc='upper right')
leg.get_frame().set_alpha(0.7)

plt.subplot(212)
plt.plot(1 - trace['p1'], label='Percentage of samples selected for County A', c='r')
plt.legend(loc='upper right')
plt.xlabel('Number of searches')
plt.ylabel('Percentage(%)')

plt.show()plt.figure(figsize=(12, 8))

#Outputs the average value of searches after the 1000th time
trace_lambda_1 = trace['lambda_'][:,1][1000:].mean()
trace_lambda_2 = trace['lambda_'][:,0][1000:].mean()
p1 = (1 - trace['p1'][1000:]).mean()

print('A count expected value estimation result%.2f' % trace_lambda_1)
print('Group B expected value estimation result%.2f' % trace_lambda_2)
print('Percentage of group A%.3f' % p1)
探索結果 (1).png
A county expected value search result 3.01
Group B expected value search result 9.84
Percentage of county A 0.461

result

True value Estimated result
Group A expected value 3 3.01
Group B expected value 10 9.84
A county ratio 0.473 0.46

have become.

Poisson distribution expected values are 3.01 and 9.84 The discrete probability distribution in which the sample is composed of a ratio of 46:54 is as follows.

#0 in the expected value of the estimation result.001~0.Number of successes with a probability of between 999 k
a_obs = np.arange(poisson.ppf(0.001, trace_lambda_1),
             poisson.ppf(0.999, trace_lambda_1))
b_obs = np.arange(poisson.ppf(0.001, trace_lambda_2),
             poisson.ppf(0.999, trace_lambda_2))
#Expected value 3,0 at 10.001~0.Number of successes with a probability of between 999 k
a_true = np.arange(poisson.ppf(0.001, 3),
             poisson.ppf(0.999, 3))
b_true = np.arange(poisson.ppf(0.001, 10),
             poisson.ppf(0.999, 10))
#Probability of success number k in estimation result
A_obs = pd.DataFrame({'lambda': a_obs,
                 'percent': poisson.pmf(a_obs, trace_lambda_1) * (46 / 100)})
B_obs = pd.DataFrame({'lambda': b_obs,
                'percent': poisson.pmf(b_obs, trace_lambda_2) * (54 / 100)} )
#Expected value 3,Probability of success k at 10
A_true = pd.DataFrame({'lambda': a_true,
                 'percent': poisson.pmf(a_true, 3) * (9 / 19)})
B_true = pd.DataFrame({'lambda': b_true,
                'percent': poisson.pmf(b_true, 10) * (10 / 19)} )

#Compare with randomly generated ones
plt.figure(figsize=(20, 6) )
plt.subplot(121)
sns.barplot(data=pd.merge(A_obs,B_obs,how='outer', on=['lambda','percent']).groupby('lambda', as_index=False)['percent'].sum(),
            x='lambda', y='percent')
plt.title('Estimated result')
plt.ylabel('probability')
plt.xlabel('Number of successes k')
plt.xticks(np.int8(np.arange(0,21)),np.int8(np.arange(0,21)))
plt.subplot(122)
sns.barplot(data=pd.merge(A_true,B_true,how='outer', on=['lambda','percent']).groupby('lambda', as_index=False)['percent'].sum(),
            x='lambda', y='percent')
plt.title('Expected value 3,Expected value 10 ratio 9:Probability distribution of 10')
plt.ylabel('probability')
plt.xlabel('Number of successes k')
plt.xticks(np.int8(np.arange(0,21)),np.int8(np.arange(0,21)))
plt.show()
比較 (1).png

We were able to estimate the expected value and ratio of the bimodal distribution.

Recommended Posts

Bayesian estimation of each expected value from a bimodal distribution
EM of mixed Gaussian distribution
Concept of Bayesian reasoning (2) ... Bayesian estimation and probability distribution
Bayesian estimation of each expected value from a bimodal distribution
Estimating mixed Gaussian distribution by EM algorithm
PRML Chapter 10 Variational Gaussian Distribution Python Implementation
EM algorithm calculation for mixed Gaussian distribution problem
Python: Diagram of 2D data distribution (kernel density estimation)
Variational Bayesian inference implementation of topic model in python
Verification of normal distribution
[Basics of modern mathematical statistics with python] Chapter 2: Probability distribution and expected value