I recently learned about Pyro, a library for probabilistic models, and found it interesting, so I gave it a try. This article shares the source code, which was written in a Jupyter Notebook. Please note that there is almost no theoretical explanation.
The ipynb file is also published on GitHub: https://github.com/isuya0508/exercise_Pyro/blob/master/bayesian_inference_bernulli.ipynb
Pyro is a library for probabilistic models that uses PyTorch as its backend. It can be installed via pip:
pip install pyro-ppl
Note that PyTorch must be installed in advance. Please refer to the official page for details.
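As a quick sanity check after installation, the versions of both libraries can be printed (this uses only the standard `__version__` attributes, nothing specific to this article):

import torch
import pyro
print(torch.__version__)
print(pyro.__version__)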
This time, we consider estimating the parameter $p$ from data that follows the Bernoulli distribution $Ber(p)$. First, import the required modules.
from collections import Counter
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import torch
import torch.distributions.constraints as constraints
import pyro
import pyro.distributions as dist
from pyro.optim import SGD, Adam
from pyro.infer import SVI, Trace_ELBO
%matplotlib inline
pyro.set_rng_seed(0)
pyro.enable_validation(True)
Create the data with random numbers. Note that the data must be a PyTorch tensor.
obs = stats.bernoulli.rvs(0.7, size=30, random_state=1)
obs = torch.tensor(obs, dtype=torch.float32)
obs
> tensor([1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1.,
1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0.])
Check the number of 1s in the data.
Counter(obs.numpy())
> Counter({1.0: 23, 0.0: 7})
Therefore, the maximum likelihood estimate of the parameter $p$ is $23/30 \fallingdotseq 0.77$.[^2] From here on, we estimate $p$ by Bayesian inference instead.
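Since the maximum likelihood estimate of $p$ for Bernoulli data is just the sample mean (see footnote [^2]), it can be verified in one line:

# The MLE of p equals the sample mean: 23/30 ≈ 0.767
p_mle = obs.mean().item()
print(p_mle)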
In Bayesian estimation, a prior distribution is assumed for the parameters, and the posterior distribution is computed by combining the prior with the observed data.
For the parameter $p$ of the Bernoulli distribution $Ber(p)$, it is common to assume a beta distribution as the prior.[^1]
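Because of this conjugacy, the exact posterior is available in closed form, which gives a reference point for the SVI result below. A minimal sketch, assuming the $Beta(1, 1)$ prior used later:

# With a Beta(a, b) prior and k successes in n Bernoulli trials,
# the posterior is Beta(a + k, b + n - k) by conjugacy.
a0, b0 = 1, 1
k, n = 23, 30
exact_posterior = stats.beta(a0 + k, b0 + n - k)  # Beta(24, 8)
print(exact_posterior.mean())  # 0.75
print(exact_posterior.var())   # approx. 0.00568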
In Pyro, the `model` function describes the prior distribution and the model of the data.
def model(data):
    # Assume the prior distribution
    prior = dist.Beta(1, 1)
    # Model the data
    p = pyro.sample('p', prior)
    for i in range(len(data)):
        pyro.sample(f'obs{i}', dist.Bernoulli(p), obs=data[i])
This time we assume $Beta(1, 1)$, which is in fact identical to the uniform distribution on $[0, 1]$.
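As an aside, the per-observation loop in `model` can also be vectorized with `pyro.plate`; the following equivalent version is just a sketch and is not used in the rest of this article:

def model_vectorized(data):
    p = pyro.sample('p', dist.Beta(1, 1))
    # plate declares the observations conditionally independent,
    # so the whole tensor is scored at once
    with pyro.plate('data', len(data)):
        pyro.sample('obs', dist.Bernoulli(p), obs=data)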
The (approximate) posterior distribution is described by the `guide` function. Like the prior, the posterior is also a beta distribution. Give its parameters appropriate initial values.
def guide(data):
    # Define the (approximate) posterior distribution;
    # initial values must be float tensors so they can be optimized
    alpha_q = pyro.param('alpha_q', torch.tensor(15.0), constraint=constraints.positive)
    beta_q = pyro.param('beta_q', torch.tensor(15.0), constraint=constraints.positive)
    posterior = dist.Beta(alpha_q, beta_q)
    pyro.sample('p', posterior)
We will estimate the parameters $\alpha, \beta$ of this posterior distribution (the `alpha_q` and `beta_q` above).
To estimate the posterior parameters, we use stochastic variational inference (SVI); it is essentially Pyro's standard method. In this Bernoulli example the posterior can be derived analytically, so using an approximation such as variational inference is unnecessary, but I use it here for practice. (It is properly the method for cases where the posterior cannot be computed analytically.)
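For reference, SVI maximizes the evidence lower bound (ELBO) with respect to the guide's parameters, $\mathrm{ELBO} = \mathbb{E}_{q(p)}[\log p(\mathrm{data}, p) - \log q(p)]$, which is equivalent to minimizing the KL divergence between the guide $q(p)$ and the true posterior.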
For the implementation, instantiate an optimizer (gradient descent, SGD) and the stochastic variational inference object (SVI). When instantiating SVI, pass the `model` and `guide` defined above.
optimizer = SGD(dict(lr=0.0001, momentum=0.9))
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())
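Adam was also imported above; as a sketch, it could be used in place of SGD (the learning rate here is an untuned guess):

# Alternative: Pyro's Adam wrapper instead of SGD
optimizer = Adam(dict(lr=0.01))
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())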
All that remains is to call `svi.step(obs)` repeatedly to update the posterior parameters.
NUM_STEPS = 2000
pyro.clear_param_store()
history = {
    'loss': [],
    'alpha': [],
    'beta': []
}

for step in range(1, NUM_STEPS + 1):
    loss = svi.step(obs)
    history['loss'].append(loss)
    history['alpha'].append(pyro.param('alpha_q').item())
    history['beta'].append(pyro.param('beta_q').item())
    if step % 100 == 0:
        print(f'STEP: {step} LOSS: {loss}')
>
STEP: 100 LOSS: 17.461310371756554
STEP: 200 LOSS: 18.102468490600586
(omitted)
STEP: 1900 LOSS: 17.97793820500374
STEP: 2000 LOSS: 17.95139753818512
Here, the loss being minimized and the posterior parameters $\alpha$ and $\beta$ are recorded in `history`.
Plotting these values at each step gives the following figure. (The plotting source code is omitted.)

[Figure: loss and the parameters $\alpha$, $\beta$ at each step]
Finally, check the obtained $\alpha, \beta$ and the mean and variance of the resulting posterior distribution.
inferred_alpha = pyro.param('alpha_q').item()
inferred_beta = pyro.param('beta_q').item()
posterior = stats.beta(inferred_alpha, inferred_beta)

print(f'alpha: {inferred_alpha}')
print(f'beta: {inferred_beta}')
print(f'Expected: {posterior.expect()}')
print(f'Variance: {posterior.var()}')
>
alpha: 25.764650344848633
beta: 7.556574821472168
Expected: 0.7732203787899605
Variance: 0.005109101547631603
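As an additional sketch, SciPy can also report a central 95% credible interval from the fitted posterior:

# Central 95% interval of the fitted Beta posterior
low, high = posterior.interval(0.95)
print(f'95% credible interval: [{low:.3f}, {high:.3f}]')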
Let's plot the prior and posterior distributions. (The source code is omitted.)
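The plotting code was omitted in the original; a minimal sketch of how the two densities might be drawn:

x = np.linspace(0, 1, 200)
plt.plot(x, stats.beta(1, 1).pdf(x), label='prior Beta(1, 1)')
plt.plot(x, posterior.pdf(x), label='posterior (SVI)')
plt.xlabel('p')
plt.ylabel('density')
plt.legend()
plt.show()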
The estimate looks quite good.
In this article, I tried simple Bayesian inference using Pyro. The example was simple, but Pyro's strength is that it can describe complex models flexibly and concisely. The official documentation contains many examples of probabilistic methods, and just reading through them is instructive.
[^1]: It is known that if the prior distribution of $p$ is a beta distribution, the posterior distribution also follows a beta distribution. Such a prior is called a conjugate prior.
[^2]: The maximum likelihood estimator of $p$ equals the sample mean.