[Python] Bayesian inference with Pyro

I recently learned about Pyro, a library for probabilistic models, and it looked interesting, so I gave it a try. This article shares the source code, which was written in a Jupyter Notebook. Please note that there is almost no theoretical explanation.

Environment

- Windows 10
- Python: 3.7.7
- Jupyter Notebook: 1.0.0
- PyTorch: 1.5.1
- Pyro: 1.4.0
- scipy: 1.5.2
- numpy: 1.19.1
- matplotlib: 3.3.0

The ipynb file is also published on GitHub: https://github.com/isuya0508/exercise_Pyro/blob/master/bayesian_inference_bernulli.ipynb

What is Pyro?

Pyro is a library for probabilistic models that uses PyTorch as its backend. You can install it with pip.

pip install pyro-ppl 

Note that PyTorch must be installed beforehand. See the official page for details.
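As a quick check (an addition, not in the original article), you can confirm that the installed versions match the environment above:

# Print library versions to confirm the environment
# (this article used PyTorch 1.5.1 and Pyro 1.4.0)
import torch
import pyro

print(torch.__version__)
print(pyro.__version__)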

Try it out

This time, we will estimate the parameter $p$ from data that follows the Bernoulli distribution $\mathrm{Ber}(p)$. First, import the required modules.

from collections import Counter

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

import torch
import torch.distributions.constraints as constraints
import pyro
import pyro.distributions as dist
from pyro.optim import SGD, Adam
from pyro.infer import SVI, Trace_ELBO

%matplotlib inline

pyro.set_rng_seed(0)
pyro.enable_validation(True)

Data generation

Generate the data with random numbers. Note that the data must be a PyTorch tensor.

obs = stats.bernoulli.rvs(0.7, size=30, random_state=1)
obs = torch.tensor(obs, dtype=torch.float32)
obs
> tensor([1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1.,
          1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0.])

Check the number of 1s in the data.

Counter(obs.numpy())
> Counter({1.0: 23, 0.0: 7})

Therefore, the maximum likelihood estimate of the parameter $p$ is $23/30 \fallingdotseq 0.77$.[^2] From here on, we will estimate $p$ in a Bayesian way.
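As a one-line sanity check (an addition, not in the original notebook), the MLE is simply the sample mean of obs:

# The MLE of p for i.i.d. Bernoulli data is the sample mean
print(obs.mean().item())  # 0.7666... ≒ 0.77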

Setting the prior distribution

In Bayesian estimation, we assume a prior distribution over the parameters and compute the posterior distribution by combining the prior with the observed data.

For the parameter $p$ of the Bernoulli distribution $\mathrm{Ber}(p)$, it is common to assume a beta distribution as the prior.[^1] In Pyro, the model function describes the prior distribution and the model of the data.

def model(data):
    # Assume the prior distribution
    prior = dist.Beta(1, 1)

    # Model the data
    p = pyro.sample('p', prior)
    for i in range(len(data)):
        pyro.sample(f'obs{i}', dist.Bernoulli(p), obs=data[i])

This time we assume $\mathrm{Beta}(1, 1)$, which is in fact identical to the uniform distribution.
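As an aside (not in the original), the per-element loop in model can also be written with pyro.plate, which declares the observations as conditionally independent and vectorizes the sampling. A minimal sketch:

def model_plate(data):
    # Same model as above, but vectorized over the observations
    p = pyro.sample('p', dist.Beta(1., 1.))
    with pyro.plate('data', len(data)):
        pyro.sample('obs', dist.Bernoulli(p), obs=data)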

Setting the posterior distribution

Describe the posterior distribution in the guide function. The posterior is also a beta distribution, like the prior. Give appropriate initial values for the parameters of the posterior distribution.

def guide(data):
    # Define the posterior distribution
    # (initial values must be float tensors so they can be optimized)
    alpha_q = pyro.param('alpha_q', torch.tensor(15.0), constraint=constraints.positive)
    beta_q = pyro.param('beta_q', torch.tensor(15.0), constraint=constraints.positive)
    posterior = dist.Beta(alpha_q, beta_q)

    pyro.sample('p', posterior)

The parameters $\alpha$ and $\beta$ of this posterior distribution are what we will estimate.
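To see what this initial guess looks like (a check added here, not in the original), $\mathrm{Beta}(15, 15)$ is symmetric around $0.5$:

# The initial guide Beta(15, 15) concentrates around 0.5
init = stats.beta(15, 15)
print(init.mean())  # 0.5
print(init.var())   # ~0.008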

Fitting the posterior distribution

To estimate the parameters of the posterior distribution, we will use stochastic variational inference (SVI), which is Pyro's standard approach. In this Bernoulli example the posterior can be obtained analytically, so using an approximation method such as variational inference is overkill; it is really meant for cases where the posterior cannot be computed analytically. Still, we will use it here for practice.
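For reference (an addition to the original), conjugacy gives the exact answer to compare against: with prior $\mathrm{Beta}(1, 1)$ and 23 ones out of 30 observations, the posterior is $\mathrm{Beta}(1 + 23, 1 + 7) = \mathrm{Beta}(24, 8)$.

# Exact posterior via conjugacy: Beta(alpha0 + #ones, beta0 + #zeros)
exact_posterior = stats.beta(1 + 23, 1 + 7)
print(exact_posterior.mean())  # 0.75
print(exact_posterior.var())   # ~0.0057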

For implementation, instantiate a gradient-descent optimizer (SGD) and the stochastic variational inference object (SVI). When instantiating the SVI, pass the model and guide defined above.

optimizer = SGD(dict(lr=0.0001, momentum=0.9))
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())
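Since Adam is also imported above, swapping optimizers is a one-line change. This is just an alternative sketch (the learning rate is my assumption; it was not used for the results below):

# Alternative: use Adam instead of SGD
optimizer = Adam(dict(lr=0.001))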

All you have to do now is call svi.step(obs) repeatedly to update the posterior distribution parameters.

NUM_STEPS = 2000
pyro.clear_param_store()

history = {
    'loss': [],
    'alpha': [],
    'beta': []
}

for step in range(1, NUM_STEPS + 1):
    loss = svi.step(obs)
    
    history['loss'].append(loss)
    history['alpha'].append(pyro.param('alpha_q').item())
    history['beta'].append(pyro.param('beta_q').item())
    
    if step % 100 == 0:
        print(f'STEP: {step} LOSS: {loss}')

> 
STEP: 100 LOSS: 17.461310371756554
STEP: 200 LOSS: 18.102468490600586
(omitted)
STEP: 1900 LOSS: 17.97793820500374
STEP: 2000 LOSS: 17.95139753818512

During fitting, the loss and the posterior parameters $\alpha$ and $\beta$ were recorded in history. Plotting them against the step number gives the following. (The source code is omitted.)

[Figure: loss, alpha, and beta over the training steps]
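For reference, a minimal sketch of the kind of plotting code that was omitted (the panel layout is my assumption):

# Plot loss, alpha, and beta against the training step
fig, axes = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
for ax, key in zip(axes, ['loss', 'alpha', 'beta']):
    ax.plot(history[key])
    ax.set_ylabel(key)
axes[-1].set_xlabel('step')
plt.show()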

Check the final $\alpha$ and $\beta$, as well as the expected value and variance of the posterior distribution.

inferred_alpha = pyro.param('alpha_q').item()
inferred_beta = pyro.param('beta_q').item()
posterior = stats.beta(inferred_alpha, inferred_beta)

print(f'alpha: {inferred_alpha}')
print(f'beta: {inferred_beta}')
print(f'Expected: {posterior.expect()}')
print(f'Variance: {posterior.var()}')
>
alpha: 25.764650344848633
beta: 7.556574821472168
Expected: 0.7732203787899605
Variance: 0.005109101547631603

Let's plot the prior and posterior distributions. (The source code is omitted.)

[Figure: prior Beta(1, 1) and fitted posterior densities]
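Again, a minimal sketch of the omitted plotting code (styling is my assumption):

# Compare the prior Beta(1, 1) with the fitted posterior density
x = np.linspace(0, 1, 200)
plt.plot(x, stats.beta(1, 1).pdf(x), label='prior Beta(1, 1)')
plt.plot(x, posterior.pdf(x), label='posterior (fitted)')
plt.xlabel('p')
plt.legend()
plt.show()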

The estimation seems to have worked well.

Summary

I tried simple Bayesian inference using Pyro. This was an elementary example, but I think Pyro's strength is that it can describe complex models flexibly and simply. The official documentation has many examples of probabilistic methods, and just reading through them is educational.

[^1]: It is known that if the prior of $p$ is a beta distribution, the posterior also follows a beta distribution. Such a prior is called a conjugate prior.

[^2]: The maximum likelihood estimate of $p$ equals the sample mean.
