[PYTHON] [MCMC] Calculate WAIC with pystan

Introduction

Recently I started studying Bayesian statistics and decided to use Stan as an MCMC library, but all the textbooks I would like to combine with R are python related information. I posted it after a long time because there were few.

environment

Installation of pystan

According to the official documentation conda install pystan That's it, but in my case this is all when compiling the model undefined symbol: _ZTVNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE I got an error, so I installed gcc additionally. conda install gcc

Creating a model and data for testing

Borrowed from Hideki Toyoda "Practical Bayes Modeling"

import pystan
import numpy as np

model_code = '''
data{
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
}
parameters{
    real b;
    real a;
    real<lower=0> sigma;
}
transformed parameters{
    real mu[N];
    for(i in 1:N){
        mu[i] = b + a*x[i];
    }
}
model{
    for(i in 1:N){
        y[i] ~ normal(mu[i], sigma);
    }
}
generated quantities{
    vector[N] log_lik;
    for(i in 1:N){
        log_lik[i] = normal_lpdf(y[i] | mu[i], sigma);
    }
}
'''

sm = pystan.StanModel(model_code=model_code)

x = [150.5,160.2,148,177.1,162,148.2,163.8,166.9,164.9,154.3,176.1,162.7,150.5,131.4,171.5,157.5,157.8,169.3,167.9,165.1,
169,167.4,158.9,134.1,165.4,157.3,156.1,140.4,152.3,163,174.3,156.8,162.7,157.4,141.5,153,153.3,157.3,171.2,167.2,
156,155,166.4,164.7,149.7,149.5,162.4,167.2,156.7,168.6,162.8,150.7,162.1,144.4,175.2,181.8,153.6,145.5,164.8,156.4,
186.8,157.5,166.3,158.3,149.1,160.3,136.3,175.6,159.8,184.1,163.7,149.5,165.3,146.8,143,161.5,152.7,158,158.9,150.9,
151.2,156.4,172.1,139.7,165.1,162,170.8,154.3,162.4,161.2,151.5,172.5,171.9,166.4,177,164.7,142.7,151.1,143.3,152.3]
y = [158.7,163.3,156.6,164.1,158.4,175.4,168,169.4,165.7,174.8,158.5,159.8,173,158.4,161.5,160.3,160.8,161,166.5,161.8,
159.5,172.4,161.5,161.7,162.3,168,162.5,162.7,158.2,160.7,163.4,158.9,166.7,152.4,165.1,152.2,160.9,159.3,158.4,162.6,
149.6,171.2,151.3,159.8,155.2,157.7,177.6,163.1,154,151.5,166.2,162.9,160.8,156.5,152.6,155.5,170,158.7,153.3,176.1,
166,161.3,170.4,169.2,158.7,178.4,161.2,153,162,164.5,179.2,163.7,166.2,162.5,160.7,162.8,168.5,177.5,170.2,171.5,
154.4,169.9,164.5,152.7,166.6,161.9,173.3,157.6,160,156.5,161.8,165.8,157.9,168.8,154.5,155.7,173.1,155.9,165.9,160.3]
data = {'N':100, 'x':x, 'y':y}

fit = sm.sampling(data=data, iter=2000, chains=4)

WAIC calculation

WAIC is an abbreviation for Widely Applicable information criterion (Watanabe, 2010) and is one of the information criteria used in the Bayesian model. The formula is

WAIC = -2lppd_{waic} + 2p_{waic} \\
lppd_{waic} \approx \sum_{i = 0}^{N}log \bigl\{ \frac{1}{T} \sum_{t=1}^{T}f(x_{i} | \theta^{(t)}) \bigr\} \\
p_{waic} = \sum_{i=1}^N V_i [log f(x_i | \theta^{(t)})]

Therefore, define the following function

def waic(fit):
    log_lik = fit.extract()['log_lik']
    lppd = np.log(np.exp(log_lik).mean(axis=0)).sum()
    p_waic = np.var(log_lik, axis=0).sum()
    waic = -2*lppd + 2*p_waic
    return round(waic, 3)

By using this

waic(fit)
>>>669.028

WAIC was calculated!

Recommended Posts

[MCMC] Calculate WAIC with pystan
Calculate tf-idf with scikit-learn
Calculate sample distribution with Scipy (discrete distribution)
Calculate domain CPU usage with libvirt
How to calculate date with python
[PyStan] Try Graphical Lasso with Stan.