[PYTHON] I tried running pymc

environment

mac python3.5

Installation

pip install pymc

Try to move part 1

mymodel.py


# Import relevant modules
import pymc
import numpy as np

# Some data
n = 5 * np.ones(4, dtype=int)
x = np.array([-.86, -.3, -.05, .73])

# Priors on unknown parameters
alpha = pymc.Normal('alpha', mu=0, tau=.01)
beta = pymc.Normal('beta', mu=0, tau=.01)

# Arbitrary deterministic function of parameters
@pymc.deterministic
def theta(a=alpha, b=beta):
    """theta = logit^{-1}(a+b)"""
    return pymc.invlogit(a + b * x)

# Binomial likelihood for data
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0., 1., 3., 5.]),
                  observed=True)

main.py


import pymc
import mymodel

S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)

python main.py
[-----------------65%----              ] 6510 of 10000 complete in 0.5 sec
[-----------------100%-----------------] 10000 of 10000 complete in 0.8 sec
Plotting beta
Plotting alpha
Plotting theta_0
Plotting theta_1
Plotting theta_2
Plotting theta_3

スクリーンショット 2017-04-18 7.40.42.png スクリーンショット 2017-04-18 7.40.47.png スクリーンショット 2017-04-18 7.40.52.png

Try to move part 2

pip install triangle-plot
pip install corner
jupyter notebook

Start by creating fake data. Use a linear regression model with two predictors (x, y) and one outcome variable (z).

python


from numpy import *
Nobs = 20
x_true = random.uniform(0,10, size=Nobs)
y_true = random.uniform(-1,1, size=Nobs)
alpha_true = 0.5
beta_x_true = 1.0
beta_y_true = 10.0
eps_true = 0.5
z_true = alpha_true + beta_x_true*x_true + beta_y_true*y_true
z_obs = z_true + random.normal(0, eps_true, size=Nobs)

python


%matplotlib inline
from matplotlib import pyplot as plt
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.scatter(x_true, z_obs, c=y_true, marker='o')
plt.colorbar()
plt.xlabel('X')
plt.ylabel('Z')
plt.subplot(1,2,2)
plt.scatter(y_true, z_obs, c=x_true, marker='o')
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('Z')

スクリーンショット 2017-04-18 8.23.31.png

python


import pymc
# define the parameters with their associated priors
alpha = pymc.Uniform('alpha', -100,100, value=median(z_obs))
betax = pymc.Uniform('betax', -100,100, value=std(z_obs)/std(x_true))
betay = pymc.Uniform('betay', -100,100, value=std(z_obs)/std(y_true))
eps = pymc.Uniform('eps', 0, 100, value=0.01)

# Now define the model
@pymc.deterministic
def model(alpha=alpha, betax=betax, betay=betay, x=x_true, y=y_true):
    return alpha + betax*x + betay*y

# pymc parametrizes the width of the normal distribution by tau=1/sigma**2
@pymc.deterministic
def tau(eps=eps):
    return power(eps, -2)

# Lastly relate the model/parameters to the data
data = pymc.Normal('data', mu=model, tau=tau, value=z_obs, observed=True)

Once the definition is in place, send a pymc object to pymc.MCMC to create and run a sampler. Here we are adapting the Metropolis-Hastings sampler using use_step_method. This is a smarter sampler than the default Metropolis-Hastings. Build a covariance matrix using the previous sample to sample the parameter space more efficiently.

python


sampler = pymc.MCMC([alpha,betax,betay,eps,model,tau,z_obs,x_true,y_true])
sampler.use_step_method(pymc.AdaptiveMetropolis, [alpha,betax,betay,eps],
                        scales={alpha:0.1, betax:0.1, betay:1.0, eps:0.1})
sampler.sample(iter=10000)

[100%**] 10000 of 10000 complete

Plot.

python


pymc.Matplot.plot(sampler)

I will paste only the alpha. For each probability, pymc plots traces (upper left panel), autocorrelation (lower left pane), and sample histograms (right panel). You can see that nothing happens at first. After that, the adaptive MH sampler will find a better covariance matrix. スクリーンショット 2017-04-18 8.30.54.png

Next, let's get the median value of each variable from the sample. The MCMC sample examines the trace for each variable.

python


m_alpha = median(alpha.trace())
m_betax = median(betax.trace())
m_betay = median(betay.trace())
m_eps = median(eps.trace())

The median is more robust with respect to the asymmetric distribution (eg, the distribution of eps). This time we'll modify the y predictor when reversing the plot and x. That way you can see if you have found the right trend. You can also plot shaded areas that represent intrinsic variability.

python


plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.plot(x_true, z_obs-m_alpha-m_betay*y_true, 'o')
plt.xlabel('X')
plt.ylabel('Z - alpha - beta_y y')
# Now plot the model
xx = array([x_true.min(), x_true.max()])
plt.plot(xx, xx*m_betax)
plt.plot(xx, xx*m_betax + m_eps, '--', color='k')
plt.plot(xx, xx*m_betax - m_eps, '--', color='k')
plt.subplot(1,2,2)
plt.plot(y_true, z_obs-m_alpha-m_betax*x_true, 'o')
plt.xlabel('Y')
plt.ylabel('Z - alpha - beta_x x')
yy = array([y_true.min(), y_true.max()])
plt.plot(yy, yy*m_betay)
plt.plot(yy, yy*m_betay + m_eps, '--', color='k')
plt.plot(yy, yy*m_betay - m_eps, '--', color='k')

スクリーンショット 2017-04-18 8.36.00.png

Examine the covariance of various parameters in the triangle_plot plot package. To do this, it seems that a Markov chain of two-dimensional arrays indexed by [i, p] is needed. All elements are connected to samples and transposed.

python


samples = array([alpha.trace(),betax.trace(),betay.trace(),eps.trace()]).T
print((samples.shape))
import triangle
triangle.corner(samples[:,:], labels=['alpha','betax','betay','eps'], truths=[alpha_true, beta_x_true, beta_y_true, eps_true])

スクリーンショット 2017-04-18 8.39.33.png

Example of use

https://github.com/QuantScientist/kaggle-seizure-prediction-challenge-2016/blob/master/jupyter/bayesian-logistic-regression/ieegPymc3Version1.ipynb

reference

https://users.obs.carnegiescience.edu/cburns/ipynbs/PyMC.html https://pymc-devs.github.io/pymc/INSTALL.html#dependencies

Recommended Posts

I tried running pymc
I tried running TensorFlow
I tried running GAN in Colaboratory
I tried Grumpy (Go running Python).
I tried running prolog with python 3.8.2.
I tried scraping
I tried PyQ
I tried AutoKeras
I tried papermill
I tried django-slack
I tried Django
I tried spleeter
I tried cgo
I tried using argparse
I tried using anytree
I tried competitive programming
I tried ARP spoofing
I tried using aiomysql
I tried using Summpy
I tried Python> autopep8
I tried running YOLO v3 on Google Colab
I tried using coturn
I tried using Pipenv
I tried using matplotlib
I tried using "Anvil".
I tried using Hubot
I tried using ESPCN
I tried PyCaret2.0 (pycaret-nightly)
I tried using openpyxl
I tried deep learning
I tried AWS CDK!
I tried using Ipython
I tried to debug.
I tried using PyCaret
I tried using cron
I tried Kivy's mapview
I tried running faiss with python, Go, Rust
I tried using ngrok
I tried using face_recognition
I tried to paste
I tried using Jupyter
I tried running python -m summpy.server -h 127.0.0.1 -p 8080
I tried using PyCaret
I tried moving EfficientDet
I tried shell programming
I tried running Deep Floor Plan with Python 3.6.10.
I tried using Heapq
I tried running alembic, a Python migration tool
I tried using doctest
I tried Python> decorator
I tried Auto Gluon
I tried using folium
I tried using jinja2
I tried AWS Iot
I tried Bayesian optimization!
I tried using folium
I tried using time-window
I tried running the app on the IoT platform "Rimotte"
I tried running BERT with Sakura VPS (without GPU)
I tried running python etc. from a bat file
I tried running TensorFlow in AWS Lambda environment: Preparation