I set up a CmdStanPy environment with Docker in order to work through the book "Bayesian statistical modeling with Stan and R". There seemed to be few Japanese-language materials on setting up CmdStanPy, so I am writing this down in case it is useful to someone.

Introduction

As the official documentation notes, be careful if you are working in an environment where PyStan is already installed:

Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.

Because this and other dependency problems looked troublesome, I prepared the environment with Docker [^1].

What is CmdStanPy?

Lightweight wrapper for CmdStan in Python

Official documentation: https://cmdstanpy.readthedocs.io/en/latest/index.html
GitHub: https://github.com/stan-dev/cmdstanpy

Why CmdStanPy?

If you want to use Stan from Python, PyStan is the usual choice [^2]. I decided to use CmdStanPy instead because it (currently) runs faster than PyStan.

The details are as follows:

I initially prepared a PyStan environment as well, but even with a simple example called Eight schools (see the corresponding part of the PyStan documentation), it took more than a minute to compile the model.

While investigating whether my installation was at fault or whether this was normal, I found [this exchange](https://discourse.mc-stan.org/t/why-is-it-so-slow-for-stan-to-compile-model/6004) in the Stan community forum. According to it:

Stan compilation speedup was introduced in Stan 2.20, but Rstan is only up to Stan 2.19.2.

right now we have lightweight wrappers CmdStanPy (still in beta) and CmdStanR (almost to beta release) which use the latest CmdStan (now at 2.21, 2.22 coming soon) - or any version of CmdStan that you’d like.

This exchange is from two years ago, but when I checked the version of PyStan in my PyStan environment, it was 2.19.1.1, which appears to predate the compilation speedup [^3]. A Stack Overflow post also reported that CmdStanPy made compilation more than 3 times faster.
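
A quick way to confirm which versions are installed is to ask the package metadata. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is my own and is not part of PyStan or CmdStanPy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Either entry prints None when that package is not installed here.
    for name in ("pystan", "cmdstanpy"):
        print(name, installed_version(name))
```

If both names report a version in the same environment, that is the situation the official note warns about.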

So I decided to try CmdStanPy.

Environment

OS : macOS Mojave 10.14.6
docker : 19.03.12

Environment construction with Docker

`Dockerfile`


FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install cmdstanpy[all]
RUN python3 -c 'import cmdstanpy; cmdstanpy.install_cmdstan()'

WORKDIR /workdir

In the directory containing this Dockerfile [^4], build the image as follows (replace YOUR_TAG with a tag name of your choice):

docker image build -t YOUR_TAG .

Then start a container. The current directory is mounted at /workdir so that local source code can be used inside the container:


docker container run -it --rm --net=host  \
		--mount type=bind,src=`pwd`,dst=/workdir \
		YOUR_TAG \
		bash

Run

In the directory from which you started the Docker container, prepare the following files:

Stan file (8schools.stan[^5])
Data (8schools.data.json)
Python script (run-cmdstanpy.py)

`8schools.stan`


data {
    int<lower=0> J; // number of schools
    vector[J] y; // estimated treatment effects
    vector<lower=0>[J] sigma; // s.e. of effect estimates
}
parameters {
    real mu;
    real<lower=0> tau;
    vector[J] eta;
}
transformed parameters {
    vector[J] theta;
    theta = mu + tau * eta;
}
model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}
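
To make the structure of this model concrete, here is a rough Python sketch that simulates one data set from the same generative steps: draw `eta` from a standard normal, form `theta = mu + tau * eta`, then draw `y` around `theta` with the known standard errors. The Stan file puts no explicit prior on `mu` and `tau`, so the values used below are arbitrary numbers chosen for illustration, and the function name `simulate_schools` is hypothetical.

```python
import random


def simulate_schools(mu, tau, sigma, seed=0):
    """Draw one synthetic data set y following the model's generative steps."""
    rng = random.Random(seed)
    y = []
    for s in sigma:
        eta = rng.gauss(0.0, 1.0)      # eta ~ normal(0, 1)
        theta = mu + tau * eta         # transformed parameter theta
        y.append(rng.gauss(theta, s))  # y ~ normal(theta, sigma)
    return y


sigma = [15, 10, 16, 11, 9, 11, 10, 18]
# mu=5 and tau=3 are arbitrary illustrative values, not estimates.
y_sim = simulate_schools(mu=5.0, tau=3.0, sigma=sigma)
print(y_sim)
```

Writing `theta` as a shifted and scaled `eta` is what the transformed parameters block does; the sampler then works on `eta` rather than on `theta` directly.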

`8schools.data.json`


{
    "J": 8,
    "y": [28,  8, -3,  7, -1,  1, 18, 12],
    "sigma": [15, 10, 16, 11,  9, 11, 10, 18]
}
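
If you would rather generate the data file from Python than write it by hand, something like the following should work. It is only a sketch: the function name `write_schools_data` is my own, and the checks simply mirror the declarations in the data block (`y` and `sigma` both have length `J`, and `sigma` is non-negative).

```python
import json


def write_schools_data(path, y, sigma):
    """Write eight-schools style data as JSON after basic consistency checks."""
    if len(y) != len(sigma):
        raise ValueError("y and sigma must have the same length")
    if any(s < 0 for s in sigma):
        raise ValueError("sigma must be non-negative")
    data = {"J": len(y), "y": list(y), "sigma": list(sigma)}
    with open(path, "w") as f:
        json.dump(data, f, indent=4)
    return data


if __name__ == "__main__":
    write_schools_data(
        "8schools.data.json",
        y=[28, 8, -3, 7, -1, 1, 18, 12],
        sigma=[15, 10, 16, 11, 9, 11, 10, 18],
    )
```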

`run-cmdstanpy.py`


from cmdstanpy import CmdStanModel


print("Compile started")
model = CmdStanModel(stan_file="8schools.stan")
print("Compile finished")
print(model)
print("========================================")
data = "8schools.data.json"
print("Sampling started")
fit = model.sample(data=data)
print("Sampling finished")
print(fit)

If you run python3 run-cmdstanpy.py in /workdir inside the container started in the "Environment construction with Docker" section and get output like the following, the setup works for the time being.

Compile started
INFO:cmdstanpy:compiling stan program, exe file: /workdir/8schools
INFO:cmdstanpy:compiler options: stanc_options=None, cpp_options=None
INFO:cmdstanpy:compiled model file: /workdir/8schools
Compile finished
CmdStanModel: name=8schools
	 stan_file=/workdir/8schools.stan
	 exe_file=/workdir/8schools
	 compiler_optons=stanc_options=None, cpp_options=None
========================================
Sampling started
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 2
Sampling finished
CmdStanMCMC: model=8schools chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6.csv
 output_files:
	/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6-stdout.txt
deleting tmpfiles dir: /tmp/tmpd0ii4fo_
done

For details on how to use CmdStanPy, see the official documentation.
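
As a small next step beyond `print(fit)`, the sketch below prints the posterior summary and the sampler diagnostics. It assumes `fit` is the object returned by `model.sample()` and that it offers `summary()` and `diagnose()`, as the `CmdStanMCMC` class does to my knowledge; the wrapper name `describe_fit` is hypothetical, and details may differ between CmdStanPy versions.

```python
def describe_fit(fit):
    """Print and return the posterior summary and sampler diagnostics.

    `fit` is assumed to be the object returned by CmdStanModel.sample().
    """
    summary = fit.summary()        # table of posterior statistics per variable
    diagnostics = fit.diagnose()   # text report from CmdStan's diagnose utility
    print(summary)
    print(diagnostics)
    return summary, diagnostics


# Intended use inside run-cmdstanpy.py, after sampling:
# describe_fit(fit)
```

Note that the log above ends with the temporary output directory being deleted, so calls like these belong inside the same script run, before it exits.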

Appendix: Speed comparison with PyStan

I will also briefly record a speed comparison with PyStan. Both runs use the same 8schools.stan shown in the "Run" section.

CmdStanPy

The Docker environment is the one described in "Environment construction with Docker". I used the following Python file:

`test-cmdstanpy.py`


from cmdstanpy import CmdStanModel
import time

print("Compile started")
t1 = time.time()
model = CmdStanModel(stan_file="8schools.stan")
t2 = time.time()

data = "8schools.data.json"
print("Sampling started")
t3 = time.time()
fit = model.sample(data=data)
t4 = time.time()

print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")
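
The paired `t1`/`t2` bookkeeping can also be wrapped in a small helper. This is just one possible sketch: `timed` is a name I made up, it uses `time.perf_counter()` instead of `time.time()` because that clock is meant for measuring intervals, and the `time.sleep` call is a stand-in for the real compile or sampling step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Measure the wall-clock time of a with-block and store it in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("sleep", timings):
    time.sleep(0.05)  # stand-in for CmdStanModel(...) or model.sample(...)
print(timings)
```

In the real script, each `with timed(...)` block would wrap one of the two calls, and the dictionary would hold both durations at the end.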

PyStan

The Dockerfile used is as follows:

`Dockerfile`


FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
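# Pinned to the PyStan version noted earlier (2.19.1.1); an unpinned install now gives PyStan 3, which has no pystan.StanModel
RUN pip3 install pystan==2.19.1.1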

WORKDIR /workdir

Build and start it in the same way as in "Environment construction with Docker". I used the following Python file:

`test-pystan.py`


import pystan
import time

schools_dat = {'J': 8,
               'y': [28,  8, -3,  7, -1,  1, 18, 12],
               'sigma': [15, 10, 16, 11,  9, 11, 10, 18]}

print("Compile started")
t1 = time.time()
sm = pystan.StanModel(file='8schools.stan')
t2 = time.time()
print("Sampling started")
t3 = time.time()
fit = sm.sampling(data=schools_dat, iter=1000, chains=4)
t4 = time.time()

print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")

Result

CmdStanPy

Compile time : 12.916020393371582 seconds
Sampling time : 0.16101527214050293 seconds

PyStan

Compile time : 98.87509870529175 seconds
Sampling time : 0.45189523696899414 seconds

I measured several times, and model compilation was consistently about 7 times faster with CmdStanPy than with PyStan [^6] [^7].
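
One caveat when repeating the compile measurement: as far as I understand it, CmdStanPy skips compilation when an executable already exists and is newer than the Stan file. The sketch below shows that rule as I understand it, not CmdStanPy's actual code, together with a helper that updates the Stan file's modification time so that the next run compiles again. Both function names are hypothetical.

```python
import os


def needs_recompile(stan_file, exe_file):
    """Return True if the executable is missing or older than the Stan file."""
    if not os.path.exists(exe_file):
        return True
    return os.path.getmtime(exe_file) < os.path.getmtime(stan_file)


def mark_modified(stan_file):
    """Set the Stan file's modification time to now (like the `touch` command)."""
    os.utime(stan_file, None)


# Intended use before each timed compile:
# mark_modified("8schools.stan")
```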

[^1]: However, I use Docker without a deep understanding of it, so the way of building the Docker environment described here may not be the best.
[^2]: When I searched Google for "Python Stan", PyStan's documentation and GitHub page (https://github.com/stan-dev/pystan) came up first.
[^3]: I am assuming that the PyStan version number corresponds to the Stan version number, but I am not sure.
[^4]: See https://cmdstanpy.readthedocs.io/en/latest/getting_started.html#installation for details on the part that installs CmdStanPy.
[^5]: Downloaded from PyStan's getting started page: https://pystan.readthedocs.io/en/latest/getting_started.html
[^6]: CmdStanPy has a convenient feature: it compares the modification times of the executable and the Stan file and skips compilation when the executable is the newer of the two. Keep this in mind when measuring repeatedly.
[^7]: When I changed the base image for PyStan to ubuntu:16.04, PyStan compilation took about 50 seconds, but I have not investigated the cause.

[PYTHON] Note: Prepare the environment of CmdStanPy with docker
