I set up a CmdStanPy environment with Docker while reading "Bayesian Statistical Modeling with Stan and R". There seem to be few Japanese-language resources on setting up CmdStanPy, so I am writing this down in case it is useful to someone.
As the official documentation points out, take care if PyStan is already installed in your environment:
Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.
I used Docker because other dependency problems also looked likely to be troublesome [^1].
Lightweight wrapper for CmdStan in Python
To use Stan from Python, the usual choice is PyStan [^2]. I chose CmdStanPy instead because it is (currently) faster than PyStan.
The details are as follows:
I had initially set up a PyStan environment as well, but compiling the model for the simple Eight Schools example (see this part of the PyStan documentation) took more than a minute.
While checking whether my installation method was at fault or whether this was normal, I found this exchange on the Stan community forum (https://discourse.mc-stan.org/t/why-is-it-so-slow-for-stan-to-compile-model/6004). According to it:
A Stan compilation speedup was introduced in Stan 2.20, but RStan only goes up to Stan 2.19.2.
right now we have lightweight wrappers CmdStanPy (still in beta) and CmdStanR (almost to beta release) which use the latest CmdStan (now at 2.21, 2.22 coming soon) - or any version of CmdStan that you’d like.
That exchange is about two years old, but when I checked the PyStan version in my PyStan environment it was 2.19.1.1, which appears to predate the speedup [^3]. A Stack Overflow post also stated that CmdStanPy made compilation more than 3 times faster.
So I decided to try CmdStanPy.
Building an environment with Docker
Dockerfile
FROM ubuntu:20.04
RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install cmdstanpy[all]
RUN python3 -c 'import cmdstanpy; cmdstanpy.install_cmdstan()'
WORKDIR /workdir
In the directory containing this Dockerfile [^4], build the image as follows (replace YOUR_TAG with a tag name of your choice):
docker image build -t YOUR_TAG .
Then start a container. I mount the current directory at /workdir because I want to use the local source files inside the container:
docker container run -it --rm --net=host \
--mount type=bind,src=`pwd`,dst=/workdir \
YOUR_TAG \
bash
Execution
In the directory where you started the Docker container, prepare three files: 8schools.stan [^5], 8schools.data.json, and run-cmdstanpy.py.
8schools.stan
data {
int<lower=0> J; // number of schools
vector[J] y; // estimated treatment effects
vector<lower=0>[J] sigma; // s.e. of effect estimates
}
parameters {
real mu;
real<lower=0> tau;
vector[J] eta;
}
transformed parameters {
vector[J] theta;
theta = mu + tau * eta;
}
model {
eta ~ normal(0, 1);
y ~ normal(theta, sigma);
}
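The transformed parameters block above is what is commonly called the non-centered parameterization of the eight schools model: instead of sampling each school effect θ_j directly, the model samples a standard normal η_j and shifts and scales it. Written out as a sketch (Stan's normal takes a standard deviation, hence the squared σ_j and τ below; mu and tau have no sampling statement, so Stan treats them as flat on their support):

```latex
\begin{aligned}
\eta_j &\sim \mathcal{N}(0, 1), \qquad j = 1, \dots, J \\
\theta_j &= \mu + \tau\,\eta_j \;\Longrightarrow\; \theta_j \mid \mu, \tau \sim \mathcal{N}(\mu, \tau^2) \\
y_j &\sim \mathcal{N}(\theta_j, \sigma_j^2)
\end{aligned}
```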
8schools.data.json
{
"J": 8,
"y": [28, 8, -3, 7, -1, 1, 18, 12],
"sigma": [15, 10, 16, 11, 9, 11, 10, 18]
}
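CmdStan reads data from a JSON file whose keys match the variable names in the Stan data block. A minimal sketch, assuming you want to generate 8schools.data.json from Python and catch shape mistakes before CmdStan does; the helper names are hypothetical and only check the constraints declared in 8schools.stan:

```python
import json


def check_eight_schools_data(data):
    """Check the constraints that 8schools.stan declares in its data block.

    Hypothetical helper: CmdStan validates the data itself when it reads the
    file; this only surfaces mistakes earlier, on the Python side.
    """
    J = data["J"]
    if not isinstance(J, int) or J < 0:
        raise ValueError("J must be a non-negative integer (int<lower=0> J)")
    for name in ("y", "sigma"):
        if len(data[name]) != J:
            raise ValueError(f"{name} must have length J={J}, got {len(data[name])}")
    if any(s < 0 for s in data["sigma"]):
        raise ValueError("sigma must be non-negative (vector<lower=0>[J] sigma)")
    return data


def dumps_stan_data(data):
    """Validate the data and serialize it as JSON text for CmdStan."""
    return json.dumps(check_eight_schools_data(data), indent=2)


def write_stan_data(data, path):
    """Write the validated JSON to `path`, e.g. "8schools.data.json"."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(dumps_stan_data(data))
```

Calling write_stan_data with the dict shown above and the path "8schools.data.json" would produce an equivalent file.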
run-cmdstanpy.py
from cmdstanpy import CmdStanModel
print("Compile started")
model = CmdStanModel(stan_file="8schools.stan")
print("Compile finished")
print(model)
print("========================================")
data = "8schools.data.json"
print("Sampling started")
fit = model.sample(data=data)
print("Sampling finished")
print(fit)
If you run python3 run-cmdstanpy.py in /workdir inside the container started in "Building an environment with Docker" and get output like the following, everything is working for now.
Compile started
INFO:cmdstanpy:compiling stan program, exe file: /workdir/8schools
INFO:cmdstanpy:compiler options: stanc_options=None, cpp_options=None
INFO:cmdstanpy:compiled model file: /workdir/8schools
Compile finished
CmdStanModel: name=8schools
stan_file=/workdir/8schools.stan
exe_file=/workdir/8schools
compiler_optons=stanc_options=None, cpp_options=None
========================================
Sampling started
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 2
Sampling finished
CmdStanMCMC: model=8schools chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
csv_files:
/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc.csv
/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom.csv
/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph.csv
/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6.csv
output_files:
/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc-stdout.txt
/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom-stdout.txt
/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph-stdout.txt
/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6-stdout.txt
deleting tmpfiles dir: /tmp/tmpd0ii4fo_
done
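The csv_files listed above are CmdStan's Stan CSV output: comment lines beginning with # (configuration, adaptation and timing information), one header row with column names such as lp__, mu, tau, eta.1 and theta.1, and one row per draw. CmdStanPy has its own accessors for the draws, and the log shows the temporary directory being deleted when the script exits, so the files would have to be copied or written elsewhere to be read later. Purely to illustrate the file layout, here is a minimal standard-library sketch; the function names are hypothetical:

```python
import csv
from statistics import mean


def read_stan_csv(text):
    """Parse Stan CSV text into (header, rows).

    Comment lines starting with '#' may appear before the header and
    between draws (e.g. adaptation info), so all of them are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines)
    header = next(reader)
    rows = [[float(x) for x in row] for row in reader]
    return header, rows


def column_mean(header, rows, name):
    """Posterior mean of one column, e.g. 'mu', 'tau' or 'theta.1'."""
    idx = header.index(name)
    return mean(row[idx] for row in rows)
```

With the text of one chain's CSV file, column_mean(header, rows, "tau") would give that chain's posterior mean of tau.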
For details on how to use CmdStanPy, see the official documentation.
I will also briefly record a speed comparison with PyStan. Both use the same Stan file, the 8schools.stan written in the "Execution" section.
CmdStanPy
The Docker environment is the one described in "Building an environment with Docker". I used the following Python file:
test-cmdstanpy.py
from cmdstanpy import CmdStanModel
import time
print("Compile started")
t1 = time.time()
model = CmdStanModel(stan_file="8schools.stan")
t2 = time.time()
data = "8schools.data.json"
print("Sampling started")
t3 = time.time()
fit = model.sample(data=data)
t4 = time.time()
print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")
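Two things matter when rerunning test-cmdstanpy.py. time.perf_counter() is the clock the Python standard library provides for measuring short durations (time.time() follows the system clock, which can be adjusted), and, as the footnote on repeated measurements says, CmdStanPy skips compilation when the executable is newer than the .stan file, so a second run reports almost no compile time. A minimal sketch of both, using only the standard library; needs_compile merely mimics the rule described in that footnote and is not CmdStanPy's own code:

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Time the enclosed block with time.perf_counter and
    store the elapsed seconds in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def needs_compile(stan_file, exe_file):
    """Return True if the executable is missing or older than the .stan file.

    Hypothetical helper that mimics the rule described in the footnote;
    it is not CmdStanPy's own implementation.
    """
    if not os.path.exists(exe_file):
        return True
    return os.path.getmtime(stan_file) > os.path.getmtime(exe_file)


def touch(path):
    """Set the file's modification time to now (like the `touch` command)."""
    os.utime(path, None)
```

In test-cmdstanpy.py, one could call touch("8schools.stan") before the CmdStanModel(...) line so that, under the rule above, every run measures a fresh compile, and wrap that line in `with stopwatch("compile", timings):`.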
PyStan
The Dockerfile used is as follows:
Dockerfile
FROM ubuntu:20.04
RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install pystan
WORKDIR /workdir
Build and start it in the same way as in "Building an environment with Docker". I used the following Python file:
test-pystan.py
import pystan
import time
schools_dat = {'J': 8,
'y': [28, 8, -3, 7, -1, 1, 18, 12],
'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}
print("Compile started")
t1 = time.time()
sm = pystan.StanModel(file='8schools.stan')
t2 = time.time()
print("Sampling started")
t3 = time.time()
fit = sm.sampling(data=schools_dat, iter=1000, chains=4)
t4 = time.time()
print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")
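test-pystan.py repeats the data inline, while the CmdStanPy script reads 8schools.data.json. To be sure both benchmarks see identical inputs, one option is to load the same JSON file into the dict that PyStan's sampling(data=...) accepts. A small sketch; load_stan_data is a hypothetical helper:

```python
import json


def load_stan_data(path):
    """Read a CmdStan-style JSON data file into a plain dict.

    The result (e.g. {"J": 8, "y": [...], "sigma": [...]}) can be passed
    where test-pystan.py currently passes schools_dat.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data
```

In test-pystan.py this would mean replacing the schools_dat literal with load_stan_data("8schools.data.json").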
CmdStanPy
Compile time : 12.916020393371582 seconds
Sampling time : 0.16101527214050293 seconds
PyStan
Compile time : 98.87509870529175 seconds
Sampling time : 0.45189523696899414 seconds
I measured several times, and for model compilation CmdStanPy was consistently about 7 times faster than PyStan [^6][^7].
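The ratios for the single run shown above can be computed directly. One caveat, assuming default settings in both libraries: in PyStan 2, iter=1000 means 500 warmup plus 500 sampling iterations per chain (warmup defaults to iter // 2), whereas CmdStanPy's sample() defaults to 1000 warmup plus 1000 sampling iterations, so the sampling times are not a like-for-like comparison.

```python
# Timings reported above, in seconds (one run each).
cmdstanpy = {"compile": 12.916020393371582, "sampling": 0.16101527214050293}
pystan = {"compile": 98.87509870529175, "sampling": 0.45189523696899414}


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast


compile_ratio = speedup(pystan["compile"], cmdstanpy["compile"])
sampling_ratio = speedup(pystan["sampling"], cmdstanpy["sampling"])
print(f"compile: {compile_ratio:.1f}x, sampling: {sampling_ratio:.1f}x")
# prints: compile: 7.7x, sampling: 2.8x
```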
[^1]: That said, I use Docker mostly by feel, so the way I built the Docker environment here may not be the best.
[^2]: A Google search for "Python Stan" first returned PyStan's documentation and its GitHub page (https://github.com/stan-dev/pystan).
[^3]: I am assuming that the PyStan version number tracks the Stan version number... is that right?
[^4]: See https://cmdstanpy.readthedocs.io/en/latest/getting_started.html#installation for details on the part that installs CmdStanPy.
[^5]: Downloaded from PyStan's getting started page: https://pystan.readthedocs.io/en/latest/getting_started.html
[^6]: CmdStanPy has a convenient behavior: it compares the modification times of the exe file and the stan file, and if the exe file is newer than the stan file, it does not recompile. Keep this in mind when measuring repeatedly.
[^7]: When I changed the base image for PyStan to ubuntu:16.04, PyStan took about 50 seconds, but I have not investigated why.