[PYTHON] Note: Setting up a CmdStanPy environment with Docker

I set up a CmdStanPy environment with Docker in order to work through the book "Bayesian statistical modeling with Stan and R". There seem to be few Japanese-language materials on setting up CmdStanPy, so I am writing this down in case it is useful to someone.

Introduction

As the official documentation notes, be careful if you are working in an environment where PyStan is already installed.

Note for PyStan users: PyStan and CmdStanPy should be installed in separate environments. If you already have PyStan installed, you should take care to install CmdStanPy in its own virtual environment.

Other dependency problems also looked likely to cause trouble, so I prepared the environment with Docker [^ 1].
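Before installing CmdStanPy into an existing environment, it can help to check whether PyStan is already there. The following is a minimal sketch using only the standard library. The helper names `installed` and `pystan_present` are made up for illustration, and the import names checked (`pystan` for PyStan 2.x, `stan` for PyStan 3) are an assumption about how PyStan was installed.

```python
import importlib.util

# Assumed import names: "pystan" (PyStan 2.x) and "stan" (PyStan 3).
PYSTAN_IMPORT_NAMES = ("pystan", "stan")


def installed(package_name):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def pystan_present():
    """Return True if any of the assumed PyStan import names is available."""
    return any(installed(name) for name in PYSTAN_IMPORT_NAMES)


if pystan_present():
    print("PyStan found: install CmdStanPy in a separate environment.")
else:
    print("No PyStan found in this environment.")
```

In a fresh container built from the Dockerfile in this note, the check should report that PyStan is absent, since only CmdStanPy is installed there.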

What is CmdStanPy?

Lightweight wrapper for CmdStan in Python

Why CmdStanPy?

If you want to use Stan from Python, PyStan is the usual choice [^ 2]. The reason I chose CmdStanPy instead is that, at least for now, it runs faster than PyStan.

The details are as follows:

I initially set up a PyStan environment as well, but even a simple example, Eight Schools (see the PyStan documentation), took more than a minute to compile.

When I looked into whether my installation was at fault or whether this was normal, I found this exchange on the Stan community forum (https://discourse.mc-stan.org/t/why-is-it-so-slow-for-stan-to-compile-model/6004). According to it:

Stan compilation speedup was introduced in Stan 2.20, but Rstan is only up to Stan 2.19.2.

right now we have lightweight wrappers CmdStanPy (still in beta) and CmdStanR (almost to beta release) which use the latest CmdStan (now at 2.21, 2.22 coming soon) - or any version of CmdStan that you’d like.

The exchange itself is two years old, but when I checked my PyStan environment, the PyStan version was 2.19.1.1, which appears to predate the compilation speedup [^ 3]. A Stack Overflow post also reported that CmdStanPy made compilation more than 3 times faster.

So I decided to try CmdStanPy.

Environment

Environment construction with Docker

Dockerfile


FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install cmdstanpy[all]
RUN python3 -c 'import cmdstanpy; cmdstanpy.install_cmdstan()'

WORKDIR /workdir

In the directory containing this Dockerfile [^ 4], build the image as follows (replace YOUR_TAG with a tag name of your choice):

docker image build -t YOUR_TAG .

Then start a container (I mount the current directory at /workdir because I want to use the source code on the host):


docker container run -it --rm --net=host  \
		--mount type=bind,src=`pwd`,dst=/workdir \
		YOUR_TAG \
		bash

Run

In the directory where you started the Docker container, prepare the following three files: 8schools.stan, 8schools.data.json, and run-cmdstanpy.py.

8schools.stan


data {
    int<lower=0> J; // number of schools
    vector[J] y; // estimated treatment effects
    vector<lower=0>[J] sigma; // s.e. of effect estimates
}
parameters {
    real mu;
    real<lower=0> tau;
    vector[J] eta;
}
transformed parameters {
    vector[J] theta;
    theta = mu + tau * eta;
}
model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}

8schools.data.json


{
    "J": 8,
    "y": [28,  8, -3,  7, -1,  1, 18, 12],
    "sigma": [15, 10, 16, 11,  9, 11, 10, 18]
}
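The data file has to agree with the data block of 8schools.stan: `J` is the number of schools, and `y` and `sigma` each need exactly `J` entries, with `sigma` non-negative. The sketch below checks those constraints before serializing the data. The function name `check_schools_data` is hypothetical and not part of CmdStanPy.

```python
import json


def check_schools_data(data):
    """Raise ValueError if data does not match the data block of 8schools.stan."""
    J = data["J"]
    if not isinstance(J, int) or J < 0:
        raise ValueError("J must be a non-negative integer")
    for name in ("y", "sigma"):
        if len(data[name]) != J:
            raise ValueError(f"{name} must have exactly J = {J} entries")
    if any(s < 0 for s in data["sigma"]):
        raise ValueError("sigma entries must be non-negative")
    return data


schools = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

check_schools_data(schools)
text = json.dumps(schools, indent=4)  # same content as 8schools.data.json
print(text)
```

Writing `text` to 8schools.data.json gives a file equivalent to the one above. Running the check first catches a length mismatch in Python rather than later, when Stan reads the data.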

run-cmdstanpy.py


from cmdstanpy import CmdStanModel


print("Compile started")
model = CmdStanModel(stan_file="8schools.stan")
print("Compile finished")
print(model)
print("========================================")
data = "8schools.data.json"
print("Sampling started")
fit = model.sample(data=data)
print("Sampling finished")
print(fit)

Run python3 run-cmdstanpy.py in /workdir inside the container started in the "Environment construction with Docker" section. If you get output like the following, the setup works.

Compile started
INFO:cmdstanpy:compiling stan program, exe file: /workdir/8schools
INFO:cmdstanpy:compiler options: stanc_options=None, cpp_options=None
INFO:cmdstanpy:compiled model file: /workdir/8schools
Compile finished
CmdStanModel: name=8schools
	 stan_file=/workdir/8schools.stan
	 exe_file=/workdir/8schools
	 compiler_optons=stanc_options=None, cpp_options=None
========================================
Sampling started
INFO:cmdstanpy:start chain 1
INFO:cmdstanpy:start chain 2
INFO:cmdstanpy:start chain 3
INFO:cmdstanpy:start chain 4
INFO:cmdstanpy:finish chain 3
INFO:cmdstanpy:finish chain 1
INFO:cmdstanpy:finish chain 4
INFO:cmdstanpy:finish chain 2
Sampling finished
CmdStanMCMC: model=8schools chains=4['method=sample', 'algorithm=hmc', 'adapt', 'engaged=1']
 csv_files:
	/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph.csv
	/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6.csv
 output_files:
	/tmp/tmpd0ii4fo_/8schools-202009110125-1-0qknvwdc-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-2-qfoj4mom-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-3-cbn8wxph-stdout.txt
	/tmp/tmpd0ii4fo_/8schools-202009110125-4-h_5980t6-stdout.txt
deleting tmpfiles dir: /tmp/tmpd0ii4fo_
done

For details on how to use CmdStanPy, see the official documentation.

Appendix: Speed comparison with PyStan

I will also briefly record a speed comparison with PyStan. Both runs use the same Stan file, the 8schools.stan shown in the "Run" section.

CmdStanPy

The Docker environment is the one described in "Environment construction with Docker". I used the following Python file:

test-cmdstanpy.py


from cmdstanpy import CmdStanModel
import time

print("Compile started")
t1 = time.time()
model = CmdStanModel(stan_file="8schools.stan")
t2 = time.time()

data = "8schools.data.json"
print("Sampling started")
t3 = time.time()
fit = model.sample(data=data)
t4 = time.time()

print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")
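The script above takes pairs of `time.time()` readings by hand. As a possible alternative, the sketch below wraps the same idea in a small context manager built on `time.perf_counter()`, which is the standard-library clock intended for measuring short intervals. The helper name `timed` is hypothetical, and the `time.sleep` call only stands in for the compile or sampling step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Stand-in workload; in the real script this would be the CmdStanModel(...)
# construction or the model.sample(...) call.
with timed("stand-in step", timings):
    time.sleep(0.01)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} seconds")
```

Collecting the durations in one dictionary makes it easy to print the compile and sampling times together at the end, as the original script does.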

PyStan

The Dockerfile used is as follows:

Dockerfile


FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install pystan

WORKDIR /workdir

Build and start the container in the same way as in "Environment construction with Docker". I used the following Python file:

test-pystan.py


import pystan
import time

schools_dat = {'J': 8,
               'y': [28,  8, -3,  7, -1,  1, 18, 12],
               'sigma': [15, 10, 16, 11,  9, 11, 10, 18]}

print("Compile started")
t1 = time.time()
sm = pystan.StanModel(file='8schools.stan')
t2 = time.time()
print("Sampling started")
t3 = time.time()
fit = sm.sampling(data=schools_dat, iter=1000, chains=4)
t4 = time.time()

print("Compile time :", t2 - t1, "seconds")
print("Sampling time :", t4 - t3, "seconds")

Result

CmdStanPy

Compile time : 12.916020393371582 seconds
Sampling time : 0.16101527214050293 seconds

PyStan

Compile time : 98.87509870529175 seconds
Sampling time : 0.45189523696899414 seconds
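For this single run, the ratios can be worked out directly from the four numbers above. The variable names below are made up for illustration, and the values are copied from the output shown.

```python
# Timings copied from the single run reported above (seconds).
cmdstanpy_compile = 12.916020393371582
cmdstanpy_sampling = 0.16101527214050293
pystan_compile = 98.87509870529175
pystan_sampling = 0.45189523696899414

# How many times longer PyStan took than CmdStanPy in this run.
compile_ratio = pystan_compile / cmdstanpy_compile
sampling_ratio = pystan_sampling / cmdstanpy_sampling

print(f"Compile : PyStan took {compile_ratio:.1f}x as long as CmdStanPy")
print(f"Sampling: PyStan took {sampling_ratio:.1f}x as long as CmdStanPy")
```

The sampling figures are not strictly like-for-like. The PyStan script passes `iter=1000`, which in PyStan 2 includes warmup, while the CmdStanPy script uses the sampler's default iteration counts.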

I measured several times, and for model compilation CmdStanPy was consistently about 7 times faster than PyStan [^ 6] [^ 7].
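The timestamp check mentioned in footnote 6 matters when repeating the measurement: if the executable is already up to date, the compile step is skipped and the measured compile time is close to zero. The sketch below illustrates that kind of check with the standard library. It is an illustration under that assumption, not CmdStanPy's actual implementation, and `needs_compile` is a hypothetical name.

```python
import os
import tempfile


def needs_compile(stan_file, exe_file):
    """Return True if the executable is missing or older than the Stan file."""
    if not os.path.exists(exe_file):
        return True
    return os.path.getmtime(stan_file) > os.path.getmtime(exe_file)


with tempfile.TemporaryDirectory() as tmp:
    stan_file = os.path.join(tmp, "8schools.stan")
    exe_file = os.path.join(tmp, "8schools")

    with open(stan_file, "w") as f:
        f.write("// model source\n")
    before_build = needs_compile(stan_file, exe_file)  # no executable yet

    with open(exe_file, "w") as f:
        f.write("stand-in for the compiled binary\n")
    # Set explicit timestamps so the demo does not depend on clock resolution.
    os.utime(stan_file, (1000, 1000))
    os.utime(exe_file, (2000, 2000))
    after_build = needs_compile(stan_file, exe_file)  # executable is newer

    os.utime(stan_file, (3000, 3000))
    after_edit = needs_compile(stan_file, exe_file)  # source is newer again

print(before_build, after_build, after_edit)  # prints: True False True
```

Under this logic, deleting the executable (here /workdir/8schools) before each timed run is one way to make sure the compile step is actually measured.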

[^ 1]: That said, I use Docker mostly by feel, so the way of building the Docker environment described here may not be the best.
[^ 2]: A Google search for "Python Stan" returned PyStan's documentation and GitHub page (https://github.com/stan-dev/pystan) first.
[^ 3]: I am assuming that the PyStan version number corresponds to the Stan version number, but I am not sure.
[^ 4]: See https://cmdstanpy.readthedocs.io/en/latest/getting_started.html#installation for details on the part that installs CmdStanPy.
[^ 5]: Downloaded from PyStan's getting started page: https://pystan.readthedocs.io/en/latest/getting_started.html
[^ 6]: CmdStanPy has a convenient feature: it compares the modification times of the exe file and the Stan file, and if the exe file is newer than the Stan file it does not recompile. Keep this in mind when measuring repeatedly.
[^ 7]: When I changed the base image for PyStan to ubuntu:16.04, PyStan's compile took about 50 seconds, but I have not investigated the cause.
