In this series, we will introduce how to perform statistical modeling with the probabilistic programming language Pyro by working through examples. The examples are taken from the book "Introduction to Data Analysis by Bayesian Statistical Modeling Beginning with Practical Data Science Series R and Stan" (KS Information Science Specialized Book). As its title suggests, the book performs Bayesian modeling with R and Stan; in this series, we will solve the same examples with Pyro. My aim is to deepen my understanding of statistical modeling through coding and to become able to apply flexible modeling with Pyro to real data. I will leave detailed explanations of statistical modeling itself to books and other articles, and focus mainly on the implementation here. We assume the following readers.
--People who understand the basic usage of Python
--People who understand the basics of Bayesian statistical modeling
--People who want to perform flexible and scalable statistical modeling
--PyTorch users
As the first part of the series, this article gives an overview of Pyro and introduces the basic methods needed to implement the models covered from the next article onward.
Pyro is a probabilistic programming language written in Python and developed by Uber AI Labs. It uses PyTorch as its back end, so GPUs can of course be used. The advantages of using Pyro, translated from the official site, are as follows.
--Versatility: Can represent any computable probability distribution.
--Scalability: Supports large datasets (using parallel computing).
--Minimalism: Keeps the code as simple as possible, which makes its behavior easy to follow.
--Flexibility: Processing can be automated or customized.
In particular, as with other GPU-enabled frameworks, the scalability advantage is noteworthy: even large datasets can be processed in a practical amount of time. Other probabilistic programming languages that can use the GPU include Edward2, but PyTorch users will likely find Pyro easier to use.
As you can see in the official reference, you can install it with the following command using pip.
pip install pyro-ppl
Pyro provides the basic methods for implementing statistical modeling. This article introduces the two most basic of them, pyro.sample and pyro.plate. These methods make it easy to describe sampling from a probability distribution.
pyro.sample
Sampling from a probability distribution is indispensable to any framework for statistical modeling. Pyro provides many probability distributions (normal, binomial, Poisson, ...).
pyro.sample(<name>, <probability distribution>)
A call of this form draws a sample from the specified probability distribution. The name is a string that identifies the sample site.
The code below draws 100,000 i.i.d. samples from the standard normal distribution and plots a histogram of them as a check.
import matplotlib.pyplot as plt
import pyro
import pyro.distributions as dist  # dist provides the probability distributions available in Pyro

# Sampling with a for loop
samples = []
for _ in range(100000):
    # Draw one sample from the standard normal distribution
    a_sample = pyro.sample("a_sample", dist.Normal(0, 1))
    samples.append(a_sample.item())  # store as a Python float for plotting

# Visualization
plt.hist(samples, bins=50)
plt.title("Samples from the standard normal distribution (n=100,000)", fontsize=16)
plt.show()
You can see that the samples follow the standard normal distribution. This works, but Pyro offers a simpler way to write the same sampling.
pyro.plate
Drawing a large number of independent samples from a particular probability distribution is very common in statistical modeling. Writing it as a for loop, as in the pyro.sample example above, is inefficient both to write and to run.
That's where pyro.plate comes in handy.
pyro.plate(<context name>, <sample size>)
Calling pyro.sample inside a context declared this way returns the samples as a vector. Below is the implementation.
# Draw all the samples at once as a vector using pyro.plate
with pyro.plate("plate", size=100000):
    samples = pyro.sample("samples", dist.Normal(0, 1))

plt.hist(samples.numpy(), bins=50)
plt.title("Samples from the standard normal distribution (n=100,000)", fontsize=16)
plt.show()
The result should be similar to the one above.
With pyro.sample and pyro.plate, we can now sample from probability distributions. Other functions are also indispensable for statistical modeling, and I will explain them with concrete examples from the next article onward.
In this article, we mainly introduced the following two points.
--Overview and benefits of Pyro
--Sampling from probability distributions with pyro.sample and pyro.plate
As mentioned at the beginning, from the next article onward we will reproduce in Pyro the examples introduced in the book "Introduction to Data Analysis by Bayesian Statistical Modeling Beginning with Practical Data Science Series R and Stan" (KS Information Science Specialized Book).
Next time, we will deal with Part 3, Chapter 2, "Simple Regression Model".