[PYTHON] [Pyro] Statistical modeling with the probabilistic programming language Pyro ① ~ What is Pyro ~

Introduction

In this series, we introduce how to perform statistical modeling with the probabilistic programming language Pyro, working through examples. The examples are taken from the book "Introduction to Data Analysis by Bayesian Statistical Modeling Beginning with Practical Data Science Series R and Stan" (KS Information Science Specialized Book). As the title suggests, the book performs Bayesian modeling with R and Stan; in this series, we solve the same examples with Pyro. My aim is to deepen my understanding of statistical modeling through coding and to become able to apply flexible modeling with Pyro to real data. I leave the detailed explanation of statistical modeling itself to books and other articles, and focus mainly on the implementation here. The intended readers are:

- People who understand the basic usage of Python
- People who understand the basics of Bayesian statistical modeling
- People who want to perform flexible and scalable statistical modeling
- PyTorch users

As the first part of the series, this article gives an overview of Pyro and introduces the basic functions that the implementations in later installments rely on.

What is Pyro

Pyro is a probabilistic programming language written in Python and developed by Uber AI Labs. It uses PyTorch as its back end, so GPUs can be used. The official site lists the following advantages (paraphrased):

- Universal: it can represent any computable probability distribution.
- Scalable: it supports large datasets (using parallel computing).
- Minimal: as I read it, the code is kept as simple as possible, which makes its behavior easy to follow.
- Flexible: processing can be automated or customized.

In particular, because Pyro is a GPU-enabled framework, its scalability stands out: even large datasets can be processed in a practical amount of time. Edward2 is another probabilistic programming language that can use GPUs, but PyTorch users will likely find Pyro easier to use.

Install Pyro

As described in the official reference, Pyro can be installed with pip using the following command.

pip install pyro-ppl

Pyro basic methods

Pyro provides a set of basic functions for implementing statistical models. This article introduces the two most basic ones, pyro.sample and pyro.plate, which make it easy to describe sampling from a probability distribution.

1. Sampling from a probability distribution: pyro.sample

Sampling from a probability distribution is indispensable in any framework for statistical modeling. Pyro provides many probability distributions (normal, binomial, Poisson, ...). Writing pyro.sample(<name>, <probability distribution>) draws a sample from the specified distribution, where <name> is a string that identifies the sample. The code below draws 100,000 i.i.d. samples from the standard normal distribution and plots a histogram to check their distribution.

import matplotlib.pyplot as plt
import pyro
import pyro.distributions as dist  # probability distributions available in Pyro

# Sampling with a for loop
samples = []
for _ in range(100000):
    # Draw one sample from the standard normal distribution
    a_sample = pyro.sample("a_sample", dist.Normal(0, 1))
    samples.append(a_sample.item())  # store as a Python float for plotting

# Visualization
plt.hist(samples, bins=50)
plt.title("Samples from the standard normal distribution (n=100,000)", fontsize=16)
plt.show()

The histogram shows that the samples follow the standard normal distribution. This works, but Pyro offers a simpler way to obtain the same kind of sample.

2. Getting samples as a vector: pyro.plate

Drawing many independent samples from a particular probability distribution is very common in statistical modeling. Writing a for loop as in section 1 is inefficient, both to write and to run. This is where pyro.plate comes in handy. Calling pyro.sample inside a context declared with pyro.plate(<context name>, <sample size>) returns the samples as a vector. The implementation is shown below.

# Get the samples as a vector using pyro.plate
with pyro.plate("plate", size=100000):
    samples = pyro.sample("samples", dist.Normal(0, 1))

plt.hist(samples.numpy(), bins=50)
plt.title("Samples from the standard normal distribution (n=100,000)", fontsize=16)
plt.show()

The result should be similar to the one above.

With pyro.sample and pyro.plate, we can now sample from probability distributions. Other functions are also indispensable for statistical modeling; I will explain them in later installments with actual examples.

Summary

In this article, we mainly introduced the following two points.

- Overview and benefits of Pyro
- Sampling from probability distributions with pyro.sample and pyro.plate

As mentioned at the beginning, from the next installment onward we will reproduce with Pyro the examples from the book "Introduction to Data Analysis by Bayesian Statistical Modeling Beginning with Practical Data Science Series R and Stan" (KS Information Science Specialized Book). Next time, we will deal with Part 3, Chapter 2, "Simple Regression Model".
