Introduction to Bayesian Modeling Using pymc3 Bayesian-Modeling-in-Python Japanese Translation (Chapter 0-2)

There may be mistakes, so if you find one, please point it out.

In this article, I would like to introduce the Bayesian way of thinking, which is indispensable in machine learning. The explanation is based on the following tutorial.

reference: https://github.com/markdregan/Bayesian-Modelling-in-Python

What you will learn in this article

1: The Bayesian way of thinking
2: Hands-on experience with Bayesian theory using IPython notebook
3: How to evaluate a model fitted using Bayesian theory

Why you need Bayes in the first place

The purpose of machine learning is to learn patterns from data and handle unseen data, but perfect data, that is, data that is "complete, consistent, correct and descriptive", is extremely rare. The Bayesian approach is close to the idea of supplying knowledge in advance so that the model is not easily swayed by strange data.

I think human beings do something similar: we make a prediction before acting, and then adjust our behavior based on the feedback we receive.

However, if the prediction made in advance is wrong, bad results await, so care is needed there.

For example, suppose I want to choose a meat restaurant to take a woman to (the following is based purely on my own biased assumptions).

150912426027-thumb-815xauto-19288.jpg

  • Women in their early twenties: Take them to fashionable restaurants they have not been to yet, or places that offer something they wouldn't normally experience.
  • Women in their late 20s: They have already been to most fashionable places, so simply looking for a fashionable restaurant is not enough. A hideaway or a personal favorite would be better.

In this way, even for the same task, the prior distribution differs depending on age, so if you get the prior distribution wrong, the result will be a disaster.

Tutorial

The tutorial I used teaches Bayesian ideas through actual data rather than mathematical formulas. It is written in English, so I translated it into Japanese as part of my own study and am leaving these notes as a record of working through it.

Chapter 3 and later cover advanced topics; if you work through Chapter 2, you will be able to go from building a model to evaluating it.

If you are interested in the original, please see below.

https://github.com/markdregan/Bayesian-Modelling-in-Python

Data acquisition: Section 0. Introduction.ipynb

This tutorial uses data from your own Google Hangouts. Acquiring the data takes time, so I recommend starting the download and doing other work in the meantime.

If you don't have a Python environment set up, you can install the necessary libraries with the requirement.txt below.

numpy==1.9.2
ipython==4.0.0
notebook==4.0.4
jinja2==2.8
pyzmq==14.7.0
tornado==4.1
matplotlib
simplejson
pandas
seaborn
datetime
scipy
patsy
statsmodels
git+https://github.com/pymc-devs/pymc3

I tried it in an OSX environment, but I got the following error.

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends.

The workaround is described here.

http://stackoverflow.com/questions/21784641/installation-issue-with-matplotlib-python

The JSON file you obtain is very deeply nested, so here is a brief description of the data fields.

Field | Description | Example
conversation_id | Conversation id representing the chat thread | Ugw5Xrm3ZO5mzAfKB7V4AaABAQ
participants | List of participants in the chat thread | [Mark, Peter, John]
event_id | Id representing an event such as chat message or video hangout | 7-H0Z7-FkyB7-H0au2avdw
timestamp | Timestamp | 2014-08-15 01:54:12
message | Content of the message sent | Went to the local wedding photographer today
sender | Sender of the message | Mark Regan

The tutorial parses the JSON data and converts it into a pandas DataFrame with one row per message.

Please note that the line below keeps only messages sent by the tutorial's author (Mark Regan) and excludes his conversation with Alison Darcy, so you will not get any of your own data unless you comment it out.

messages = messages[(messages['sender'] == 'Mark Regan') & (messages['participants_str'] != 'Alison Darcy, Mark Regan')]

The questions to answer

1: Does the response time depend on who you are talking to?
2: What are the factors that affect the response time?
3: What is the worst day?

In my case, I don't use Google Hangouts very much, so I got the following result.

Screen Shot 2015-11-27 at 10.59.17.png

My own data cannot be used to answer these questions, so I decided to use the tutorial's data instead.

Estimating model parameters: Section 1: Estimating model parameters

This is the main part of the Bayesian tutorial. The results are presented as diagrams, which makes them interesting and easy to understand.

First, the Bayesian way of thinking is described with an example:

A boy counts the number of cars passing in front of his house each day and writes the counts in his notebook.

In the Bayesian view, the observed data occur randomly, but they are assumed to be generated from some probability distribution.

For count data like this example, consider using the Poisson distribution.

The example shows cases with a mean of 5, 20, and 40.

Screen Shot 2015-11-27 at 11.27.26.png

Green is the probability distribution with a mean of 5, orange is the one with a mean of 20, and pink is the one with a mean of 40.
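To make the three curves concrete, here is a minimal sketch that evaluates the Poisson probability mass function, P(K = k) = μ^k e^(−μ) / k!, for the same three means. The function names `poisson_pmf` and `summarize` are my own and do not come from the tutorial; only the Python standard library is used.

```python
import math

def poisson_pmf(k, mu):
    """P(K = k) for a Poisson distribution with mean mu."""
    # Work in log space so that k! does not overflow for large k.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def summarize(mu, k_max=200):
    """Return (total probability, mean, variance) summed over k = 0..k_max."""
    probs = [poisson_pmf(k, mu) for k in range(k_max + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return total, mean, var

for mu in (5, 20, 40):
    total, mean, var = summarize(mu)
    print(f"mu={mu:2d}  total={total:.4f}  mean={mean:.2f}  variance={var:.2f}")
```

Note that for a Poisson distribution the variance always equals the mean, which becomes important when the model is checked later.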

By fitting the response times from the data above to a Poisson distribution and estimating its parameter with Bayes, we try to answer the questions raised earlier.

Screen Shot 2015-11-27 at 11.30.38.png

The mean of the Poisson distribution is estimated by maximum likelihood estimation, using the log-likelihood. The likelihood for each candidate value of the mean can be checked below.

Screen Shot 2015-11-27 at 12.52.24.png

You can see that the likelihood is highest when the mean is close to 20. The resulting Poisson distribution of response times is shown below; a reply is most often sent after 18 seconds.

Screen Shot 2015-11-27 at 12.55.32.png
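The maximum likelihood step can be sketched as a simple grid search over candidate means. The response times below are hypothetical values I made up for illustration, not the tutorial's data, and the grid from 10 to 60 is an assumption. For a Poisson model the maximum likelihood estimate coincides with the sample mean, which the sketch lets you check.

```python
import math

def poisson_log_likelihood(data, mu):
    """Sum of log P(K = k) over all observations for a Poisson mean mu."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in data)

# Hypothetical response times in seconds (not the tutorial's data).
response_times = [18, 17, 21, 16, 19, 20, 15, 18, 22, 17, 19, 18]

# Candidate means from 10.0 to 60.0 in steps of 0.1.
candidates = [m / 10 for m in range(100, 601)]

# Pick the candidate with the highest log-likelihood.
best_mu = max(candidates, key=lambda mu: poisson_log_likelihood(response_times, mu))
sample_mean = sum(response_times) / len(response_times)

print("grid-search estimate:", best_mu)
print("sample mean:         ", round(sample_mean, 3))
```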

In this example, the prior knowledge used to apply Bayes is that the value to be estimated falls within the range of 10 to 60. The task is to define a Poisson model under that prior and estimate its mean.

Screen Shot 2015-11-27 at 12.57.43.png

MCMC

MCMC repeatedly proposes a new value for the mean, checks how well that value explains the data, and then accepts or rejects it. Values with a higher likelihood are accepted more often, so the samples concentrate around the values that explain the data well. A good point of this method is that it only needs a prior distribution to start from: the initial value of the parameter is chosen at random from the prior and then refined step by step, and sampling can be stopped once the values have settled.

However, it has disadvantages: it is difficult to converge when there are many parameters to estimate, and it does not work well when the prior distribution is not appropriate.

Screen Shot 2015-11-27 at 13.13.56.png
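The accept/reject idea can be sketched with a hand-written random-walk Metropolis sampler. This is not how PyMC3 is used in the tutorial; it is a simplified stand-in written with the standard library only. The data, the step size, the number of samples, and the uniform prior on the range 10 to 60 are all assumptions for illustration.

```python
import math
import random

def log_posterior(mu, data, lower=10.0, upper=60.0):
    """Log of (uniform prior on (lower, upper)) x (Poisson likelihood), up to a constant."""
    if not (lower < mu < upper):
        return float("-inf")  # outside the prior range: probability zero
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in data)

def metropolis(data, n_samples=5000, step=1.0, seed=0):
    """Random-walk Metropolis sampler for the Poisson mean."""
    rng = random.Random(seed)
    mu = rng.uniform(10.0, 60.0)          # start from a random draw from the prior
    current = log_posterior(mu, data)
    trace = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        diff = candidate - current
        # Always accept a better value; accept a worse one with probability exp(diff).
        if diff >= 0 or rng.random() < math.exp(diff):
            mu, current = proposal, candidate
        trace.append(mu)
    return trace

# Hypothetical response times in seconds (not the tutorial's data).
response_times = [18, 17, 21, 16, 19, 20, 15, 18, 22, 17, 19, 18]

trace = metropolis(response_times)
kept = trace[1000:]                        # discard the first samples as burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean of mu:", round(posterior_mean, 2))
```

The list `trace` holds every sampled value in order, so plotting it gives the kind of trace plot discussed in the rest of this section.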

If you run it in IPython notebook, you can watch the parameter being estimated as the samples are generated, so please try it and see how it works.

This is the result actually estimated by MCMC. The sampled values fall between 17 and 19 and their mean is just over 18, so the result is about as accurate as the simple estimate obtained earlier.

Screen Shot 2015-11-27 at 13.20.00.png

The trace of the values sampled by MCMC can be checked below. Since the estimated mean does not always converge as expected, you can look at this trace to see how the samples evolve.

Screen Shot 2015-11-27 at 13.39.47.png

You should also check the autocorrelation, that is, the correlation between each sampled value and the values sampled before it.
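As a rough illustration of that check, the sketch below computes the autocorrelation of a trace at several lags. The trace here is not real MCMC output: it is a synthetic series in which each value depends on the previous one (coefficient 0.8, an arbitrary choice), used only to imitate a sampler whose consecutive samples are correlated.

```python
import random

def autocorrelation(trace, lag):
    """Correlation between the trace and a copy of itself shifted by `lag` steps."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace)
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Synthetic stand-in for an MCMC trace: each value leans on the previous one.
rng = random.Random(1)
trace = [0.0]
for _ in range(4999):
    trace.append(0.8 * trace[-1] + rng.gauss(0.0, 1.0))

for lag in (0, 1, 5, 20):
    print(f"lag {lag:2d}: {autocorrelation(trace, lag):.3f}")
```

If the autocorrelation stays high over many lags, the samples carry less independent information than their count suggests, and a longer run or thinning may be needed.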

Model checking: Section 2: Model checking

There are two points to check.

1: Does the model represent the data?
2: Comparison of models

Let's compare the data with the predicted distribution. The mode of the predicted distribution does not match the most frequent response time, so this model turns out to be a poor fit in this case.

Screen Shot 2015-11-27 at 14.23.26.png
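One quick way to see why a Poisson model can fail is to compare the sample variance with the sample mean, because a Poisson distribution forces the two to be equal. The sketch below computes their ratio for hypothetical long-tailed response times that I made up; this check is my own addition and is not part of the tutorial.

```python
def dispersion_index(data):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return var / mean

# Hypothetical long-tailed response times in seconds (not the tutorial's data).
long_tailed = [2, 5, 40, 12, 60, 8, 25, 3, 33, 18]

index = dispersion_index(long_tailed)
print("variance / mean:", round(index, 2))
if index > 1:
    print("The data are more spread out than a Poisson model allows.")
```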

The negative binomial distribution is relatively similar to the Poisson distribution, but it can control the variance as well as the mean, so let's use it instead.

The distribution is similar as shown below.

Screen Shot 2015-11-27 at 14.18.08.png
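To show how the extra parameter changes the spread, here is a minimal sketch of the negative binomial probability mass function in the (μ, α) parameterization, where the mean is μ and the variance is μ + μ²/α. I am assuming this matches the parameterization used in the tutorial; the function names and the value μ = 18 are my own choices.

```python
import math

def neg_binomial_pmf(k, mu, alpha):
    """P(K = k) for a negative binomial with mean mu and shape alpha."""
    log_p = (
        math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(alpha / (mu + alpha))
        + k * math.log(mu / (mu + alpha))
    )
    return math.exp(log_p)

def moments(mu, alpha, k_max=5000):
    """Return (total probability, mean, variance) summed over k = 0..k_max."""
    probs = [neg_binomial_pmf(k, mu, alpha) for k in range(k_max + 1)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return total, mean, var

mu = 18.0
for alpha in (1.4, 2.4, 50.0):
    total, mean, var = moments(mu, alpha)
    print(f"alpha={alpha:5.1f}  mean={mean:.2f}  variance={var:.2f}")
```

A small α gives a variance much larger than the mean, while a large α brings the distribution back toward a Poisson with the same mean.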

For the negative binomial model, the parameters α and μ are estimated as follows.

Screen Shot 2015-11-27 at 14.25.39.png

The estimated values are shown below. α lies between 1.4 and 2.4, and the extra freedom in the variance makes the distribution more expressive.

Screen Shot 2015-11-27 at 14.29.36.png

Below is a plot of the response times together with the distribution generated from the α and μ estimated above. The predicted distribution is now close to the response time distribution and captures its shape better.

Screen Shot 2015-11-27 at 14.34.35.png

A model that combines the Poisson and negative binomial distributions is also presented.

Screen Shot 2015-11-27 at 14.46.22.png

The tutorial calculates the Bayes factor and decides which model to use according to the criteria below.

Screen Shot 2015-11-27 at 14.48.19.png
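The Bayes factor is the ratio of how well each model explains the data after averaging over its prior. The sketch below approximates it by averaging the likelihood over an evenly spaced grid of parameter values, which is a different and much cruder route than the MCMC-based calculation in the tutorial. The data are hypothetical, and the priors (uniform on 10 to 60 for μ, uniform on 0.5 to 5 for α) are my own assumptions.

```python
import math

def log_poisson(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def log_neg_binomial(k, mu, alpha):
    return (
        math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
        + alpha * math.log(alpha / (mu + alpha))
        + k * math.log(mu / (mu + alpha))
    )

def log_mean_exp(values):
    """log of the average of exp(values), computed without underflow."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def log_evidence_poisson(data, mus):
    # Average the likelihood over the grid of mu (uniform prior).
    return log_mean_exp([sum(log_poisson(k, mu) for k in data) for mu in mus])

def log_evidence_neg_binomial(data, mus, alphas):
    # Average the likelihood over the grid of (mu, alpha) (uniform priors).
    return log_mean_exp([
        sum(log_neg_binomial(k, mu, a) for k in data)
        for mu in mus for a in alphas
    ])

# Hypothetical long-tailed response times in seconds (not the tutorial's data).
data = [2, 5, 40, 12, 60, 8, 25, 3, 33, 18]
mus = [10 + 0.5 * i for i in range(101)]     # 10.0 .. 60.0
alphas = [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0

log_bf = log_evidence_neg_binomial(data, mus, alphas) - log_evidence_poisson(data, mus)
print("log Bayes factor (negative binomial vs Poisson):", round(log_bf, 1))
```

A positive log Bayes factor favors the negative binomial model and a negative one favors the Poisson model; how large it must be to count as strong evidence is what the criteria above specify.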

This article covered the basics up to Chapter 2. Chapter 3 is more advanced, so please give it a try.

https://github.com/markdregan/Bayesian-Modelling-in-Python/blob/master/Section%203.%20Hierarchical%20modelling.ipynb

reference

Pakutaso https://www.pakutaso.com/

Learn statistical modeling with Stan (2): What was MCMC in the first place? http://tjo.hatenablog.com/entry/2014/02/08/173324

Knowledge to pretend to know Bayesian inference http://www.anlyznews.com/2012/01/blog-post_31.html

Bayesian-Modelling-in-Python https://github.com/markdregan/Bayesian-Modelling-in-Python
