There may be mistakes, so if you find one, please point it out.

This time, I would like to introduce Bayesian thinking, which is hard to avoid in machine learning. The explanation is based on the tutorial below.

reference: https://github.com/markdregan/Bayesian-Modelling-in-Python

What you will learn in this article

1: The Bayesian way of thinking
2: Hands-on experience with Bayesian methods in an IPython notebook
3: How to evaluate a model fitted with Bayesian methods

Why you need Bayes in the first place

The purpose of machine learning is to learn patterns from data and handle unseen data. However, perfect data, meaning data that is "complete, consistent, correct and descriptive", is extremely rare. The Bayesian approach is close to the idea of supplying knowledge in advance (a prior) so that the model is not easily swayed by strange data.

I think people do the same thing: they make a prediction before acting, then adjust based on the feedback they receive.

However, if the prior expectation is wrong, the results will be poor, so care is needed.

For example, suppose I want to choose a meat restaurant to take a woman to (the following is based purely on my own bias and prejudice).

Women in their early twenties: take them to a fashionable restaurant they have not been to, or to a place that offers something they would not normally experience.

Women in their late twenties: they have already been to most of the fashionable places, so simply looking for a fashionable restaurant does not work well. A hideaway or a personal favorite is a better choice.

In this way, even for the same task of hosting a woman, the prior distribution differs by age group, and if you get the prior wrong the outcome will be terrible.

Tutorial

The tutorial teaches Bayesian ideas through real data rather than through formulas. It is written in English, so I translated it into Japanese as part of my own study and am leaving these notes as a record of working through it.

Chapter 3 onward is advanced material. If you get through Chapter 2, you will be able to go from building a model to evaluating it.

If you are interested in the original, see below.

https://github.com/markdregan/Bayesian-Modelling-in-Python

Data acquisition: Section 0. Introduction.ipynb

This section of the tutorial downloads your own Google Hangouts data. The download takes a while, so it is best to let it run while you do other work.

If you don't have a Python environment set up, you can install the required libraries with the requirement.txt below (for example, pip install -r requirement.txt).

numpy==1.9.2
ipython==4.0.0
notebook==4.0.4
jinja2==2.8
pyzmq==14.7.0
tornado==4.1
matplotlib
simplejson
pandas
seaborn
datetime
scipy
patsy
statsmodels
git+https://github.com/pymc-devs/pymc3

I tried this on OS X and got the following error.

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends.

The fix is described here:

http://stackoverflow.com/questions/21784641/installation-issue-with-matplotlib-python
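
One workaround that is commonly suggested for this error is to select a backend other than the Mac OS X one before pyplot is imported. The sketch below is my own illustration, not a copy of the linked answer. It assumes the non-interactive Agg backend is acceptable, which means figures can be rendered to files or buffers but no window is opened.

```python
import io

import matplotlib

# Select a backend that does not require a framework build of Python.
# "Agg" is non-interactive: it renders images but cannot open windows.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # import pyplot only after choosing the backend

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Render into an in-memory buffer to confirm that drawing works.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

backend_name = matplotlib.get_backend()
print(backend_name, buffer.getbuffer().nbytes)
```

Inside a notebook you would normally keep using the inline display instead. The point is only that the backend has to be chosen before pyplot is loaded.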

The JSON file you get is very deeply nested, so here is a brief description of the data fields.

Field	Description	Example
conversation_id	Conversation id representing the chat thread	Ugw5Xrm3ZO5mzAfKB7V4AaABAQ
participants	List of participants in the chat thread	[Mark, Peter, John]
event_id	Id representing an event such as chat message or video hangout	7-H0Z7-FkyB7-H0au2avdw
timestamp	Timestamp	2014-08-15 01:54:12
message	Content of the message sent	Went to the local wedding photographer today
sender	Sender of the message	Mark Regan

The tutorial parses the JSON data and converts it into a pandas DataFrame with one row per message.

Note that the line below keeps only messages sent by the tutorial author (Mark Regan) and drops the thread whose participants are Alison Darcy and Mark Regan. With your own data it will return nothing, so comment it out or substitute your own names.

messages = messages[(messages['sender'] == 'Mark Regan') & (messages['participants_str'] != 'Alison Darcy, Mark Regan')]
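
To make the effect of that filter concrete, here is a minimal sketch on made-up data. The records are hypothetical and already flattened (the real export is nested), and building participants_str by joining sorted names is my assumption about how the tutorial's column is formed.

```python
import pandas as pd

# Hypothetical, already-flattened records using the fields from the table above.
records = [
    {"conversation_id": "c1", "participants": ["Mark Regan", "Peter"],
     "sender": "Mark Regan", "timestamp": "2014-08-15 01:54:12", "message": "hello"},
    {"conversation_id": "c1", "participants": ["Mark Regan", "Peter"],
     "sender": "Peter", "timestamp": "2014-08-15 01:55:40", "message": "hi"},
    {"conversation_id": "c2", "participants": ["Mark Regan", "Alison Darcy"],
     "sender": "Mark Regan", "timestamp": "2014-08-16 09:00:00", "message": "morning"},
    {"conversation_id": "c2", "participants": ["Mark Regan", "Alison Darcy"],
     "sender": "Alison Darcy", "timestamp": "2014-08-16 09:01:10", "message": "morning!"},
]

messages = pd.DataFrame(records)
messages["timestamp"] = pd.to_datetime(messages["timestamp"])

# Assumed construction: participant names sorted and joined with ", ".
messages["participants_str"] = messages["participants"].apply(
    lambda names: ", ".join(sorted(names))
)

my_name = "Mark Regan"                         # replace with your own name
excluded_thread = "Alison Darcy, Mark Regan"   # replace or remove for your data

# Keep messages I sent, excluding one specific thread.
mask = (messages["sender"] == my_name) & (messages["participants_str"] != excluded_thread)
filtered = messages[mask]
print(filtered[["conversation_id", "sender", "message"]])
```

With your own export, neither name appears in the data, so the mask is False for every row. That is why the tutorial's line leaves you with an empty DataFrame.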

The questions we want to answer

1: Does the response time depend on who you are talking to?
2: What factors affect the response time?
3: What is the worst day?

In my case, I don't use Google Hangouts much, so I got the following result.

My own data is not enough to answer these questions, so I decided to use the tutorial's data instead.

Note: the step "Export data for usage throughout tutorial" writes out your own data, so do not run it if you want to use the tutorial data.

Predicting model parameters: Section 1: Estimating model parameters

This is the core of the Bayesian tutorial. The nice thing is that the results are shown as plots, which makes them easy to understand.

First, the Bayesian way of thinking. As an example:

A boy counts the cars that pass in front of his house each day and writes the number in his notebook. The notebook therefore holds one count per day.

In the Bayesian view, the observed data are random, but they are assumed to be generated from some probability distribution.

For count data like this example, the Poisson distribution is a natural choice.

The example shows Poisson distributions with means of 5, 20, and 40.

In the tutorial's plot, green is the distribution with mean 5, orange the one with mean 20, and pink the one with mean 40.
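
As a small sketch of those three distributions, the code below evaluates the Poisson probability mass function with SciPy for means 5, 20, and 40. The range of counts (0 to 80) is my own choice for illustration. It also shows that the variance of a Poisson distribution equals its mean, which matters later when the model is checked.

```python
import numpy as np
from scipy import stats

counts = np.arange(0, 81)   # possible numbers of cars per day (assumed range)
means = [5, 20, 40]

summary = {}
for mu in means:
    pmf = stats.poisson.pmf(counts, mu)            # P(count = k) for each k
    mean = (counts * pmf).sum()                    # mean computed from the pmf
    variance = ((counts - mean) ** 2 * pmf).sum()  # variance computed from the pmf
    summary[mu] = (pmf.sum(), mean, variance)
    print(f"mu={mu:2d}  total prob={pmf.sum():.6f}  mean={mean:.3f}  variance={variance:.3f}")
```

Because one parameter fixes both the center and the spread, a Poisson model has no way to represent data that are more spread out than their mean suggests.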

The tutorial applies this framework to the response times from the previous section: it models them with a Poisson distribution and estimates the parameter with Bayesian inference in order to answer the questions above.

First, the mean of the Poisson distribution is estimated by maximum likelihood, using the log-likelihood. The tutorial plots the log-likelihood against the candidate values of the mean.

The plot confirms that the likelihood is highest at a mean close to 20. Under the fitted Poisson distribution of response times, the most likely response time is 18 seconds.

The prior knowledge used to apply Bayes in this example is that the data fall within the range of 10 to 60. The task is to set up a Poisson model under that assumption and estimate its mean.
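
The maximum likelihood step can be sketched as follows. The data here are simulated stand-ins (Poisson with mean 18), not the tutorial's response times, and the search bounds are my own assumption. For a Poisson model the maximum likelihood estimate of the mean is the sample mean, so the optimizer should land there.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
# Stand-in data: simulated counts, NOT the tutorial's response times.
response_times = rng.poisson(lam=18, size=200)

def negative_log_likelihood(mu):
    """Negative Poisson log-likelihood of the data for a candidate mean mu."""
    return -stats.poisson.logpmf(response_times, mu).sum()

# Search for the mean that maximizes the log-likelihood (bounds are assumed).
result = optimize.minimize_scalar(
    negative_log_likelihood, bounds=(1, 60), method="bounded"
)
mu_hat = result.x

print("maximum likelihood estimate:", round(mu_hat, 3))
print("sample mean:                ", round(response_times.mean(), 3))
```

Working with the logarithm turns a product of many small probabilities into a sum, which avoids numerical underflow.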

MCMC

MCMC repeatedly proposes a new value for the parameter (here, the mean) and accepts or rejects it according to how well it explains the data together with the prior. The advantage of this method is that it only needs to evaluate the prior and the likelihood: it starts from a random value allowed by the prior and gradually concentrates on the region where the probability is high, so the result is a set of samples from the posterior distribution rather than a single best value.

The drawbacks are that convergence becomes harder as the number of parameters grows, and that the method works poorly when the prior distribution is not appropriate.

If you run the IPython notebook, you can watch the parameter being estimated as the samples are generated, so please try it and see how it behaves.
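
To show the mechanics, here is a hand-written random-walk Metropolis sampler in NumPy. This is a simplified stand-in: the tutorial itself uses PyMC3, and I have not reproduced its sampler. The data are simulated, and the Uniform(10, 60) prior on the mean, the step size, and the burn-in length are my assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.poisson(lam=18, size=200)   # stand-in data, not the tutorial's

def log_posterior(mu):
    """Unnormalized log posterior: Uniform(10, 60) prior plus Poisson likelihood."""
    if not 10 < mu < 60:
        return -np.inf                 # outside the prior's support
    return stats.poisson.logpmf(data, mu).sum()

n_steps, step_size = 5000, 0.5         # assumed tuning values
trace = np.empty(n_steps)

current = rng.uniform(10, 60)          # random starting point drawn from the prior
current_lp = log_posterior(current)

for i in range(n_steps):
    proposal = current + rng.normal(0, step_size)   # symmetric random-walk proposal
    proposal_lp = log_posterior(proposal)
    # Accept with probability min(1, posterior(proposal) / posterior(current)).
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        current, current_lp = proposal, proposal_lp
    trace[i] = current

posterior_samples = trace[1000:]       # discard the first samples as burn-in
print("posterior mean:", round(posterior_samples.mean(), 3))
print("sample mean:   ", round(data.mean(), 3))
```

The sampler never searches for a maximum. It wanders through parameter space and spends time in each region in proportion to its posterior probability, which is why the kept samples describe the uncertainty in the mean and not just its most likely value.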

This is the result actually estimated by MCMC. The sampled values of the mean fall between 17 and 19 and are centered just over 18, which is consistent with the estimate obtained earlier.

The trace of the MCMC samples can be inspected in the tutorial's plot. Because the estimated mean does not always converge as expected, checking the trace shows how the sampler moved over time.

It is also necessary to check the autocorrelation of the trace, that is, how strongly each sample is correlated with the samples drawn before it.
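
The autocorrelation check can be sketched with NumPy alone. Both traces below are synthetic and hypothetical: one is made of independent draws, the other is deliberately "sticky" so that each value depends on the previous one.

```python
import numpy as np

def autocorrelation(trace, lag):
    """Correlation between the trace and a copy of itself shifted by `lag` steps."""
    trace = np.asarray(trace, dtype=float)
    if lag == 0:
        return 1.0
    return np.corrcoef(trace[:-lag], trace[lag:])[0, 1]

rng = np.random.default_rng(2)
n = 5000

# Hypothetical well-mixed trace: independent draws around 18.
independent = rng.normal(18, 0.3, size=n)

# Hypothetical poorly-mixed trace: each value stays close to the previous one.
sticky = np.empty(n)
sticky[0] = 18.0
noise = rng.normal(0, 0.1, size=n)
for t in range(1, n):
    sticky[t] = 18 + 0.9 * (sticky[t - 1] - 18) + noise[t]

for lag in (1, 5, 20):
    print(f"lag {lag:2d}: independent={autocorrelation(independent, lag):+.3f}  "
          f"sticky={autocorrelation(sticky, lag):+.3f}")
```

High autocorrelation means consecutive samples carry little new information, so more samples (or thinning) are needed before the trace can be trusted.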

Model checking: Section 2: Model checking

There are two points to check.

1: Does the model represent the data?
2: How do the candidate models compare?

First, compare the observed data with the distribution predicted by the model. The mode of the predicted distribution does not match the most frequent observed response time, so this model turns out to be a poor fit in this case.
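
A rough numerical version of this check is sketched below. The tutorial's check is visual, whereas this one compares summary statistics. The "observed" data are simulated with more spread than a Poisson allows, and the posterior samples of the mean are hypothetical values drawn around the sample mean. All of these are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in observations: mean about 18 but much more spread than a Poisson.
observed = rng.negative_binomial(n=2, p=2 / (2 + 18), size=300)

# Hypothetical posterior samples of the Poisson mean (assumed, not fitted here).
mu_samples = rng.normal(observed.mean(), 0.25, size=1000)

# Posterior predictive draws: one simulated observation per posterior sample.
predicted = rng.poisson(mu_samples)

print("observed : mean=%.2f  variance=%.2f" % (observed.mean(), observed.var()))
print("predicted: mean=%.2f  variance=%.2f" % (predicted.mean(), predicted.var()))
```

The means agree, but the simulated Poisson data are far less spread out than the observations. That kind of mismatch is the signal that the model is too rigid.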

The negative binomial distribution is similar to the Poisson distribution, but it can control the variance as well as the mean, so we switch to it.

Its shape is similar to that of the Poisson distribution.

For the negative binomial model, the two parameters α and μ are estimated.

In the estimates, α falls between 1.4 and 2.4, and the extra control over the variance makes the distribution more expressive.

The tutorial then plots the distribution generated from the estimated α and μ against the observed response times. The predicted distribution resembles the observed one and captures its shape better.
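
The role of α can be sketched with SciPy. I am assuming the (μ, α) parameterization in which the variance is μ + μ²/α, which to my knowledge is the one PyMC3 uses. The helper name and the value μ = 18 are hypothetical. SciPy's nbinom takes (n, p) instead, so the sketch converts between the two.

```python
from scipy import stats

def negative_binomial(mu, alpha):
    """Frozen SciPy distribution with mean mu and variance mu + mu**2 / alpha.

    Hypothetical helper: converts (mu, alpha) to SciPy's (n, p) arguments.
    """
    n = alpha
    p = alpha / (alpha + mu)
    return stats.nbinom(n, p)

mu = 18.0   # assumed mean, for illustration only
for alpha in (1.4, 2.4, 1000.0):
    dist = negative_binomial(mu, alpha)
    print(f"alpha={alpha:7.1f}  mean={dist.mean():.2f}  variance={dist.var():.2f}")
```

Small α gives a variance far above the mean, while very large α brings the variance back toward the mean, where the distribution behaves like a Poisson.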

A model that combines the Poisson and negative binomial distributions is also presented.

The tutorial computes the Bayes factor and uses it to decide which model to prefer, according to a set of criteria given there.

This article covered the basics up to Chapter 2. Chapter 3 is more advanced, so please give it a try.
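
The Bayes factor is the ratio of the two models' marginal likelihoods, p(data | model A) / p(data | model B). The sketch below approximates each marginal likelihood by averaging the likelihood over a grid of prior values. This is a technique I have swapped in for illustration, not the PyMC3 approach used in the tutorial. The data are simulated, and the uniform priors (10 to 60 for μ, 0.1 to 10 for α) are assumptions.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(4)
# Stand-in data with more spread than a Poisson allows (assumption).
data = rng.negative_binomial(n=2, p=2 / (2 + 18), size=200)

def log_mean_exp(log_values):
    """log(mean(exp(log_values))), computed stably.

    On an evenly spaced grid this approximates the likelihood
    integrated against a uniform prior.
    """
    return logsumexp(log_values) - np.log(log_values.size)

mu_grid = np.linspace(10, 60, 251)       # assumed Uniform(10, 60) prior on mu
alpha_grid = np.linspace(0.1, 10, 100)   # assumed Uniform(0.1, 10) prior on alpha

# Poisson model: log-likelihood of all data for each mu (shape: 251).
log_lik_poisson = stats.poisson.logpmf(data[None, :], mu_grid[:, None]).sum(axis=1)
log_evidence_poisson = log_mean_exp(log_lik_poisson)

# Negative binomial model: log-likelihood for each (mu, alpha) pair (shape: 251 x 100).
mu = mu_grid[:, None, None]
alpha = alpha_grid[None, :, None]
log_lik_nb = stats.nbinom.logpmf(
    data[None, None, :], alpha, alpha / (alpha + mu)
).sum(axis=-1)
log_evidence_nb = log_mean_exp(log_lik_nb)

log_bayes_factor = log_evidence_nb - log_evidence_poisson
print("log Bayes factor (negative binomial vs Poisson):", round(log_bayes_factor, 1))
```

A positive log Bayes factor favors the negative binomial model. The grid is coarse, so treat the number as a rough indication only.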

https://github.com/markdregan/Bayesian-Modelling-in-Python/blob/master/Section%203.%20Hierarchical%20modelling.ipynb

References

Pakutaso https://www.pakutaso.com/

Learn statistical modeling with Stan (2): What was MCMC in the first place? http://tjo.hatenablog.com/entry/2014/02/08/173324

Knowledge to pretend to know Bayesian inference http://www.anlyznews.com/2012/01/blog-post_31.html

Bayesian-Modelling-in-Python https://github.com/markdregan/Bayesian-Modelling-in-Python

Introduction to Bayesian Modeling Using pymc3 Bayesian-Modeling-in-Python Japanese Translation (Chapter 0-2)