[PYTHON] [Statistics] Visualization for understanding generalized linear mixed models (GLMM).

This post takes the idea of "mixing distributions" from p. 157 of "Introduction to Statistical Modeling for Data Analysis" (commonly known as Midoribon). Instead of reasoning about the distributions analytically, I simulated it with random numbers and visualized the result as an animation, and I would like to introduce that here.

Here is the resulting animation. The rest of this post explains what it shows.

mix_poisson_norm.gif (The code is here)

The details are all explained clearly in Midoribon, so this post only covers how to visualize the idea. If it sounds interesting, please buy the book!

Premise

Each plant produces at most 8 seeds, and the number of surviving seeds is assumed to follow the binomial distribution

p(y_i) = {8 \choose y_i}\ q_i^{y_i} (1-q_i)^{8-y_i} \quad \mbox{for}\ y_i=0,1,2,\dots,8

Here $ y_i $ is the number of surviving seeds (the observed value) for individual $ i $, and $ q_i $ is the per-seed survival probability for individual $ i $.
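As a minimal sketch of this binomial model, the probabilities can be computed directly from the formula. It uses only the Python standard library, and the names `binom_pmf` and `N_SEEDS` are my own, not taken from the book.

```python
from math import comb

N_SEEDS = 8  # maximum number of seeds per plant, as stated in the text


def binom_pmf(y, q, n=N_SEEDS):
    """Probability that y of n seeds survive when each survives with probability q."""
    return comb(n, y) * q**y * (1 - q) ** (n - y)


# Probabilities of y = 0, 1, ..., 8 surviving seeds for an example value q = 0.5
pmf = [binom_pmf(y, 0.5) for y in range(N_SEEDS + 1)]
for y, p in enumerate(pmf):
    print(y, round(p, 4))
```

The nine probabilities add up to 1, which is a quick way to check that the formula was typed in correctly.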

Also assume that $ q_i $ differs from individual to individual and is given by a logistic function:

q_i = {\rm logistic} (r_i)= {1 \over 1 + \exp( -r_i) }
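The logistic function itself is a one-liner. The sketch below is one possible way to write it with the standard library. Splitting on the sign of the argument is my own choice, made so that `math.exp` never overflows for large negative inputs.

```python
import math


def logistic(r):
    """Map any real number r to a value between 0 and 1."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    # For negative r, rewrite 1 / (1 + exp(-r)) as exp(r) / (1 + exp(r))
    # so that exp() is only ever called with a non-positive argument.
    e = math.exp(r)
    return e / (1.0 + e)


for r in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(r, logistic(r))
```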

We also assume that the argument $ r_i $ of this logistic function follows a normal distribution with mean 0 and standard deviation $ s $.

r_i \sim N(0, s)

That is, the density function of $ r_i $ is

p(r_i | s) = {1 \over \sqrt{2\pi s^2} } \exp \left( -{r_i^2 \over 2s^2} \right)

This $ r_i $ is regarded as the individual difference of individual $ i $.

Let's visualize it.

First, let's illustrate the individual difference $ r_i \sim N(0, s) $. I generated 10000 random numbers from a normal distribution with standard deviation $ s = 4 $. Their histogram is shown as the light blue bars below. Each of these values is then passed through the logistic function, drawn as the red line,

q_i = {\rm logistic} (r_i)= {1 \over 1 + \exp( -r_i) }

which converts it to a value between 0 and 1, that is, a value $ q_i $ that can be treated as a probability.

norm_logistic-compressor.png
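The sampling step for this figure can be sketched as follows. This is not the code behind the figure, just a standard-library illustration. The fixed seed and the variable names are my own assumptions.

```python
import math
import random


def logistic(r):
    """Map any real number r to a value between 0 and 1 (overflow-safe form)."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


random.seed(0)     # fixed seed so the run is reproducible (my choice)
s = 4.0            # standard deviation of the individual differences
n_samples = 10000  # number of simulated individuals, as in the text

r = [random.gauss(0.0, s) for _ in range(n_samples)]  # r_i ~ N(0, s)
q = [logistic(ri) for ri in r]                        # q_i = logistic(r_i)

print("range of r:", min(r), max(r))
print("range of q:", min(q), max(q))
```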

Now that each normal random number has been converted to a probability $ q_i $, the following is the histogram of $ q_i $. Because the underlying normal distribution is relatively wide, the logistic transformation pushes most of the values toward 0 and 1.

hist_of_q-compressor.png
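To see this concentration near 0 and 1 without a plotting library, the $ q_i $ can be counted into bins and printed as a rough text histogram. This is only a sketch. The choice of 10 bins and the seed are my own assumptions.

```python
import math
import random


def logistic(r):
    """Map any real number r to a value between 0 and 1 (overflow-safe form)."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


random.seed(0)  # fixed seed for reproducibility (my choice)
s = 4.0
q = [logistic(random.gauss(0.0, s)) for _ in range(10000)]

n_bins = 10  # bin width 0.1 (my choice)
counts = [0] * n_bins
for qi in q:
    # min() keeps a value of exactly 1.0 inside the last bin
    counts[min(int(qi * n_bins), n_bins - 1)] += 1

for k, c in enumerate(counts):
    print(f"{k / n_bins:.1f}-{(k + 1) / n_bins:.1f} {'#' * (c // 100)}")
```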

Next, a binomial distribution is associated with each of these 10000 $ q_i $. The binomial distribution is

p(y_i) = {8 \choose y_i}\ q_i^{y_i} (1-q_i)^{8-y_i} \quad \mbox{for}\ y_i=0,1,2,\dots,8

so its shape differs for each $ q_i $. Here are 9 examples, with $ q_i $ taking the values [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0].

various_binomial-compressor.png
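The numbers behind these nine panels can be tabulated with a few lines of Python. The sketch below is standard library only, and `binom_pmf` and `table` are names I made up for illustration.

```python
from math import comb

N_SEEDS = 8  # maximum number of seeds per plant


def binom_pmf(y, q, n=N_SEEDS):
    """Probability that y of n seeds survive when each survives with probability q."""
    return comb(n, y) * q**y * (1 - q) ** (n - y)


# The nine survival probabilities 0.0, 0.125, ..., 1.0 used in the figure
q_values = [k / N_SEEDS for k in range(N_SEEDS + 1)]

# One row of probabilities (y = 0..8) per value of q
table = {q: [binom_pmf(y, q) for y in range(N_SEEDS + 1)] for q in q_values}

for q, row in table.items():
    print(f"q={q:5.3f}", " ".join(f"{p:.3f}" for p in row))
```

Each printed row is one binomial distribution, and the position of its largest value moves from left to right as $ q_i $ grows.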

Generate a binomial distribution as above for each of the 10000 values of $ q_i $ and add them all together. Here is the result of the addition.

sum_of_binom-compressor.png
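The addition step can be sketched as below. Dividing the sum by the number of samples, which I do at the end, is my own addition. It only rescales the bars so that they add up to 1 and does not change their shape. The seed and all names are assumptions.

```python
import math
import random
from math import comb

N_SEEDS = 8


def logistic(r):
    """Map any real number r to a value between 0 and 1 (overflow-safe form)."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


def binom_pmf(y, q, n=N_SEEDS):
    """Probability that y of n seeds survive when each survives with probability q."""
    return comb(n, y) * q**y * (1 - q) ** (n - y)


random.seed(0)  # fixed seed for reproducibility (my choice)
s = 4.0
n_samples = 10000
q = [logistic(random.gauss(0.0, s)) for _ in range(n_samples)]

# Add up the binomial distributions of all simulated individuals
total = [0.0] * (N_SEEDS + 1)
for qi in q:
    for y in range(N_SEEDS + 1):
        total[y] += binom_pmf(y, qi)

# Rescale so the values sum to 1 (my addition, the shape is unchanged)
mixture = [t / n_samples for t in total]
for y, p in enumerate(mixture):
    print(y, round(p, 4))
```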

Let's animate it

The previous section graphed the case $ s = 4 $. The animation below shows how the same graphs change as $ s $ moves continuously from 0 to 3. In this example, the distribution appears to split toward the left and right ends from around $ s = 2 $.
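The frames of such an animation can be approximated by repeating the simulation for several values of $ s $. The sketch below is not the code that produced the GIF. It uses a coarse grid of 7 values instead of a continuous sweep and only prints the left, middle and right bars of each frame. The function `mixture_for`, the grid and the seed are all my own assumptions.

```python
import math
import random
from math import comb

N_SEEDS = 8


def logistic(r):
    """Map any real number r to a value between 0 and 1 (overflow-safe form)."""
    if r >= 0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


def binom_pmf(y, q, n=N_SEEDS):
    """Probability that y of n seeds survive when each survives with probability q."""
    return comb(n, y) * q**y * (1 - q) ** (n - y)


def mixture_for(s, n_samples=10000, seed=0):
    """Average the binomial distributions of n_samples individuals with r_i ~ N(0, s)."""
    rng = random.Random(seed)  # separate generator so every frame is reproducible
    total = [0.0] * (N_SEEDS + 1)
    for _ in range(n_samples):
        qi = logistic(rng.gauss(0.0, s))
        for y in range(N_SEEDS + 1):
            total[y] += binom_pmf(y, qi)
    return [t / n_samples for t in total]


s_values = [0.5 * k for k in range(7)]  # 0.0, 0.5, ..., 3.0 (coarse grid, my choice)
frames = {s: mixture_for(s) for s in s_values}

for s, mix in frames.items():
    print(f"s={s:.1f} y=0: {mix[0]:.3f} y=4: {mix[4]:.3f} y=8: {mix[8]:.3f}")
```

Each entry of `frames` holds the nine bar heights for one value of $ s $, so it could be handed to whatever plotting library is used to draw one frame of the animation.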

mix_poisson_norm.gif

Reference book

Introduction to Statistical Modeling for Data Analysis    http://hosho.ees.hokudai.ac.jp/%7Ekubo/ce/IwanamiBook.html
