Bayesian modeling practice, part 1
For various reasons I hadn't had much time, but I finally got to practice Bayesian modeling.
Part 1 covers the difference between two groups in Bayesian statistics.
The reference book is
"First Statistical Data Analysis" (https://www.asakura.co.jp/books/isbn/978-4-254-12214-5/).
The data is the famous "iris" dataset.
From it, I used "petal.length" (the length of the petal; the calyx corresponds to the sepal columns).
For the Bayesian modeling software, I use Stan + Python (PyStan).
The goal this time is to carry out a Bayesian version of the t-test, which is well known from frequentist statistics.
Let the observed data for the two groups be x1 and x2. We assume they are generated as follows:
x1 ~ normal(mu1, sigma1)
x2 ~ normal(mu2, sigma2)
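To make the generative assumption concrete, here is a minimal sketch that draws two groups from the model above using only the Python standard library. The function name simulate_two_groups and the parameter values are hypothetical, chosen only for illustration; they are not the iris estimates.

```python
import random


def simulate_two_groups(mu1, sigma1, mu2, sigma2, n, seed=0):
    """Draw n observations per group from normal(mu1, sigma1) and normal(mu2, sigma2)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(mu1, sigma1) for _ in range(n)]
    x2 = [rng.gauss(mu2, sigma2) for _ in range(n)]
    return x1, x2


# Hypothetical parameter values, for illustration only.
x1, x2 = simulate_two_groups(mu1=4.0, sigma1=0.5, mu2=5.0, sigma2=0.5, n=50)
print(sum(x1) / len(x1), sum(x2) / len(x2))
```

Bayesian estimation runs this process in reverse: given x1 and x2, it infers which values of mu and sigma are plausible.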
By comparing the population means mu1 and mu2 estimated from the observed values (x1, x2),
we can discuss the probability that there is a difference between them.
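The comparison of mu1 and mu2 can be sketched roughly as follows. This is not the Stan/PyStan code used for this post. It swaps in direct sampling from the exact posterior under the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, which may differ from the priors in the actual Stan model. The data here are synthetic with hypothetical parameters, and posterior_mu_draws is a name introduced only for this example.

```python
import math
import random
import statistics


def posterior_mu_draws(x, n_draws, rng):
    """Draw from the posterior of mu for x ~ normal(mu, sigma).

    Assumes the prior p(mu, sigma^2) proportional to 1/sigma^2, for which
      sigma^2 | x      = (n - 1) * s^2 / chi2(n - 1)
      mu | sigma^2, x  ~ normal(xbar, sigma^2 / n)
    """
    n = len(x)
    xbar = statistics.mean(x)
    s2 = statistics.variance(x)  # sample variance (n - 1 denominator)
    draws = []
    for _ in range(n_draws):
        # Chi-square with n - 1 degrees of freedom = Gamma(shape=(n-1)/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        draws.append(rng.gauss(xbar, math.sqrt(sigma2 / n)))
    return draws


rng = random.Random(1)
# Synthetic stand-in data (hypothetical parameters, not the iris measurements).
x1 = [rng.gauss(4.0, 0.5) for _ in range(30)]
x2 = [rng.gauss(5.0, 0.5) for _ in range(30)]

mu1 = posterior_mu_draws(x1, 4000, rng)
mu2 = posterior_mu_draws(x2, 4000, rng)

# The fraction of paired draws with mu2 > mu1 approximates P(mu2 > mu1 | data).
prob_mu2_greater = sum(b > a for a, b in zip(mu1, mu2)) / len(mu1)
print(prob_mu2_greater)
```

With MCMC output from Stan, the same last step would apply to the sampled mu1 and mu2 values instead of these direct draws.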
First, the plot below shows histograms of the original data, overlaid with normal distributions using the estimated mu and sigma.
There are 50 data points each for the versicolor and virginica species.
Even from the plot alone, you can see that there seems to be a difference between them.
Next is a plot of the estimated mu_versicolor and mu_virginica.
Each appears in two shades, light and dark, because we estimated from 10 samples (lighter) and from 30 samples (darker).
The lighter distributions are more spread out, which shows that more observations give a more precise estimate.
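How much narrower the estimate should get can be checked with a little arithmetic. As a sketch, assume the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 (an assumption, not necessarily the prior used in the Stan model). Then the posterior of mu is a Student-t distribution with n - 1 degrees of freedom, centered on the sample mean, with scale s / sqrt(n). The value of s below is hypothetical.

```python
import math


def posterior_scale_of_mu(s, n):
    """Scale s / sqrt(n) of the Student-t posterior of mu (n - 1 degrees of freedom)."""
    return s / math.sqrt(n)


s = 0.5  # hypothetical sample standard deviation
scale_10 = posterior_scale_of_mu(s, 10)
scale_30 = posterior_scale_of_mu(s, 30)
print(scale_10, scale_30, scale_10 / scale_30)
```

For the same sample standard deviation, going from 10 to 30 observations shrinks the scale by a factor of sqrt(3), about 1.73. The heavier tails of the t distribution at n = 10 widen the lighter histogram a bit further.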
Finally, a plot of the difference between mu_versicolor and mu_virginica, for the 10-sample and 30-sample versions.
Blue is the 10-sample version and green is the 30-sample version.
By summarizing this histogram (for example, taking the EAP, the expected a posteriori estimate, i.e. the posterior mean),
we can evaluate things like the probability that the difference in length is larger than a given number of cm.
As expected, the larger the number of samples, the narrower and more sharply peaked the histogram.
With 30 samples, most of the probability falls on a difference of 1.0 to 1.5.
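The summaries mentioned above (the EAP and the probability that the difference exceeds some value or falls in some range) are simple counts over the posterior draws. Here is a minimal sketch. The function summarize_difference is hypothetical, and the draws are stand-ins from an arbitrary normal distribution rather than the actual sampler output for this post.

```python
import random
import statistics


def summarize_difference(delta, threshold, lower, upper):
    """Summarize posterior draws of a difference in means."""
    n = len(delta)
    return {
        # EAP: the mean of the posterior draws.
        "eap": statistics.mean(delta),
        # Share of draws above the threshold approximates P(delta > threshold | data).
        "p_greater_than_threshold": sum(d > threshold for d in delta) / n,
        # Share of draws inside (lower, upper) approximates P(lower < delta < upper | data).
        "p_within_range": sum(lower < d < upper for d in delta) / n,
    }


# Stand-in draws from a hypothetical normal(1.25, 0.12); real draws would come from the sampler.
rng = random.Random(0)
delta = [rng.gauss(1.25, 0.12) for _ in range(4000)]

summary = summarize_difference(delta, threshold=1.0, lower=1.0, upper=1.5)
print(summary)
```

In practice, delta would be the element-wise difference of the sampled mu values for the two species.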
This time I evaluated the difference between two groups using Bayesian statistics, and it seems quite usable.
It is nice that you get not just a p-value but the probability, as a percentage, of how large the difference is.
It is also nice that the low reliability of a small amount of data can be evaluated.
I'd like to use this for my own master's thesis.