"Statistics is the strongest study" and "[Statistics is the strongest study](Practice edition) by Hiromu Nishiuchi http://www.amazon.co.jp/dp/4478028230) ”has become an exceptional bestseller with a cumulative total of over 370,000 copies in the series. I think there are many people who have read it.
In the two books before and after this, various methods appearing in statistics textbooks are described in "[Generalized Linear Model](http://ja.wikipedia.org/wiki/%E4%B8%80%E8%88%AC%]. E5% 8C% 96% E7% B7% 9A% E5% BD% A2% E3% 83% A2% E3% 83% 87% E3% 83% AB) ”is summarized in one table.
I will quote those tables here.
"Statistics is the strongest study", p. 170: a table summarizing generalized linear models

"Statistics is the strongest study (Practice edition)", p. 344: an expanded version of the single table that dramatically advances the understanding of statistics

These two books explain the statistical methods often used in business: what they mean, the ideas behind them, and how to use them.
In addition, p. 357 of the Practice edition lists three pieces of knowledge that the book itself does not provide.
Starting with this post, I would like to focus on item 1 of that list and work through examples on simple data, practicing in the analysis language I have been using so far.
That said, some of the methods have already been covered in earlier posts, so let's treat those as a review.
We start with Doll and Hill's "case-control study," which is said to be the first epidemiological estimate.
To examine the link between lung cancer and smoking, they surveyed 1465 lung cancer inpatients at hospitals across the UK between 1948 and 1952. The results were as follows:
| Group | Number of people | Smoker | Non-smoker |
|---|---|---|---|
| Male lung cancer patient | 1357 | 1350 (99.5%) | 7 (0.5%) |
| Male non-lung cancer patient | 1357 | 1296 (95.5%) | 61 (4.5%) |
| Female lung cancer patient | 108 | 68 (63.0%) | 40 (37.0%) |
| Female non-lung cancer patient | 108 | 49 (45.4%) | 59 (54.6%) |
In epidemiology, a "case" is a patient who has developed the disease, and a "control" is the subject it is compared against.
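As a quick sanity check, the smoking percentages in the table can be recomputed from the raw counts. The following is a minimal sketch. The dictionary `groups` and the helper `smoking_rate` are names introduced here for illustration, and the fourth row is assumed to be the female control (non-lung cancer) group.

```python
# Counts taken from the table above: (smokers, non-smokers) per group.
# The group labels and helper name are illustrative, not from the original post.
groups = {
    "Male lung cancer patients": (1350, 7),
    "Male non-lung cancer patients": (1296, 61),
    "Female lung cancer patients": (68, 40),
    "Female non-lung cancer patients": (49, 59),  # assumed to be the female controls
}


def smoking_rate(smokers, non_smokers):
    """Return the percentage of smokers in a group, rounded to one decimal place."""
    return round(100 * smokers / (smokers + non_smokers), 1)


rates = {name: smoking_rate(*counts) for name, counts in groups.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}% smokers")
```

In both sexes the cases show a higher smoking rate than the controls. Whether that gap could be due to chance is what the test examines.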
Running a chi-square test on the survey data gives the following.
import numpy as np
import scipy.stats as stats

# Male data: rows are lung cancer patients and non-lung cancer patients,
# columns are smokers and non-smokers
man = np.array([[1350, 7], [1296, 61]])
# Female data, same layout
female = np.array([[68, 40], [49, 59]])

def chi_squared_test(data):
    """Perform a chi-square test of independence on a contingency table."""
    # Returns the chi-square value, p-value, degrees of freedom, and expected frequencies.
    # For a 2x2 table, chi2_contingency applies Yates' continuity correction by default.
    x2, p, dof, expected = stats.chi2_contingency(data)
    return x2, p, dof, expected

results_man = chi_squared_test(man)
results_female = chi_squared_test(female)
print(results_man)
print(results_female)
For men, the chi-square value is 42.3704259482 and the p-value is 7.5523446617e-11 with 1 degree of freedom, so the difference is significant. Similarly, for women, the chi-square value is 6.04195804196 and the p-value is 0.0139697819212 with 1 degree of freedom, which is also a significant difference.
In other words, we cannot say that smoking is unrelated to lung cancer.
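To see where these numbers come from, the statistic can also be recomputed by hand. The sketch below is one way to do it using only the standard library. It assumes a 2x2 table and Yates' continuity correction, which is what produces the values quoted above. The function name `chi_square_2x2` is introduced here for illustration.

```python
import math


def chi_square_2x2(table, yates=True):
    """Chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                # Yates' continuity correction: shrink |O - E| by 0.5, not below 0
                diff = max(diff - 0.5, 0.0)
            x2 += diff ** 2 / expected
    # With 1 degree of freedom, P(X > x2) equals erfc(sqrt(x2 / 2))
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p


x2_man, p_man = chi_square_2x2([[1350, 7], [1296, 61]])
x2_female, p_female = chi_square_2x2([[68, 40], [49, 59]])
print(x2_man, p_man)
print(x2_female, p_female)
```

Passing `yates=False` gives the uncorrected Pearson statistic, which is somewhat larger for both tables.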
To be continued in the next post.