[PYTHON] Summary of statistical test methods

t-test

One sample (univariate)

Find out if the sample mean is significantly different from a certain value.
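As a minimal sketch of the one-sample case, the example below assumes that the "stats module" is `scipy.stats` and uses `ttest_1samp`. The data are synthetic and the names `sample` and `popmean` are chosen only for illustration. The manual calculation underneath shows what the function computes under the usual assumptions (two-sided test, n - 1 degrees of freedom).

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only (not from the original article).
rng = np.random.default_rng(0)
sample = rng.normal(loc=52.0, scale=5.0, size=30)

# Hypothesized population mean to compare the sample mean against.
popmean = 50.0

# One-sample t-test (two-sided by default).
result = stats.ttest_1samp(sample, popmean)
t_stat, p_value = result.statistic, result.pvalue

# The same test by hand: t = (mean - popmean) / (s / sqrt(n)),
# where s is the unbiased sample standard deviation (ddof=1).
n = sample.size
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

If the p-value falls below the chosen significance level (0.05 is a common choice), the sample mean is judged to differ significantly from `popmean`.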

Two samples (bivariate)

Find out whether there is a significant difference between the two sample means. The method differs slightly depending on whether the two samples are paired (each observation in one sample corresponds to an observation in the other) or independent. See the reference pages at the end for details on the functions of the stats module.
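The difference between the paired and unpaired cases can be sketched as follows, again assuming `scipy.stats` is the stats module in question. `ttest_rel` is for paired samples and `ttest_ind` for independent samples. The data and variable names are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Paired case: the same 20 subjects measured twice (synthetic data).
before = rng.normal(60.0, 8.0, size=20)
after = before + rng.normal(2.0, 3.0, size=20)
paired = stats.ttest_rel(before, after)

# Independent case: two unrelated groups, possibly of different sizes.
group_a = rng.normal(60.0, 8.0, size=20)
group_b = rng.normal(65.0, 12.0, size=25)
# Student's t-test assumes equal variances (the default, equal_var=True).
student = stats.ttest_ind(group_a, group_b)
# Welch's t-test drops the equal-variance assumption.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"paired:  t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"student: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```

A paired t-test is equivalent to a one-sample t-test on the per-subject differences against zero, which is why pairing matters: it removes the variation between subjects from the error term.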

Analysis of variance

Test whether the means differ among three or more levels. First compute the F ratio, then calculate the p-value from it. Assuming that the populations are normally distributed with equal variances (homoscedasticity), the sampling distribution of the F ratio under the null hypothesis is known to be the F distribution, so the p-value can be obtained from the cumulative distribution function of the F distribution.
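A short sketch of how the p-value follows from the F distribution, assuming one-way analysis of variance with `scipy.stats.f_oneway`. The three groups are synthetic. The degrees of freedom used here, k - 1 and N - k for k groups and N observations in total, are the standard ones for the one-way design.

```python
import numpy as np
from scipy import stats

# Three synthetic groups (levels) with the same spread but shifted means.
rng = np.random.default_rng(2)
groups = [rng.normal(loc=mu, scale=2.0, size=15) for mu in (10.0, 11.0, 13.0)]

# One-way ANOVA: returns the F ratio and its p-value.
f_stat, p_value = stats.f_oneway(*groups)

# The same p-value from the F distribution directly.
k = len(groups)                         # number of levels
n_total = sum(len(g) for g in groups)   # total number of observations
# sf is the survival function, i.e. 1 - cdf: the upper-tail probability.
p_from_f = stats.f.sf(f_stat, k - 1, n_total - k)

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```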

F ratio

A large F ratio indicates that the effect is large relative to the error.

F = \frac{\text{variance due to the effect}}{\text{variance due to the error}}

The magnitude of the effect corresponds to the distance between the violin plots of the groups and is called between-group variation. The magnitude of the error corresponds to the spread of each violin plot and is called within-group variation.
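The F ratio can also be computed by hand from the between-group and within-group variation. The sketch below uses small made-up numbers and the standard one-way formulas, with each sum of squares divided by its degrees of freedom. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three levels.
groups = [
    np.array([4.1, 5.0, 4.7, 5.3]),
    np.array([5.9, 6.4, 6.1, 6.8]),
    np.array([7.2, 6.9, 7.8, 7.5]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)
n_total = all_values.size

# Between-group variation: how far each group mean lies from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of the values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Divide by the degrees of freedom to obtain variances (mean squares).
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_ratio = ms_between / ms_within
print(f"F = {f_ratio:.3f}")
```

The result should agree with the statistic returned by `stats.f_oneway(*groups)`.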

Chi-square test

Test whether the two variables of a contingency table are independent.

  1. Find the difference between the observed frequencies and the expected frequencies. As a rule of thumb, all expected frequencies should be 5 or more. The chi-square statistic below summarizes the difference between the expected and observed frequencies.
\chi^2=\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}

In the above formula, the contingency table, excluding the totals, has m rows and n columns. O_{ij} is the observed frequency and E_{ij} is the expected frequency in cell (i, j).

  2. Find the p-value.
    Under the null hypothesis of independence, the sampling distribution of the chi-square statistic approximately follows the chi-square distribution with (m-1)(n-1) degrees of freedom (one degree of freedom for a 2 x 2 table), so the p-value can be obtained by using the cumulative distribution function of the chi-square distribution.
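The two steps above can be sketched as follows. The 2 x 3 table is made up, and the expected frequencies are computed as row total times column total divided by the grand total, the usual definition under independence. The degrees of freedom are taken as (m - 1)(n - 1), which equals one only for a 2 x 2 table. `scipy.stats.chi2_contingency` is called at the end as a cross-check.

```python
import numpy as np
from scipy import stats

# Made-up 2 x 3 contingency table of observed frequencies (no totals).
observed = np.array([
    [20, 30, 25],
    [30, 20, 35],
])

# Step 1: expected frequencies under independence, then the statistic.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Rule of thumb from the text: every expected frequency should be >= 5.
expected_ok = bool((expected >= 5).all())

chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Step 2: p-value from the chi-square distribution.
m, n = observed.shape
dof = (m - 1) * (n - 1)
p_value = stats.chi2.sf(chi2_stat, dof)  # sf = 1 - cdf (upper tail)

# Cross-check with SciPy. correction=False disables Yates' continuity
# correction, which SciPy would otherwise apply when dof == 1.
chi2_ref, p_ref, dof_ref, expected_ref = stats.chi2_contingency(
    observed, correction=False
)

print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
```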

Reference pages

https://qiita.com/kanamae879123/items/ec1226fc6d0ba789ae65
https://qiita.com/kanamae879123/items/2502258737a7d8e181c6
