[PYTHON] Measure the strength of association in a crosstab

Cramér's V coefficient

A previous article explained how to use a chi-square test to check the independence of discrete variables in a crosstab.

For a given table, the larger the chi-square value χ², the stronger the relationship between the two variables. However, χ² depends on both the size of the crosstab and the number of cases, so its maximum possible value differs from table to table. This makes it difficult to compare crosstabs that have different numbers of rows and columns.
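To make the dependence on the number of cases concrete, here is a minimal sketch in plain Python. The helper name chi_square and both tables are made up for illustration; they are not from the source. The two tables have identical proportions, but the second has ten times as many cases.

```python
def chi_square(table):
    """Pearson chi-square statistic for a crosstab given as a list of rows.

    Assumes every row and every column contains at least one case,
    so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical tables: same proportions, ten times the cases.
small = [[10, 20], [20, 10]]
large = [[100, 200], [200, 100]]

chi2_small = chi_square(small)
chi2_large = chi_square(large)
print(chi2_small, chi2_large)
```

The pattern in the two tables is the same, yet χ² grows from about 6.67 to about 66.67, in proportion to the number of cases. This is why the raw χ² value cannot be compared across tables directly.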

Cramér's V coefficient converts χ² with the following formula so that, for a crosstab of any size, a completely unrelated state gives 0 and a fully related state gives 1.

\Phi_c = \sqrt{\frac {\chi^2} {N(k-1)}}

Here N is the total frequency (the number of cases) and k is the smaller of the number of rows and the number of columns in the crosstab.
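
As a worked example (the table is made up for illustration, not taken from the source), consider the 2 x 2 crosstab with rows (10, 20) and (20, 10). Every row total and column total is 30 and N = 60, so every expected count is 30 × 30 / 60 = 15, and each cell deviates from it by 5:

\chi^2 = 4 \times \frac{5^2}{15} = \frac{20}{3} \approx 6.67

With N = 60 and k = 2, the formula gives

\Phi_c = \sqrt{\frac{20/3}{60(2-1)}} = \sqrt{\frac{1}{9}} \approx 0.33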

The effect of the number of cases is corrected by dividing by the total frequency, and the effect of the table dimensions is corrected by dividing by the smaller of the number of rows and the number of columns, minus 1. Finally, because χ² is a squared quantity, the square root is taken.
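
The formula can be sketched directly in plain Python. This is a minimal illustration rather than a reference implementation: the function names chi_square and cramers_v and the three tables are assumptions introduced here, and the code assumes that no row or column of the crosstab is entirely empty.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: sqrt(chi^2 / (N * (k - 1))), k = min(rows, columns)."""
    n = sum(sum(row) for row in table)      # total frequency N
    k = min(len(table), len(table[0]))      # smaller table dimension
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical tables chosen to hit the two extremes and one value in between.
independent = [[10, 20, 30], [20, 40, 60]]  # rows are proportional
perfect = [[30, 0], [0, 30]]                # each row determines the column
moderate = [[10, 20], [20, 10]]

v_independent = cramers_v(independent)
v_perfect = cramers_v(perfect)
v_moderate = cramers_v(moderate)
print(v_independent, v_perfect, v_moderate)
```

The independent table gives 0, the perfectly related table gives 1, and the third table lands at about 0.33, regardless of the fact that the first table is 2 x 3 and the others are 2 x 2.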

Calculate with code

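There is reference code available, so I will quote it. Note that this code implements Cramer's rule, which solves a system of three linear equations using determinants; it does not compute Cramér's V from a crosstab.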

import numpy as np

def det2x2(A, v=False):
    if v:  print('compute 2 x 2 det of')
    if v:  print(A)
    assert A.shape == (2,2)
    return A[0][0]*A[1][1] - A[0][1]*A[1][0]

def det3x3(A):
    print('compute 3 x 3 det of')
    print(A)
    assert A.shape == (3,3)
    a,b,c = A[0]
    c1 = a * det2x2(A[1:3,[1,2]])
    c2 = b * det2x2(A[1:3,[0,2]])
    c3 = c * det2x2(A[1:3,[0,1]])
    return c1 - c2 + c3

def solve(A):
    print('solve')
    print(A, '\n')
    assert A.shape == (3,4)
    D = det3x3(A[:,:3])
    print('D = ', D, '\n')
    if D == 0:
        print('no solution')
        return
    Dx = det3x3(A[:,[3,1,2]])
    print('Dx = ', Dx, '\n')
    Dy = det3x3(A[:,[0,3,2]])
    print('Dy = ', Dy, '\n')
    Dz = det3x3(A[:,[0,1,3]])
    print('Dz = ', Dz, '\n')
    return Dx*1.0/D, Dy*1.0/D, Dz*1.0/D

def check(A,x,y,z):
    print('check')
    for i,r in enumerate(A):
        print('row', i, '=', r)
        pL = list()
        for coeff,var in zip(r[:3],(x,y,z)):
            c = str(round(coeff,2))
            v = str(round(var,2))
            pL.append(c + '*' + v)
        print(' + '.join(pL), end=' ')
        print(' =', r[0]*x + r[1]*y + r[2]*z, '\n')

With the code above saved as cramer.py, running it looks like this.

import numpy as np
import cramer

def run_cramer():
    L = [2, 3, 0, 5,
         1, 1, 1, 3,
         2,-1, 3, 7]
    A = np.array(L)
    A.shape = (3,4)
    result = cramer.solve(A)
    if result:
        x,y,z = result
        print('solution')
        print('x =', x)
        print('y =', y)
        print('z =', z, '\n')
        cramer.check(A,x,y,z)

run_cramer()
# =>
# solve
# [[ 2  3  0  5]
#  [ 1  1  1  3]
#  [ 2 -1  3  7]] 
# 
# compute 3 x 3 det of
# [[ 2  3  0]
#  [ 1  1  1]
#  [ 2 -1  3]]
# D =  5 
# 
# compute 3 x 3 det of
# [[ 5  3  0]
#  [ 3  1  1]
#  [ 7 -1  3]]
# Dx =  14 
# 
# compute 3 x 3 det of
# [[2 5 0]
#  [1 3 1]
#  [2 7 3]]
# Dy =  -1 
# 
# compute 3 x 3 det of
# [[ 2  3  5]
#  [ 1  1  3]
#  [ 2 -1  7]]
# Dz =  2 
# 
# solution
# x = 2.8
# y = -0.2
# z = 0.4 
# 
# check
# row 0 = [2 3 0 5]
# 2*2.8 + 3*-0.2 + 0*0.4  = 5.0 
# 
# row 1 = [1 1 1 3]
# 1*2.8 + 1*-0.2 + 1*0.4  = 3.0 
# 
# row 2 = [ 2 -1  3  7]
# 2*2.8 + -1*-0.2 + 3*0.4  = 7.0 
# 

There is also an online calculator, as mentioned in the source article.
