[PYTHON] Introduction to Statistics The University of Tokyo Press Chapter 2 Exercises

Chapter 2 Exercises

Existing library functions can produce these results immediately, but I implement them from scratch here in order to study what they actually compute.

2.2 Mean difference / Gini coefficient

Calculate the mean difference and the Gini coefficient for the following data sets.

A : 0, 3, 3, 5, 5, 5, 5, 7, 7, 10
B : 0, 1, 2, 3, 5, 5, 7, 8, 9, 10
C : 3, 4, 4, 5, 5, 5, 5, 6, 6, 7

Here, the mean difference and the Gini coefficient are defined by the following formulas, respectively.

\text{Mean difference} = \frac{\sum_i \sum_j |x_i - x_j|}{n^2}
\text{Gini coefficient} = \frac{\sum_i \sum_j |x_i - x_j|}{2 n^2 \bar{x}}

Therefore, I wrote the program as follows.

import numpy as np
A = np.array([0,3,3,5,5,5,5,7,7,10])
B = np.array([0,1,2,3,5,5,7,8,9,10])
C = np.array([3,4,4,5,5,5,5,6,6,7])
# Mean difference
def ave_diff(x):
    n_sq = len(x)**2  # number of ordered pairs (i, j)
    # |x_i - x_j| / n^2 for every ordered pair
    result = [np.abs(x[i]-x[j])/n_sq for i in range(len(x)) for j in range(len(x))]
    return sum(result)


"""
print(ave_diff(A))
print(ave_diff(B))
print(ave_diff(C))

2.76
3.7599999999999976
1.2000000000000008
"""
# Gini coefficient
def get_gini(x):
    n_sq = len(x)**2  # number of ordered pairs (i, j)
    x_bar = x.mean()
    # |x_i - x_j| / (2 n^2 x_bar) for every ordered pair
    result = [np.abs(x[i]-x[j])/(2*n_sq*x_bar) for i in range(len(x)) for j in range(len(x))]
    return sum(result)
"""
print(get_gini(A))
print(get_gini(B))
print(get_gini(C))

0.2760000000000002
0.3760000000000002
0.12000000000000008
"""
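As a cross-check on the loop-based functions above, here is a minimal sketch of two alternatives. Neither is from the original exercise, and the names mean_difference, mean_difference_sorted and gini are hypothetical. The first builds the full n-by-n matrix of pairwise differences with NumPy broadcasting. The second assumes the data can be sorted and uses the identity Σ_iΣ_j|x_i-x_j| = 2 Σ_k (2k-n-1) x_(k), where x_(k) is the k-th smallest value, which avoids the double loop.

```python
import numpy as np

def mean_difference(x):
    """Mean of |x_i - x_j| over all n^2 ordered pairs (i, j)."""
    x = np.asarray(x, dtype=float)
    # x[:, None] - x[None, :] is the n-by-n matrix of pairwise differences.
    return np.abs(x[:, None] - x[None, :]).mean()

def mean_difference_sorted(x):
    """Same quantity, computed from the sorted data in O(n log n)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    k = np.arange(1, n + 1)  # 1-based rank of each sorted value
    # Sum over ordered pairs equals 2 * sum_k (2k - n - 1) * x_(k).
    return 2 * np.sum((2 * k - n - 1) * x) / n**2

def gini(x):
    """Gini coefficient: mean difference divided by twice the mean."""
    x = np.asarray(x, dtype=float)
    return mean_difference(x) / (2 * x.mean())

A = [0, 3, 3, 5, 5, 5, 5, 7, 7, 10]
B = [0, 1, 2, 3, 5, 5, 7, 8, 9, 10]
C = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]

for name, data in (("A", A), ("B", B), ("C", C)):
    print(name, mean_difference(data), mean_difference_sorted(data), gini(data))
```

All three data sets have mean 5, so each Gini coefficient here is simply the mean difference divided by 10, which agrees with the outputs listed above up to floating-point rounding.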

2.3 Entropy

With relative frequencies p_i = f_i / n (f_i is the count in category i and n is the total count), entropy is defined as

H(p_1, p_2, \ldots, p_k) = -\sum_{i=1}^{k} p_i \log p_i

where k is the number of categories. The larger H is, the more uniform the distribution is, and the smaller H is, the more concentrated it is.

Example: 100 students were asked where they came from, and the following results were obtained for 10 years ago and for this year. Compare the two distributions of place of origin in terms of concentration.

Area           A    B    C    D    E    Total
This year      32   19   10   24   15   100
10 years ago   28   13   18   29   12   100

import numpy as np
a = np.array([32, 19, 10, 24, 15])  # this year
b = np.array([28, 13, 18, 29, 12])  # 10 years ago

def entropy(x):
    n = sum(x)  # total count
    # p_i * log10(p_i) for each category (base-10 logarithm)
    H = [x[i]/n*np.log10(x[i]/n) for i in range(len(x))]
    return -sum(H)

"""
print(entropy(a))
print(entropy(b))

0.667724435887455
0.6704368955892825
"""
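To interpret these two numbers, it helps to compare them with the largest value entropy can take. The sketch below is my own addition rather than part of the exercise: entropy_base10 is a hypothetical vectorized version of the function above, and it assumes the same base-10 logarithm. With five areas, the maximum is log10(5), reached when every area has the same share.

```python
import numpy as np

def entropy_base10(counts):
    """Entropy -sum(p_i * log10(p_i)) computed from raw category counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()  # relative frequencies p_i = f_i / n
    p = p[p > 0]               # treat 0 * log(0) as 0
    return -np.sum(p * np.log10(p))

this_year = [32, 19, 10, 24, 15]
ten_years_ago = [28, 13, 18, 29, 12]

# Entropy of a perfectly uniform distribution over the 5 areas.
h_max = np.log10(5)

print(entropy_base10(this_year), entropy_base10(ten_years_ago), h_max)
```

Both values are close to the maximum of about 0.699, so both distributions are fairly uniform. This year's entropy is slightly smaller, meaning the places of origin are slightly more concentrated now than 10 years ago.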

2.4 Standard score / deviation score

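Calculate the standard score and the deviation score for data B.

Standard score (standardization)

z_i = \frac{x_i - \bar{x}}{S_x}

Here \bar{x} is the mean and S_x is the standard deviation (computed below from NumPy's default population variance). So I wrote the following function.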

def standard_score(x):
    x_bar = x.mean()
    s=np.sqrt(x.var())
    z = [(x[i]-x_bar)/s for i in range(len(x))]
    return z

"""
standard_score(B)

[-1.5214515486254614,
 -1.217161238900369,
 -0.9128709291752768,
 -0.6085806194501845,
 0.0,
 0.0,
 0.6085806194501845,
 0.9128709291752768,
 1.217161238900369,
 1.5214515486254614]

"""

Deviation score

T_i = 10 z_i + 50

So I modified the function above slightly:

def dev_val(x):
    x_bar = x.mean()
    s=np.sqrt(x.var())
    T = [(x[i]-x_bar)/s*10 +50 for i in range(len(x))]
    return T

'''

dev_val(B)


[34.78548451374539,
 37.82838761099631,
 40.87129070824723,
 43.91419380549816,
 50.0,
 50.0,
 56.08580619450184,
 59.12870929175277,
 62.17161238900369,
 65.21451548625461]

'''

These are the deviation scores for data B.
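A quick way to sanity-check both results is to confirm that standard scores have mean 0 and standard deviation 1, and that deviation scores have mean 50 and standard deviation 10. The following is a minimal vectorized sketch with hypothetical names (standardize, deviation_scores), assuming the same population standard deviation (ddof=0) that x.var() uses above.

```python
import numpy as np

def standardize(x):
    """Standard scores z_i = (x_i - mean) / std, with population std (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def deviation_scores(x):
    """Deviation scores T_i = 10 * z_i + 50."""
    return 10 * standardize(x) + 50

B = np.array([0, 1, 2, 3, 5, 5, 7, 8, 9, 10])
z = standardize(B)
t = deviation_scores(B)

# Standardized data should have mean 0 and standard deviation 1.
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
# Deviation scores should have mean 50 and standard deviation 10.
print(np.isclose(t.mean(), 50.0), np.isclose(t.std(), 10.0))
```

Working on the whole array at once replaces the list comprehension and returns a NumPy array instead of a Python list.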
