I wrote some familiar statistics functions in Python

Table of contents

--What made me decide to write this article
--Execution environment
--Average
--Range
--Median 1
--Median 2 (only in ascending and descending order)

What made me decide to write this article

While reading a statistics book at the library, I thought, "I could probably compute these things (standard deviation and so on) with Python." I also wanted to improve my Python skills, so I wrote this article as a change of pace.

Execution environment

macOS Big Sur 11.1. The code was run on paiza.IO (Python 3).

Average

program

def mean(l):
    return sum(l)/len(l)

Example of use

print(mean([3,4,5,7,6]))
#5.0
print(mean([3,4,11,7,6]))
#6.2

Notes

--The return value is a float.
--It also works on lists that are not in ascending order.
--When the value becomes extremely large, it is displayed in exponent ("e") notation, as shown below.

print(mean([3,4,1111111111111111111111111111111,7,6]))
#2.222222222222222e+29
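
If the exact value matters more than a compact display, one option is to keep the mean as a fraction instead of a float. The following is a minimal sketch, assuming every element is an int; mean_exact is a hypothetical name that does not appear in the original functions.

```python
from fractions import Fraction

def mean_exact(l):
    # Assumption: every element of l is an int, so sum(l) is an exact int.
    # Fraction stores numerator and denominator exactly instead of rounding to a float.
    return Fraction(sum(l), len(l))

big = [3, 4, 1111111111111111111111111111111, 7, 6]
print(mean_exact(big))         # exact fraction, no "e" notation
print(float(mean_exact(big)))  # converted to a float only when needed
```

Passing floats would raise a TypeError here, because Fraction(numerator, denominator) only accepts rational arguments.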

Range

program

def Range(l):
    return max(l)-min(l)

The list does not have to be in ascending order.

Notes

--The function name starts with a capital letter so that it does not shadow the built-in range.
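
The Range section has no usage example, so here is a small one. It repeats the Range definition so that it runs on its own, and the sample lists are reused from the average section.

```python
def Range(l):
    # Difference between the largest and the smallest element.
    return max(l) - min(l)

print(Range([3, 4, 5, 7, 6]))   # 7 - 3 -> 4
print(Range([3, 4, 11, 7, 6]))  # 11 - 3 -> 8

# The built-in range is still usable because the two names differ in case.
print(list(range(3)))           # [0, 1, 2]
```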

Median 1

program

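def chuuouti(l):
    l2 = sorted(l)  #sorted copy; the input list itself is not modified
    mid = len(l2)//2
    if len(l2)%2:
        return l2[mid]  #odd length: the single middle element
    else:
        return (l2[mid-1]+l2[mid])/2  #even length: average of the two middle elements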

Example of use

print(chuuouti([1,4,2,3]))
#2.5

Notes

--When the number of elements is even, the average of the two middle elements is returned.
--The list does not have to be in ascending order. However, sorted takes some time, so if the list is already sorted it is better to use "Median 2".
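
As a cross-check, the standard library's statistics.median also sorts its input and averages the two middle values for even lengths, so it should agree with chuuouti. A minimal sketch (the function body is repeated so that the block runs on its own, and the sample lists are made up for illustration):

```python
from statistics import median

def chuuouti(l):
    l2 = sorted(l)
    mid = len(l2) // 2
    if len(l2) % 2:
        return l2[mid]
    return (l2[mid - 1] + l2[mid]) / 2

for data in ([1, 4, 2, 3], [7, 1, 3], [5]):
    # The two printed values should be equal for every sample list.
    print(chuuouti(data), median(data))
```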

example

4th Algorithm Practical Test A-Median

Median 2 (only in ascending and descending order)

program

def chuuouti(l):
    mid = len(l)//2
    if len(l)%2:
        return l[mid]
    else:
        return (l[mid-1]+l[mid])/2

Notes

--It can only be used on lists sorted in ascending or descending order.
--When the length is even, the return value is a float; when the length is odd, the return type is the type of the median element itself (see below).

#When the length is even
print(chuuouti([3,2,2,1]))
#2.0
print(chuuouti([4,3,2,1]))
#2.5

#When the length is odd
print(chuuouti([1,3,7]))
#3
print(chuuouti([1,3.2,7]))
#3.2
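
Because "Median 2" silently returns a wrong value for unsorted input, it may help to guard it with a cheap order check. The following is a sketch only; is_sorted and chuuouti_checked are hypothetical names introduced for illustration.

```python
def is_sorted(l):
    # True if l is entirely non-decreasing or entirely non-increasing.
    ascending = all(a <= b for a, b in zip(l, l[1:]))
    descending = all(a >= b for a, b in zip(l, l[1:]))
    return ascending or descending

def chuuouti_checked(l):
    # Hypothetical wrapper: refuse unsorted input instead of returning a wrong median.
    if not is_sorted(l):
        raise ValueError("list must be in ascending or descending order")
    mid = len(l) // 2
    if len(l) % 2:
        return l[mid]
    return (l[mid - 1] + l[mid]) / 2

print(chuuouti_checked([4, 3, 2, 1]))  # 2.5
print(is_sorted([1, 4, 2, 3]))         # False
```

The check is a single linear pass over the list, so it is still cheaper than sorting a long list.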

Standard deviation

program

from math import sqrt
def deviation_value(l):
    mean = sum(l)/len(l)  #average
    deviation_square = [(x-mean)**2 for x in l]  #squared deviations
    deviation_square_sum = sum(deviation_square)  #sum of squared deviations
    return sqrt(deviation_square_sum/len(l))  #standard deviation (divides by the number of elements)

Example of use

print(deviation_value([10,15,5,-22,-18]))
#15.086417732516887

In the usage example above, the mean is {10 + 15 + 5 + (-22) + (-18)}/5 = -2. The squared deviations are therefore 144, 289, 49, 400, and 256, and their sum is 1138. Dividing this by the number of elements (5) and taking the square root gives the standard deviation.
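
The arithmetic above can be reproduced step by step. This sketch also compares the result with statistics.pstdev, the standard library's population standard deviation, which divides by the number of elements in the same way.

```python
from math import sqrt, isclose
from statistics import pstdev

data = [10, 15, 5, -22, -18]
mean = sum(data) / len(data)
squares = [(x - mean) ** 2 for x in data]

print(mean)          # -2.0
print(squares)       # [144.0, 289.0, 49.0, 400.0, 256.0]
print(sum(squares))  # 1138.0

sd = sqrt(sum(squares) / len(data))
print(sd)

# pstdev divides by n, so it should match the hand calculation.
print(isclose(sd, pstdev(data)))  # True
```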

Cumulative relative frequency

program

def cumulative_relative_frequency(x):
    wa = sum(x)#The sum of the whole
    l = [0]
    y = [x[i]/wa for i in range(len(x))]
    #Cumulative sum from here
    for i in range(len(y)):
        l.append(l[-1]+y[i])
    del l[0]
    return l

Example of use

print(cumulative_relative_frequency([1,3,7,6,9]))
#[0.038461538461538464, 0.15384615384615385, 0.4230769230769231, 0.6538461538461539, 1.0]
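
The running sum can also be written with itertools.accumulate. This is an alternative sketch, not the method used above; cumulative_relative_frequency_acc is a hypothetical name.

```python
from itertools import accumulate

def cumulative_relative_frequency_acc(x):
    total = sum(x)
    # accumulate yields the running sums x[0], x[0]+x[1], ...
    return [c / total for c in accumulate(x)]

print(cumulative_relative_frequency_acc([1, 3, 7, 6, 9]))
```

This version accumulates the raw frequencies first and divides each running total once, so for integer input the last element is exactly total/total. The last digits of the other values may differ slightly from the version above, which adds up floats.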

Coefficient of variation

program

from math import sqrt
def coefficient_of_variation(l):
    mean = sum(l)/len(l)  #average
    deviation_square = [(x-mean)**2 for x in l]  #squared deviations
    deviation_square_sum = sum(deviation_square)  #sum of squared deviations
    return sqrt(deviation_square_sum/len(l))/mean  #standard deviation divided by the mean

Example of use

print(coefficient_of_variation([4,5,3,5]))
#Example from https://bellcurve.jp/statistics/course/19515.html

The reference gives 0.2, whereas this function outputs about 0.195. The gap is far too large to be floating-point error in the square root (sqrt); it is most likely because the reference value is rounded to one decimal place.
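
One way to look at the gap between 0.2 and 0.195 is to recompute the value with the standard library and round it. This is a sketch that assumes the reference figure is a rounded value; that is an assumption, not something quoted from the reference page. The sample (n - 1) version is shown only for comparison.

```python
from statistics import mean, pstdev, stdev

data = [4, 5, 3, 5]
cv_population = pstdev(data) / mean(data)  # divides by n, as in the function above
cv_sample = stdev(data) / mean(data)       # divides by n - 1, for comparison only

print(cv_population)            # about 0.195
print(round(cv_population, 1))  # 0.2
print(cv_sample)                # about 0.225
```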

Bivariate analysis

Covariance

program

def bivariate_analysis(x,y):
    mean_x = sum(x)/len(x)
    mean_y = sum(y)/len(y)
    deviation_x = [x[i]-mean_x for i in range(len(x))]
    deviation_y = [y[i]-mean_y for i in range(len(y))]
    product_of_deviations = [deviation_x[i]*deviation_y[i] for i in range(len(x))]
    return sum(product_of_deviations)/len(x)

Example of use

print(bivariate_analysis([4,7],[3,6]))
#2.25

This function returns the covariance of the two lists.

Advantage

--An accurate value is obtained.
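
bivariate_analysis indexes both lists with range(len(x)) in its last step, so lists of different lengths either raise an IndexError or silently drop the extra elements of y. The following sketch adds an explicit length check and pairs the elements with zip; covariance is a hypothetical name, and the formula (dividing by the number of elements) is unchanged.

```python
def covariance(x, y):
    # Hypothetical name; same population covariance (divide by n) as bivariate_analysis.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)

print(covariance([4, 7], [3, 6]))  # 2.25
```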

Correlation coefficient

program

def correlation_coefficient(x,y):
    return bivariate_analysis(x,y)/deviation_value(x)/deviation_value(y)

Full program, including standard deviation and covariance

from math import sqrt

def deviation_value(l):#standard deviation
    mean = sum(l)/len(l)#average
    deviation_square = [(l[i]-mean)**2 for i in range(len(l))]#Deviation square
    deviation_square_sum = sum(deviation_square)#Deviation sum of squares
    return sqrt(deviation_square_sum/len(l))#standard deviation
    
    
def bivariate_analysis(x,y):#Covariance
    mean_x = sum(x)/len(x)
    mean_y = sum(y)/len(y)
    deviation_x = [x[i]-mean_x for i in range(len(x))]
    deviation_y = [y[i]-mean_y for i in range(len(y))]
    product_of_deviations = [deviation_x[i]*deviation_y[i] for i in range(len(x))]
    return sum(product_of_deviations)/len(x)
    
def correlation_coefficient(x,y):
    return bivariate_analysis(x,y)/deviation_value(x)/deviation_value(y)

Notes

--This function relies on the standard deviation and covariance functions, so they must be defined together with it.
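
The correlation coefficient section has no usage example, so here is a self-contained sketch. The helper names pstd, pcov, and corr are hypothetical shorthand for the same formulas as deviation_value, bivariate_analysis, and correlation_coefficient, and the second sample pair is made up for illustration.

```python
from math import sqrt

def pstd(l):
    # Population standard deviation, same formula as deviation_value.
    m = sum(l) / len(l)
    return sqrt(sum((v - m) ** 2 for v in l) / len(l))

def pcov(x, y):
    # Population covariance, same formula as bivariate_analysis.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    return pcov(x, y) / pstd(x) / pstd(y)

print(corr([4, 7], [3, 6]))        # perfectly increasing pair: 1.0 up to rounding
print(corr([1, 2, 3], [6, 4, 2]))  # perfectly decreasing pair: -1.0 up to rounding
```

If either list is constant, its standard deviation is 0 and the division raises ZeroDivisionError.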

Bonus edition

There is a class of n people. Find the probability that there will be students with the same birthday.

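I would like to implement this based on the page "Probability of having a duo with the same birthday". According to that page, the required probability is

1 - \frac{{}_{365}P_n}{365^n}

Here {}_{365}P_n / 365^n is the probability that all n birthdays are different, so subtracting it from 1 gives the probability that at least two people share a birthday.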

program

from math import factorial
def permutations_count(n, r):  #number of permutations of r items out of n (excerpt from the link in the references)
    return factorial(n) // factorial(n - r)
def birthday(n):  #probability that at least two people in a class of n share a birthday
    return 1-permutations_count(365,n)/365**n

Using this program, let's find the smallest n for which the probability is at least 75%, together with the probability at that n.

program

people = 0
while True:
    if birthday(people) >= 0.75:
        print(people)
        print(birthday(people))
        break
    people += 1

people is the number of people in the class. If you already know that birthday(100) exceeds 75%, a for statement would also work. Incidentally, with 23 people the probability is already about 50.7%.
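
birthday(n) builds very large factorials and raises ValueError for n > 365, because factorial does not accept negative arguments. As an alternative sketch, the same probability can be accumulated one factor at a time; birthday_product is a hypothetical name introduced here.

```python
def birthday_product(n):
    # Hypothetical alternative to birthday(n): multiply the "all different" factors one by one.
    if n > 365:
        return 1.0  # more people than days, so a shared birthday is certain
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (365 - k) / 365
    return 1 - p_all_different

n = 0
while birthday_product(n) < 0.75:
    n += 1
print(n, birthday_product(n))
print(birthday_product(23))  # about 0.507
```

Like the formula above, this assumes 365 equally likely birthdays and ignores leap years.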

References

--Translation
  -goo dictionary
  -Google Translate
--How to write
  -Implemented in python after understanding the phenomenon that various probability distributions occur
--Program
  -[The Programing Guide](https://strawberryprogrami.wixsite.com/programing)
--Meaning of terms
  -Ready-to-use statistics by Tamio Kan and Yuko Hijikata, 1st edition, Publisher: SoftBank Creative
  -Meaning of covariance and simple method
  -5-3. Let's find the coefficient of variation
  -Mathematical symbols (equal sign, inequality sign, operator, set)
--Illustrated trivia statistics by Norio Konno, Natsumesha
