[Algorithm x Python] Calculation of basic statistics Part2 (mean, median, mode)

I write about algorithms and Python. This time, I show not only how to calculate each statistic with a library function, but also how to calculate it without one.

Table of contents

  0. Confirmation of arithmetic operators and comparison operators
  1. List comprehension
  2. Find the average value (2-0. Arithmetic mean, 2-1. Geometric mean, 2-2. Root mean square, 2-3. Harmonic mean)
  3. Find the median
  4. Find the mode

Finally

0. Confirmation of arithmetic operators and comparison operators

Arithmetic operators

◯ Symbols for performing calculations.

| Operator | Meaning | Example | Result |
|---|---|---|---|
| `+` | Addition | `3 + 4` | 7 |
| `-` | Subtraction | `10 - 2` | 8 |
| `*` | Multiplication | `5 * -6` | -30 |
| `/` | Floating-point division | `5 / 2` | 2.5 |
| `//` | Integer (floor) division | `5 // 2` | 2 |
| `%` | Remainder (modulo) | `7 % 2` | 1 |
| `**` | Exponentiation (power) | `2 ** 4` | 16 |
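The following sketch simply evaluates each row of the table so you can confirm the results yourself. It also shows one detail the table does not cover: with negative numbers, `//` rounds toward negative infinity rather than toward zero, and `%` takes the sign of the divisor.

```python
# Evaluate each example from the arithmetic operator table
results = {
    "3 + 4": 3 + 4,
    "10 - 2": 10 - 2,
    "5 * -6": 5 * -6,
    "5 / 2": 5 / 2,
    "5 // 2": 5 // 2,
    "7 % 2": 7 % 2,
    "2 ** 4": 2 ** 4,
}
for expression, value in results.items():
    print(expression, "=", value)

# // rounds toward negative infinity and % takes the sign of the divisor,
# so a == (a // b) * b + a % b always holds
print(-7 // 2, -7 % 2)  # -4 1
```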

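Comparison operators

◯ Symbols for comparison. Each returns a Boolean value, True or False.

| Meaning | Operator |
|---|---|
| Equal to | `==` |
| Not equal to | `!=` |
| Less than | `<` |
| Less than or equal to | `<=` |
| Greater than | `>` |
| Greater than or equal to | `>=` |
| Is an element of | `in` |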
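A short sketch that runs each comparison operator from the table on sample values (the values `x = 5` and the list `numbers` are just illustrative). It also shows that Python lets you chain comparisons.

```python
x = 5
numbers = [1, 3, 5, 7]

print(x == 5)   # True
print(x != 5)   # False
print(x < 3)    # False
print(x <= 5)   # True
print(x > 3)    # True
print(x >= 6)   # False

# in checks whether a value is an element of a sequence such as a list
print(x in numbers)  # True
print(4 in numbers)  # False

# Comparisons can be chained: 1 < x < 10 means (1 < x) and (x < 10)
print(1 < x < 10)    # True
```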

1. List comprehension

◯ List comprehensions are a concise way to build lists. This article creates many lists, so I cover them first.

What is list comprehension?

It is one way to build a list, written in the form **[expression for element in iterable]**. The **expression** computes the value stored in the list, usually from the element. An **iterable** is an object whose elements can be retrieved one at a time.

Examples of list comprehensions

◯ The range() function returns an iterable object. range(1, 10) produces the integers from 1 up to 9; the stop value 10 is not included.

ex0.py


#Take the integers from 1 to 9 (range excludes 10) one by one and store them in the numbers list
numbers = [number for number in range(1,10)]
print(numbers)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

ex1.py


#Extract integers from 1 to less than 10 one by one and store the value multiplied by 2 in the numbers list.
numbers = [number * 2 for number in range(1,10)]
print(numbers)
[2, 4, 6, 8, 10, 12, 14, 16, 18]

ex2.py


#Extract integers from 1 to less than 10 one by one and store only even values in the numbers list.
numbers = [number for number in range(1,10) if number % 2 == 0]
print(numbers)
[2, 4, 6, 8]

ex3.py


#List of strings
list_1 = ['1','2','3']
#Take the elements of list_1 one by one, convert them to integers, and store them in the numbers list
numbers = [int(i) for i in list_1]
print(numbers)
[1, 2, 3]

ex4.py


#If the extracted element is odd, store it as it is. If it is even, square it and store it.
numbers = [i if i % 2 == 1 else i**2 for i in range(10)]
print(numbers)
[0, 1, 4, 3, 16, 5, 36, 7, 64, 9]
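To see what a comprehension does, it helps to compare it with an ordinary for loop. This sketch rebuilds the list from ex2.py both ways; the loop version is only for comparison and produces the same list.

```python
# Comprehension from ex2.py: even numbers from 1 to 9
evens_comprehension = [number for number in range(1, 10) if number % 2 == 0]

# The same list built with an ordinary for loop and append()
evens_loop = []
for number in range(1, 10):
    if number % 2 == 0:
        evens_loop.append(number)

print(evens_comprehension)  # [2, 4, 6, 8]
print(evens_loop)           # [2, 4, 6, 8]
```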

2. Find the average value

2-0. How to find the arithmetic mean

◯ This is the everyday "average": the sum of the elements divided by the number of elements.
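Written as a formula, for n values x_1, ..., x_n the arithmetic mean is:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For the list [1, 2, 3, 4, 5, 6] used in the examples of this section, this gives 21 / 6 = 3.5.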


How to find the mean using the mean() function

◯ Import the standard-library statistics module to use its mean() function.

#Import module
import statistics

numbers = [1,2,3,4,5,6]
#Call the mean() function in the statistics module
mean_of_numbers = statistics.mean(numbers)
print('mean =',mean_of_numbers)
mean = 3.5

◯ **Point**: Module. A module is a file that groups related Python code. Load it with `import module`.

◯ **Point**: Dot (.). The dot means "go inside". To use a function or variable defined in a module, write module.function or module.variable, as in statistics.mean().

◯ **Point**: Official documentation: statistics


How to find the mean from the sum of the elements

◯ Calculate **sum of the elements / number of elements**.


#Define the list elements with a list comprehension
#The resulting list is [1,2,3,4,5,6]
numbers = [number for number in range(1,7)]

#Mean = sum of the elements / number of elements
average = sum(numbers)/len(numbers)

print('numbers = ',numbers)
print('average of numbers = ', average)
numbers =  [1, 2, 3, 4, 5, 6]
average of numbers =  3.5
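In the spirit of computing statistics without ready-made functions, here is a minimal sketch that avoids even sum() and len() and keeps a running total and count in a plain loop. The name arithmetic_mean is a hypothetical helper, not part of any library. Like statistics.mean(), it refuses an empty list, since dividing by a count of zero is undefined.

```python
# Hypothetical helper: arithmetic mean with a plain loop instead of sum() and len()
def arithmetic_mean(values):
    total = 0
    count = 0
    for value in values:
        total += value  # running total of the elements
        count += 1      # running count of the elements
    if count == 0:
        raise ValueError("arithmetic_mean() requires at least one value")
    return total / count

print(arithmetic_mean([1, 2, 3, 4, 5, 6]))  # 3.5
```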

2-1. How to find the geometric mean

◯ The geometric mean is used for **data expressed as rates of change**. For example, if your savings grow by 5% in one year, they are multiplied by 1.05. If they grow by 7% the following year, the final amount is the initial savings multiplied by 1.05 and then by 1.07. The geometric mean gives the **average annual rate of change** in such a case.

◯ The formula for the geometric mean (geometric_mean) is $x_G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$. In other words, multiply all the data (rates of change) together and take the n-th root, where n is the number of data points.
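As a check on the savings example, here is a minimal sketch that applies the formula with only the standard library (math.prod requires Python 3.8 or later). It also confirms the defining property: growing at the average rate every year gives the same final amount as the actual rates.

```python
import math

# Growth factors from the savings example: +5% then +7%
rates = [1.05, 1.07]

# n-th root of the product of all rates
n = len(rates)
average_rate = math.prod(rates) ** (1 / n)
print(average_rate)  # about 1.06, i.e. roughly 6% per year

# Growing at the average rate every year reproduces the same final amount
print(math.isclose(average_rate ** n, 1.05 * 1.07))  # True
```

Note that the result is slightly below the arithmetic mean of the two factors (1.06); the geometric mean never exceeds the arithmetic mean.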

◯ Reference: [Meaning and calculation method of geometric mean (geometric mean): calculating the average growth rate](https://toukeigaku-jouhou.info/2015/08/23/geometric-mean/)


How to find the geometric mean using the gmean() function

◯ Consider the sales of a company as an example.

1st year: 10 million yen

2nd year: 25 million yen (2.5 times the previous year)

3rd year: 40 million yen (1.6 times the previous year)

Let geometric_mean be the annual average rate of change in sales.

#Import the gmean() function
from scipy.stats import gmean

#Compute the square root of 2.5*1.6
#geometric_mean: the annual average rate of change in sales
geometric_mean = gmean([2.5,1.6])
print(geometric_mean)
2.0

◯ **Point**: SciPy. A library for advanced scientific computing. It is a third-party package, so install it first (for example, pip install scipy). Reference article: Basic usage of library scipy that you absolutely should know

◯ **Point**: gmean() function. Reference article / Official documentation


How to find the geometric mean using the geometric_mean() function

◯ The geometric_mean() function was added to the statistics module in Python 3.8, so it is available in Python 3.8 and later.

from statistics import geometric_mean
print(geometric_mean([2.5,1.6]))
2.0

Official documentation: statistics


How to find the geometric mean using the root() function (SymPy)

import sympy

#List of sales for years 1 to 3 (unit: 10,000 yen, so 1000 = 10 million yen)
sales = [1000,2500,4000]

#List of year-over-year rates of change (rate_of_changes)
#Each element is (this year's sales) / (previous year's sales), so this list is [2.5, 1.6]
rate_of_changes = [sales[i+1]/sales[i] for i in range(len(sales)-1)]

#Assign the product of all elements to the variable mul
#Start with the first element of the rate-of-change list
#At this point mul = 2.5
mul = rate_of_changes[0]

#Multiply in the remaining elements one by one
for rate in rate_of_changes[1:]:
    mul *= rate

#sympy.root(x, n) returns the n-th root of x
#Here x is the product of all the rates and n is the number of rates
geometric_mean = sympy.root(mul,len(rate_of_changes))
print('geometric_mean = ',geometric_mean)
geometric_mean =  2.00000000000000

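Reference article: How to use Python, SymPy (factorization, equations, calculus, etc.). SymPy is a third-party package, so install it first (for example, pip install sympy).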
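Multiplying many rates directly can overflow (or underflow) a float. A common alternative, sketched below with only the standard library, is to take the mean of the logarithms and exponentiate it; this is mathematically equivalent because the logarithm of the geometric mean equals the mean of the logarithms. The name geometric_mean_log is a hypothetical helper, and it assumes every value is positive.

```python
import math

def geometric_mean_log(values):
    """Hypothetical helper: geometric mean computed as exp(mean(log(x))).

    Taking logs turns the product into a sum, which avoids overflow or
    underflow when many values are multiplied. All values must be positive.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean)

print(geometric_mean_log([2.5, 1.6]))  # close to 2.0

# Multiplying 400 factors of 1e300 directly would overflow to inf,
# but the log-based version still works
print(geometric_mean_log([1e300] * 400))  # close to 1e300
```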

2-2. How to find the root mean square

◯ The root mean square (RMS) is calculated by **squaring the values you want to average, adding them up, dividing by the number of elements n, and then taking the square root**: $x_{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$. It is used, for example, to measure how far actual arrival times deviate from a transit timetable.

◯ Arriving 2 minutes late is just as far off schedule as arriving 2 minutes early. With the **arithmetic mean**, however, plus and minus **errors** cancel each other out. So we **square the errors to remove the minus sign** before averaging.
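A minimal sketch of the cancellation problem, using the standard library's math.sqrt instead of SymPy. The two errors (2 minutes late, 2 minutes early) are illustrative values: the arithmetic mean reports no error at all, while the RMS reports the typical size of the deviation.

```python
import math

# Illustrative timetable errors in minutes: 2 minutes late and 2 minutes early
errors = [2, -2]

# Arithmetic mean: the plus and minus errors cancel out
mean_error = sum(errors) / len(errors)
print(mean_error)  # 0.0

# Root mean square: square, average, then take the square root
rms_error = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
print(rms_error)   # 2.0
```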


How to find the root mean square using the root() function

#Import sympy to use its root() function
import sympy
#Try rounding the final value using the standard library decimal module
from decimal import Decimal, ROUND_HALF_UP

#List of errors
data_list = [-2,3,4,-5]
#Square each error and store the results in a new list, squared_list
squared_list = [i**2 for i in data_list]
#First find the mean of squared_list (sum / number of elements)
mean_square = sum(squared_list)/len(squared_list)
#Take the square root of mean_square
root_mean_square = sympy.root(mean_square,2)

print('RMS = ',root_mean_square)
#Convert the value to a string with str() so that Decimal holds exactly the digits shown
#Decimal('0.1') specifies the digit to round to (here, one decimal place)
#ROUND_HALF_UP performs conventional rounding (halves are rounded up)
print('Rounded RMS = ',Decimal(str(root_mean_square)).quantize(Decimal('0.1'), rounding=ROUND_HALF_UP))
RMS = 3.67423461417477
Rounded RMS =  3.7

◯ **Point**: Decimal.quantize. Reference article: Rounding decimals and integers in Python with round and Decimal.quantize
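Why not just use the built-in round()? The sketch below shows two reasons: round() rounds halves to the nearest even number, and most decimal fractions cannot be stored exactly as floats. Decimal with ROUND_HALF_UP, applied to the string form of the number as in the RMS example, gives the conventional result.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() rounds halves to the nearest even number
print(round(2.5), round(3.5))  # 2 4

# quantize() with ROUND_HALF_UP rounds halves away from zero
print(Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP))  # 3
print(Decimal('3.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP))  # 4

# The float 2.675 is really slightly below 2.675, so round() goes down
print(round(2.675, 2))  # 2.67
# Converting through str() keeps the digits as written
print(Decimal(str(2.675)).quantize(Decimal('0.01'), rounding=ROUND_HALF_UP))  # 2.68
```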

2-3. How to find the harmonic mean

◯ The harmonic mean is used, for example, to find the average speed (distance per hour) over trips of equal distance.

◯ As an example, find the average speed of a car that travels 200 km outbound at 80 km/h and 200 km back at 30 km/h. Speed is distance / time, so the average speed is **total distance / total time**.

**Total distance**: 200 + 200 = 400 (km)

**Total time** (distance / speed): 200/80 + 200/30 = 2.5 (hours) + 6.667 (hours) = 9.167 (hours)

**Average speed**: total distance / total time = 400 / 9.167 ≈ 43.636 (km/h)
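Why is this a harmonic mean? With the same distance d on each leg and speeds v_1 and v_2, the distance cancels out:

```latex
\bar{v} = \frac{2d}{\dfrac{d}{v_1} + \dfrac{d}{v_2}}
        = \frac{2}{\dfrac{1}{v_1} + \dfrac{1}{v_2}}
        = \frac{2}{\dfrac{1}{80} + \dfrac{1}{30}}
        = \frac{2}{\dfrac{11}{240}}
        = \frac{480}{11} \approx 43.636
\qquad\text{in general}\qquad
H = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}
```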


How to find the harmonic mean using the harmonic_mean() function

import statistics

#harmonic_mean([x1,x2,...xn])
print(statistics.harmonic_mean([80,30]))
43.63636363636363

How to find the harmonic mean from total distance and total time

distance = 200
#Total distance for the round trip (outbound + return)
total_distance = distance * 2

#Prepare a list containing speed values
speed_list = [80,30]
#Compute the time for each leg (distance / speed) and store them in a list
time_list = [distance/speed for speed in speed_list]
#Sum the times to get the total time
total_time = sum(time_list)

#Harmonic mean=Total distance/Total time
harmonic_mean = total_distance/total_time
print('harmonic_mean = ',harmonic_mean)
harmonic_mean =  43.63636363636363

◯ **Point**: Reference article: Meaning and calculation method of the harmonic mean
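The distance-based calculation only works when every leg has the same distance. A more general sketch uses the formula directly: the number of values divided by the sum of their reciprocals. harmonic_mean_manual is a hypothetical helper; unlike statistics.harmonic_mean, it simply rejects zero and negative values.

```python
import statistics

def harmonic_mean_manual(values):
    """Hypothetical helper: n divided by the sum of reciprocals (values must be positive)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    reciprocal_sum = sum(1 / v for v in values)
    return len(values) / reciprocal_sum

result = harmonic_mean_manual([80, 30])
print(result)  # about 43.636
print(abs(result - statistics.harmonic_mean([80, 30])) < 1e-9)  # True
```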

3. Find the median

◯ The median is the middle value when the data are sorted in ascending or descending order. When the number of data points is even, there are two middle values, so the median is their sum divided by two. The advantage of the median is that it is less sensitive to outliers (values far from the rest of the data).
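A small sketch of that robustness, using the odd-length list from the median() example in this section, which contains the outlier 100. Replacing the outlier with a more typical value (10, chosen for illustration) moves the mean a lot but leaves the median unchanged.

```python
import statistics

# Data containing one outlier (100)
numbers = [1, 100, 4, 7, 3]
print(statistics.mean(numbers))    # 23
print(statistics.median(numbers))  # 4

# Replace the outlier with a more typical value
typical = [1, 10, 4, 7, 3]
print(statistics.mean(typical))    # 5
print(statistics.median(typical))  # 4
```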


How to find the median using the median() function

◯ You do not need to handle odd and even numbers of elements separately.

#Import the statistics module to use its median() function
import statistics

#The number of elements in the list is odd
numbers = [1,100,4,7,3]
#Assign the median to the variable median_of_numbers
#Access using dots to use the functions in the module
median_of_numbers = statistics.median(numbers)
print('median of numbers = ',median_of_numbers)
median of numbers = 4

How to find the median by splitting the elements in half

◯ Sort the list first, then handle the odd and even cases separately. If the number of elements is odd, the index of the middle element is **number of elements // 2** (integer division). If it is even, the two middle elements are at indices **number of elements // 2 - 1** and **number of elements // 2**.

#The number of elements in the list is even
numbers = [1,100,4,7,3,8]
#First sort the list in ascending order
numbers.sort()
#Assign the number of elements in numbers to length_of_numbers
length_of_numbers = len(numbers)

#If the number of elements is odd
if length_of_numbers % 2 == 1:
    #Assign the index of the middle element to median_index
    #For example, the index of the middle value in a list of 5 elements is 5//2 = 2
    median_index = length_of_numbers//2
    print('median of numbers = ',numbers[median_index])
#If the number of elements is even
else:
    #Assign half the number of elements to median_index
    median_index = length_of_numbers//2
    #With an even number of elements, the median is the sum of the two middle values divided by two
    #For example, with 6 elements the middle values are at indices 6//2-1 = 2 and 6//2 = 3
    print('median of numbers = ',(numbers[median_index-1] + numbers[median_index])/2)

#Sorted list: [1, 3, 4, 7, 8, 100], so (4 + 7)/2 = 5.5
median of numbers = 5.5
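The same index logic can be wrapped in a reusable function. This is a sketch with a hypothetical name, median_manual. One design choice differs from the code above: it uses sorted(), which returns a new list, instead of list.sort(), so the caller's list is not reordered.

```python
import statistics

def median_manual(values):
    """Hypothetical helper: median of a non-empty list using the odd/even index logic."""
    if not values:
        raise ValueError("median_manual() requires at least one value")
    ordered = sorted(values)  # new sorted list; the input is left unchanged
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]  # odd: the single middle element
    return (ordered[middle - 1] + ordered[middle]) / 2  # even: mean of the two middle elements

data = [1, 100, 4, 7, 3, 8]
print(median_manual(data))  # 5.5
print(data)                 # [1, 100, 4, 7, 3, 8] (unchanged)
```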

4. Find the mode

◯ The mode is the element that appears most often in a list.


How to find the mode using the Counter class

#Import the Counter class from the collections module
from collections import Counter
medals = ['gold','silver','gold','silver','silver','bronze']
#Create an instance; it counts how many times each element appears
medal_counter = Counter(medals)

print(medal_counter)
Counter({'silver': 3, 'gold': 2, 'bronze': 1})

◯ **Point**: How to use a class. A class bundles related data (attributes) and functions (methods). A class is a blueprint, so you usually work with a concrete object created from it. Here, Counter(medals) creates that object, which is called an instance. To use the instance in the program, assign it to a variable, as in medal_counter = Counter(medals).

◯ **Point**: most_common(). A method of the Counter class that returns (element, count) pairs for all elements, from most to least common. If you pass an integer n, it returns only the top n pairs. Example:

print(medal_counter.most_common(1))
[('silver', 3)]

How to find the mode when there are multiple modes

◯ Next, handle the case where there are multiple modes by defining your own **function that returns a list of modes**.

#Import the Counter class
from collections import Counter

#Define mode_func, a custom function that returns a list of modes
def mode_func(letters):
    #First, count how many times each letter appears
    #letter_counter = Counter({'t': 2, 'o': 2, 'e': 1, 'x': 1, '_': 1, 'b': 1, 'k': 1})
    letter_counter = Counter(letters)
    #Next, get (letter, count) pairs in descending order of count and assign them to letter_and_count
    #[('t', 2), ('o', 2), ('e', 1), ('x', 1), ('_', 1), ('b', 1), ('k', 1)]
    letter_and_count = letter_counter.most_common()
    #Because the pairs are in descending order, the count of the first pair (here 2) is the highest count
    #Assign it to max_count
    max_count = letter_and_count[0][1]
    #Create a list for the modes and add every letter whose count equals max_count
    mode_list = []
    for letter in letter_and_count:
        #If this letter appears as many times as the mode
        if letter[1] == max_count:
            #Add the letter to the list of modes
            mode_list.append(letter[0])
    #Finally, return the list of modes
    return mode_list

#Run the code below only when this file is executed directly, so mode_func can be imported and reused
if __name__ == '__main__':
    #Assign a list of characters to the variable letters
    letters = list('text_book')
    #Assign the list returned by mode_func() to mode_list
    mode_list = mode_func(letters)
    #mode_list may contain several modes, so print each one
    for mode in mode_list:
        print('Mode = ',mode)

Mode = t
Mode = o

◯ **Point**: if __name__ == '__main__':. Reference article: Answer to what is Python's `if __name__ == '__main__'`
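For comparison, the statistics module (Python 3.8 and later) already covers the multiple-mode case. This sketch applies it to the same 'text_book' letters: multimode() returns every mode in the order each first appears, and mode() returns a single mode, the first one encountered.

```python
import statistics

letters = list('text_book')

# multimode() returns every mode, in order of first appearance (Python 3.8+)
print(statistics.multimode(letters))  # ['t', 'o']

# mode() returns one mode; with ties it returns the first one encountered (Python 3.8+)
print(statistics.mode(letters))       # t
```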

Finally

Thank you for reading. Next time, I will write about calculating basic statistics Part 3 (range, variance, standard deviation, coefficient of variation). I would be grateful if you could point out any mistakes or possible improvements.
