[PYTHON] [Statistics] An ordinary engineer started studying statistics-part2

Introduction

This is the story of an ordinary backend engineer, who can do a little mathematics, studying statistics.

Last time


What to do this time

Calculate various statistics using Jupyter Notebook.

・ Total value
・ Mean
・ Sample variance
・ Unbiased variance
・ Standard deviation
・ Minimum value
・ Median
・ Maximum value


Library used this time

import numpy as np
import scipy as sp

Sample data

#Try descriptive statistics on the theme of fish data
fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

Calculate the total value

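Pass the array fish_data to NumPy's sum function and store the result in sum_value. (The top-level SciPy aliases such as sp.sum are deprecated and may be missing from recent SciPy releases, so NumPy is called directly here.)

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])
# Compute the total value (sum_value)
sum_value = np.sum(fish_data)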
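
To make clear what the total value is, here is a minimal sketch that adds the values one at a time and compares the result with np.sum. The names manual_total and numpy_total are made up for this illustration.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# Add the values one by one, as you would by hand
manual_total = 0
for value in fish_data:
    manual_total += value

# NumPy's vectorized sum should give the same total
numpy_total = np.sum(fish_data)

print(manual_total, numpy_total)  # both are 40
```

For arrays of real size, np.sum is the better choice because the loop runs inside NumPy rather than in Python.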


Anecdote about total value (column)

Because physical education could not be held on a rainy day, a teacher set a boy the task of adding up all the numbers from 1 to 100. The boy needed only a few tens of seconds to work out the result.

As you all know, this boy was Gauss, who later became a great figure in the fields of mathematics and physics.


Calculate the number of samples

It can be calculated by measuring the length of the array.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Count the number of samples N
N = len(fish_data)

Calculate the mean

The mean is the total value divided by the number of samples. NumPy's mean function computes it directly.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Mean = sum_value / N
avg = np.mean(fish_data)
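
The definition "total value / number of samples" can be checked directly. The following is a small sketch, with manual_avg and numpy_avg as illustrative names, that computes the mean both ways.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

sum_value = np.sum(fish_data)  # total value
N = len(fish_data)             # number of samples

# Mean by definition: total value divided by the number of samples
manual_avg = sum_value / N

# Mean computed by NumPy
numpy_avg = np.mean(fish_data)

print(manual_avg, numpy_avg)  # both are 4.0
```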

Calculate sample variance

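An index that indicates how far the data are spread from the mean. It is the average of the squared deviations from the mean.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Count the number of samples N
N = len(fish_data)

# Mean = sum_value / N
avg = np.mean(fish_data)

# Sample variance: sum of squared deviations from the mean, divided by N
# (sigma here holds the variance, i.e. sigma squared)
sigma = np.sum((fish_data - avg)**2) / N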
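
It may help to see the intermediate values of this calculation. The sketch below breaks the one-line formula into steps; the names deviations and squared_deviations are introduced only for illustration.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
N = len(fish_data)
avg = np.mean(fish_data)

# Step 1: how far each value is from the mean
deviations = fish_data - avg

# Step 2: square the deviations so that positive and negative ones do not cancel
squared_deviations = deviations ** 2

# Step 3: average the squared deviations (divide by N)
sample_variance = np.sum(squared_deviations) / N

print(deviations)
print(squared_deviations)
print(sample_variance)
```

For this data the squared deviations add up to 12, so the sample variance is 12 / 10 = 1.2.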

Calculate the sample variance (2)

You can easily calculate the sample variance with NumPy's var function.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Calculate the sample variance sigma
# (var computes it in one call; ddof = 0 means "divide by N")
sigma = np.var(fish_data , ddof = 0)
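
As a side note, ddof = 0 is also the default of np.var, so leaving the argument out gives the same sample variance. A minimal sketch to confirm this (the variable names are illustrative):

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# Explicit ddof = 0: divide by N
explicit_var = np.var(fish_data, ddof=0)

# No ddof argument: NumPy also divides by N by default
default_var = np.var(fish_data)

# The array method behaves the same way
method_var = fish_data.var()

print(explicit_var, default_var, method_var)
```

Writing ddof explicitly is still worthwhile, because it makes clear to the reader which variance is meant.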

Calculate unbiased variance

The sample variance tends to underestimate the variance of the population. The unbiased variance corrects this bias by dividing by N - 1 instead of N.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Mean = sum_value / N
avg = np.mean(fish_data)

# Count the number of samples N
N = len(fish_data)

# Unbiased variance: sum of squared deviations divided by N - 1
unb_dist = np.sum((fish_data - avg)**2) / (N-1)
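
Because the two variances differ only in the divisor, one can be converted into the other by the factor N / (N - 1). The following sketch checks that relationship on the fish data; sample_var and unbiased_var are names chosen for this example.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
N = len(fish_data)
avg = np.mean(fish_data)

# Same numerator, different divisors
sum_of_squares = np.sum((fish_data - avg) ** 2)
sample_var = sum_of_squares / N          # divide by N
unbiased_var = sum_of_squares / (N - 1)  # divide by N - 1

# The unbiased variance is the sample variance scaled by N / (N - 1)
converted = sample_var * N / (N - 1)

print(sample_var, unbiased_var, converted)
```

The unbiased variance is always the larger of the two, and the gap shrinks as N grows.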


Calculate unbiased variance (2)

You can easily calculate the unbiased variance with NumPy's var function by passing ddof = 1.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# ddof = 1 means "divide by N - 1"
unb_dist = np.var(fish_data , ddof = 1)
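
If you want a cross-check that does not depend on the ddof argument, Python's standard statistics module offers pvariance (divide by N) and variance (divide by N - 1). This is only a sketch for comparison, not part of the original walkthrough.

```python
import statistics

import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])
values = fish_data.tolist()  # plain Python ints for the statistics module

# Divide by N: corresponds to np.var(..., ddof=0)
std_lib_sample_var = statistics.pvariance(values)

# Divide by N - 1: corresponds to np.var(..., ddof=1)
std_lib_unbiased_var = statistics.variance(values)

print(std_lib_sample_var, np.var(fish_data, ddof=0))
print(std_lib_unbiased_var, np.var(fish_data, ddof=1))
```

Note the naming: what this article calls the sample variance is pvariance in the statistics module, and the unbiased variance is variance.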

Calculate standard deviation

"What is Mr. XX's deviation value?" "Wow, that deviation value is low!!?" The standard deviation is the quantity on which that "deviation value" (standardized score) is based.

It is calculated by taking the square root of the unbiased variance.


import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Mean = sum_value / N
avg = np.mean(fish_data)

# Count the number of samples N
N = len(fish_data)

# Unbiased variance: sum of squared deviations divided by N - 1
unb_dist = np.sum((fish_data - avg)**2) / (N-1)

# Standard deviation: square root of the unbiased variance
std_dev = np.sqrt(unb_dist)
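
The standard deviation depends on which variance it is taken from. The sketch below, with illustrative variable names, takes the square root of both variances and compares each with np.std for the matching ddof.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# Square root of the sample variance (divide by N)
std_from_sample_var = np.sqrt(np.var(fish_data, ddof=0))

# Square root of the unbiased variance (divide by N - 1)
std_from_unbiased_var = np.sqrt(np.var(fish_data, ddof=1))

# np.std with the same ddof should agree with each value above
print(std_from_sample_var, np.std(fish_data, ddof=0))
print(std_from_unbiased_var, np.std(fish_data, ddof=1))
```

Unlike the variance, the standard deviation is in the same unit as the data, which makes it easier to interpret.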

Calculate the standard deviation (2)

You can easily calculate the standard deviation with NumPy's std function.

import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])

# Standard deviation based on the unbiased variance (ddof = 1)
np.std(fish_data,ddof = 1)



Minimum value

The minimum is the smallest value in the data. NumPy's min function returns it.


import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])
np.min(fish_data)



Median

The median is the value exactly in the middle when the data are sorted in order of size. When the number of samples is even, it is the average of the two middle values. NumPy's median function returns it.


import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])
np.median(fish_data)
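
To see what "the middle of the sample" means when there are ten values, here is a rough sketch that finds the median by sorting. The name manual_median is hypothetical.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# Sort the data in ascending order
sorted_data = np.sort(fish_data)
N = len(sorted_data)

if N % 2 == 1:
    # Odd number of samples: take the single middle value
    manual_median = sorted_data[N // 2]
else:
    # Even number of samples: average the two middle values
    manual_median = (sorted_data[N // 2 - 1] + sorted_data[N // 2]) / 2

print(manual_median, np.median(fish_data))  # both are 4.0
```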



Maximum value

The maximum is the largest value in the data. NumPy's max function returns it.


import numpy as np

fish_data = np.array([2,3,3,4,4,4,4,5,5,6])
np.max(fish_data)
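
The minimum, median, and maximum are the 0th, 50th, and 100th percentiles of the data, so they can also be obtained in a single call. The following is a sketch using np.percentile; the unpacked names are illustrative.

```python
import numpy as np

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# 0th, 50th and 100th percentiles: minimum, median, maximum
minimum, median, maximum = np.percentile(fish_data, [0, 50, 100])

print(minimum, median, maximum)

# The range of the data is the distance between the two extremes
data_range = maximum - minimum
print(data_range)
```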


Summary

If you can handle numpy and scipy, you can calculate most of the statistics that are often used in descriptive statistics. Even so, make sure you understand how each statistic is obtained.
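
As one possible shortcut, scipy.stats.describe returns several of these statistics at once. The sketch below assumes SciPy is installed; note that describe uses ddof = 1 by default, so the variance it reports is the unbiased variance.

```python
import numpy as np
from scipy import stats

fish_data = np.array([2, 3, 3, 4, 4, 4, 4, 5, 5, 6])

# Summary of the data; variance is computed with ddof=1 by default
summary = stats.describe(fish_data)

print(summary.nobs)      # number of samples
print(summary.minmax)    # (minimum, maximum)
print(summary.mean)      # mean
print(summary.variance)  # unbiased variance
```

The total value and the median are not part of this result, so np.sum and np.median are still needed for those.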


end
