1. Statistics learned with Python 1-2. Calculation of various statistics (Numpy)

First, let's look at what Pandas and Numpy are each good at.

- **Numpy handles multidimensional data**
  - Numpy targets numeric data in multidimensional arrays and has few functions for other data types.
  - It computes quite fast, and code converted with Cython (translated to C / C++ and compiled) can approach the speed of C.
- **Pandas handles real-world data beyond multidimensional arrays**
  - For real data that is not a plain multidimensional array, it is well suited to input / output and processing of data stored in CSV, SQL, and Excel.
  - It has functions for processing all kinds of data: not only numerical data but also time series data and character strings.

Pandas and Numpy are therefore often combined step by step, for example in a series of flows like the following.

001_002_002.PNG
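As a rough illustration of such a flow, the sketch below loads tabular data with Pandas, hands the numeric column to Numpy, and computes a statistic. It is not a reproduction of the figure: the CSV text and the column names `name` and `score` are hypothetical and made up only for this example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "name,score\nA,12\nB,3\nC,5\n"

# Pandas: input and handling of mixed-type tabular data
df = pd.read_csv(io.StringIO(csv_text))

# Numpy: numeric computation on the extracted column
scores = df["score"].to_numpy()
mean_score = np.mean(scores)
print(mean_score)
```

In practice `pd.read_csv` would usually be given a file path instead of an in-memory string.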

**⑴ Prepare data**

import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]

This creates a standard Python list: the values are written directly inside `variable_name = [...]`, separated by commas. Below, we calculate various statistics from it using Numpy.
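Numpy functions such as `np.mean` accept a plain list and convert it to an array internally, so the list can be passed as it is. If you prefer to make that conversion explicit, a minimal sketch might look like this (the name `arr` is used only for illustration):

```python
import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]

# Explicit conversion from a Python list to a Numpy array
arr = np.array(data)

print(type(data))  # the built-in list type
print(type(arr))   # numpy.ndarray
print(arr.shape)   # one dimension with 10 elements
```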

**⑵ Calculate the average value**

np.mean(data)

001_002_001.PNG

Numpy's mean function calculated an average of 6.5.
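As a quick cross-check, the same value can be obtained by hand: the mean is the total of the values divided by the number of values. The variable name `manual_mean` is only for illustration.

```python
import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]

# Mean = total of the values divided by the number of values
manual_mean = sum(data) / len(data)

print(manual_mean)                              # 6.5
print(np.isclose(manual_mean, np.mean(data)))   # True
```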

**⑶ Calculate the median**

np.median(data)

001_002_003.PNG

Numpy's median function calculated a median of 6.0. The median is the value located exactly in the center when the data is sorted in order of size, and it is equal to the second quartile. When the number of data points is even, it is the average of the two values closest to the center.
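The definition above can be sketched by hand as follows. This is an illustrative implementation, not how Numpy computes it internally; the names `ordered`, `mid`, and `manual_median` are hypothetical.

```python
import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]

ordered = sorted(data)  # [2, 3, 4, 5, 6, 6, 7, 9, 11, 12]
n = len(ordered)
mid = n // 2

if n % 2 == 0:
    # Even count: average the two values closest to the center
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # Odd count: the single central value
    manual_median = float(ordered[mid])

print(manual_median)            # 6.0
print(np.percentile(data, 50))  # the second quartile, also 6.0
```

Here there are 10 values, so the two central values (both 6) are averaged.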

**⑷ Calculate the total value**

np.sum(data)

001_002_004.PNG

Numpy's sum function calculated a total value of 65.

**⑸ Calculate standard deviation**

np.std(data)

001_002_005.PNG

Numpy's std function gave the result 3.138470965295043. However, this is the value of the **population standard deviation**, which is what `np.std` returns by default. Note that there are two types of standard deviation, the population standard deviation and the **unbiased standard deviation**.
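To see why this is the population value, the calculation can be sketched by hand with the squared deviations divided by the number of data points. The names `pop_var` and `pop_std` are only for illustration.

```python
import math

import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]

n = len(data)
mean = sum(data) / n

# Population variance: average of the squared deviations (divide by n)
pop_var = sum((x - mean) ** 2 for x in data) / n
pop_std = math.sqrt(pop_var)

print(pop_std)
print(math.isclose(pop_std, np.std(data)))  # np.std divides by n by default
```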

**Population and Samples**

The entire subject of a survey or study is called the **population**, and the part extracted from the population is called a **sample**. A survey that examines the entire population is called a **complete survey** or **exhaustive survey**. Probably about the only typical examples are the "Census", which requires all people living in Japan to answer, and the "Census of Business Establishments / Companies", which can be called the national census of business establishments.

In other words, almost all data handled in the real world can be said to be samples from sample surveys. However, what we want to know is not the characteristics and tendencies of the sample itself; we always use the sample to estimate the characteristics and tendencies of the population.

Now, we need two other statistics to calculate the standard deviation. First calculate the mean, use it to calculate the variance, and then take the square root of the variance to get the standard deviation.

The mean, variance, and standard deviation calculated from the sample are prefixed with "sample" or "unbiased" and are called the **sample mean $\bar{X}$**, **unbiased variance $s^2$**, and **unbiased standard deviation $s$**. On the other hand, the mean, variance, and standard deviation of the population, which are estimated from the sample, are called the **population mean $\mu$**, **population variance $\sigma^2$**, and **population standard deviation $\sigma$** to distinguish them.
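Written as formulas, for a sample $x_1, \dots, x_n$ of size $n$, these quantities are commonly defined as follows. The symbols $x_i$ and $n$ are introduced here for illustration and do not appear in the original text.

$$
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{X}\right)^2, \qquad
s = \sqrt{s^2}
$$

Dividing by $n - 1$ rather than $n$ is what makes $s^2$ an unbiased estimator of the population variance $\sigma^2$. If the data is instead treated as the whole population, the same sum of squared deviations is divided by $n$.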

So, if you want to use Numpy to find the unbiased standard deviation, do the following:

np.std(data, ddof=1) 

001_002_006.PNG
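The `ddof` argument stands for "delta degrees of freedom": Numpy divides the sum of squared deviations by `N - ddof`, so `ddof=1` gives the division by n - 1. A small sketch comparing the two results (the names `sigma` and `s` are chosen here to match the symbols used above):

```python
import math

import numpy as np

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]
n = len(data)

sigma = np.std(data)      # ddof=0 (default): divide by n
s = np.std(data, ddof=1)  # ddof=1: divide by n - 1

# The two values differ only by the factor sqrt(n / (n - 1))
print(sigma, s)
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))  # True
```

The unbiased value is always slightly larger, and the gap shrinks as the number of data points grows.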

Let's compare it with the standard deviation calculated using Pandas earlier.

import pandas as pd

df = pd.DataFrame(data)  # Convert data to a Pandas DataFrame
df.describe().loc['std']

001_002_007.PNG

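The standard deviation calculated by Pandas is the unbiased standard deviation, so it matches the result of `np.std(data, ddof=1)` above rather than the default `np.std(data)`.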
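The difference in defaults can be checked directly. In this sketch the data is wrapped in a `pd.Series` (the name `ser` is only for illustration); Pandas' `std` also takes a `ddof` argument, which defaults to 1.

```python
import numpy as np
import pandas as pd

data = [12, 3, 5, 2, 6, 7, 9, 6, 4, 11]
ser = pd.Series(data)

# Pandas defaults to ddof=1 (unbiased); Numpy defaults to ddof=0 (population)
print(np.isclose(ser.std(), np.std(data, ddof=1)))  # True
print(np.isclose(ser.std(ddof=0), np.std(data)))    # True
```

When mixing the two libraries, it is safer to pass `ddof` explicitly so that the intended definition is clear.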


Next, let's calculate the basic statistics using `statistics`, a module in the Python standard library.
