1. Statistics learned with Python 1-1. Basic statistics (Pandas)

Statistics uses many terms, ranging from familiar ones such as the mean and the deviation to ones you may never have come across. First of all, I would like to start by making sure the basic terms are understood correctly. (As a general rule, the code is written and its results are checked on Google Colaboratory.)

** ⑴ Import the libraries used for numerical calculation **

import numpy as np
import pandas as pd

** ⑵ Read the dataset **

df = pd.read_csv("https://raw.githubusercontent.com/karaage0703/machine-learning-study/master/data/karaage_data.csv")

The read_csv function of pandas reads the CSV file specified as ("filename"), which here is a URL, and the result is stored as a DataFrame in the variable df.
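Since read_csv also accepts a file-like object in place of a path or URL, the same call can be tried without network access. The following is only a minimal sketch: the CSV text is made-up stand-in data, not the contents of karaage_data.csv, and the names csv_text and df_demo are chosen purely for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file: a header row plus six data rows.
# These values are invented for illustration and are NOT read from karaage_data.csv.
csv_text = """x,y
1,2
2,2
5,3
10,3
28,4
40,6
"""

# read_csv takes a URL, a local path, or a file-like object such as StringIO.
df_demo = pd.read_csv(io.StringIO(csv_text))

print(type(df_demo))           # the result is a pandas DataFrame
print(df_demo.shape)           # (number of rows, number of columns)
print(list(df_demo.columns))   # column names taken from the header row
```

The first line of the text is treated as the header, so the column names come from it and the remaining lines become the rows.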

** ⑶ Check the contents of the data **

df.head()

The head method displays only the first 5 rows of the data stored in the variable df. 001_001_001.PNG

You can see that the data consists of two variables, x and y.
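As a small sketch of how head behaves, the example below uses a stand-in DataFrame with made-up values (the real file is not read here; df_demo is a hypothetical name). It shows that the default is five rows, that the row count can be passed explicitly, and how to check the overall size and the column names.

```python
import pandas as pd

# Made-up stand-in data with two variables, x and y (not the actual file contents).
df_demo = pd.DataFrame({"x": [1, 2, 5, 10, 28, 40], "y": [2, 2, 3, 3, 4, 6]})

first_five = df_demo.head()    # no argument: the first 5 rows
first_two = df_demo.head(2)    # explicit row count

print(first_five)
print(len(first_five), len(first_two))   # 5 2
print(df_demo.shape)                     # (6, 2): six rows, two columns
print(df_demo.columns.tolist())          # ['x', 'y']
```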

** ⑷ Calculate basic statistics **

df.describe()

The describe method of pandas returns a table of basic statistics for each numeric column. 001_001_002.PNG
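To make the shape of that result concrete, here is a minimal sketch on stand-in data (made-up values, not read from the actual file; df_demo, summary and custom are hypothetical names). It shows that describe returns an ordinary DataFrame whose row labels are the statistic names, and that the reported percentiles can be changed with the percentiles argument.

```python
import pandas as pd

# Made-up stand-in data (not the actual file contents).
df_demo = pd.DataFrame({"x": [1, 2, 5, 10, 28, 40], "y": [2, 2, 3, 3, 4, 6]})

summary = df_demo.describe()

# The result is a DataFrame: rows are statistics, columns are the variables.
print(summary.index.tolist())    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.columns.tolist())  # ['x', 'y']
print(summary.round(2))

# Other percentiles can be requested; the median (50%) is still included.
custom = df_demo.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
```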

Values that summarize data in this way are called ** statistics **. By looking at the statistics, you can grasp the characteristics of the sample. Let's go through the eight statistics reported as basic statistics and check the meaning of each term.

| Statistic | Variate x | Variate y | Meaning of the term |
|---|---|---|---|
| count (sample size) | 6 | 6 | n = 6; that is, there are 6 rows of data in total |
| mean (mean value) | 14.33 | 3.33 | Used as a so-called representative value (a value that represents the sample) |
| std (standard deviation) | 16.01 | 1.51 | Short for "standard deviation"; one of the statistics that show how widely the data are spread |
| min (minimum value) | 1.00 | 2.00 | The smallest value of the variate |
| 25% (first quartile) | 2.75 | 2.25 | With the data sorted in ascending order, the value located one quarter of the way up from the smallest |
| 50% (second quartile) | 7.50 | 3.00 | With the data sorted in ascending order, the value located two quarters of the way up from the smallest, i.e. the median |
| 75% (third quartile) | 23.50 | 3.75 | With the data sorted in ascending order, the value located three quarters of the way up from the smallest |
| max (maximum value) | 40.00 | 6.00 | The largest value of the variate |
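To see how these numbers come about, the sketch below recomputes the mean, the standard deviation and the quartiles by hand and compares them with the pandas results. Two points are assumptions of this sketch rather than statements from the text: the x values are made up so that the results agree with the x column of the table (they are not read from karaage_data.csv), and quantile_linear is a hypothetical helper written only for illustration. What can be said with certainty is that the std reported by pandas is the sample standard deviation (dividing by n - 1) and that pandas quantiles use linear interpolation by default.

```python
import numpy as np
import pandas as pd

# Made-up values chosen to agree with the x column of the table (not the real file).
x = pd.Series([1, 2, 5, 10, 28, 40], name="x")

n = x.count()                       # count: number of observations
mean = x.sum() / n                  # mean: sum divided by count

# std: sample standard deviation, i.e. squared deviations divided by n - 1.
std_sample = np.sqrt(((x - mean) ** 2).sum() / (n - 1))


def quantile_linear(sorted_values, q):
    """Hypothetical helper: quantile by linear interpolation between neighbours."""
    pos = q * (len(sorted_values) - 1)          # fractional position in the sorted data
    lower = int(np.floor(pos))
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


values = sorted(x.tolist())
q1 = quantile_linear(values, 0.25)  # position 1.25 -> 2 + 0.25 * (5 - 2)
q2 = quantile_linear(values, 0.50)  # position 2.50 -> 5 + 0.50 * (10 - 5)
q3 = quantile_linear(values, 0.75)  # position 3.75 -> 10 + 0.75 * (28 - 10)

print(round(mean, 2), round(std_sample, 2))   # 14.33 16.01
print(q1, q2, q3)                             # 2.75 7.5 23.5

# The hand calculations agree with the pandas methods.
print(np.isclose(std_sample, x.std()), np.isclose(q1, x.quantile(0.25)))
```

Because six values have no single middle element, the second quartile falls halfway between the third and fourth sorted values, which is why it is 7.5 rather than one of the data points.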

** ⑸ Calculate basic statistics individually **

First, let's extract the mean from the output of describe.

df.describe().loc['mean']

001_001_003.PNG

Next, get the standard deviation and the first quartile in the same way, by passing the label of the statistic to loc['xxx'].

df.describe().loc['std']

001_001_004.PNG

df.describe().loc['25%']

001_001_005.PNG
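The selections above can be sketched on stand-in data as follows. The values are made up (not read from the actual file) and the names df_demo, summary, subset and single are hypothetical. The sketch also shows two variations that are not used in the steps above: passing a list of labels to loc to pull several statistics at once, and calling the corresponding pandas methods directly, which gives the same numbers without going through describe.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data (not the actual file contents).
df_demo = pd.DataFrame({"x": [1, 2, 5, 10, 28, 40], "y": [2, 2, 3, 3, 4, 6]})
summary = df_demo.describe()

mean_row = summary.loc["mean"]                 # one row: a Series indexed by column name
subset = summary.loc[["mean", "std", "25%"]]   # several rows: a DataFrame
single = summary.loc["mean", "x"]              # one cell: row label, then column label

print(mean_row.round(2))
print(subset.round(2))
print(round(single, 2))                        # 14.33

# The same statistics are available as methods on the DataFrame itself.
print(np.allclose(summary.loc["mean"], df_demo.mean()))
print(np.allclose(summary.loc["std"], df_demo.std()))
print(np.allclose(summary.loc["25%"], df_demo.quantile(0.25)))
```

Going through describe is convenient when several statistics are needed together; the direct methods avoid computing the whole summary when only one value is wanted.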


So far, we have used pandas to look at the basic statistics. Next, let's calculate various statistics with NumPy and consider how each statistic is computed and what its characteristics are.
