[PYTHON] Numerical summary of data

This post covers numerical summaries, the most basic way of summarizing data in data analysis.

Summary of one-dimensional data

import numpy as np

x = np.array([1, 2, 3, 4.5, 5, 6.5, 7, 8, 9, 10])

average = np.mean(x)  # Mean: np.mean
(Out  5.6)

med = np.median(x)  # Median: np.median
(Out  5.75)

var_p = np.var(x)  # Variance: np.var (divides by n by default, ddof=0)
(Out  8.19)

std = np.std(x)  # Standard deviation: np.std (square root of the variance above)
(Out  2.86)
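
The variance and standard deviation above divide by n. As a minimal sketch, reusing the same array x, the following compares that default with `ddof=1`, which divides by n - 1 instead. The variable names `var_ddof0`, `var_ddof1` and `std_ddof1` are chosen only for this illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4.5, 5, 6.5, 7, 8, 9, 10])

# ddof=0 (the default) divides the sum of squared deviations by n.
var_ddof0 = np.var(x)

# ddof=1 divides by n - 1 instead (the unbiased variance estimator).
var_ddof1 = np.var(x, ddof=1)
std_ddof1 = np.std(x, ddof=1)

print(round(var_ddof0, 2))  # 8.19
print(round(var_ddof1, 2))  # 9.1
print(round(std_ddof1, 2))  # 3.02
```

Which divisor is appropriate depends on whether x is treated as the whole population or as a sample drawn from one.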

For the meaning of each term, see https://note.com/karaage_love/n/n6f617d38c528

Summary of 2D data

import numpy as np
import matplotlib.pyplot as plt

array = np.loadtxt(fname='example.csv', delimiter=',', encoding="utf-8_sig")
# example.csv contains two columns of data.

array_x = array[:, 0]
array_y = array[:, 1]  # Slice out column 0 as x and column 1 as y

plt.scatter(array_x, array_y, s=10, c='blue', alpha=0.5)

# Create a scatter plot: s is the marker size, c is the marker color, and alpha is the transparency (a number from 0 to 1).
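
Since example.csv itself is not shown in this post, the loading and slicing steps can be tried with an in-memory stand-in. The sketch below assumes a made-up two-column CSV string (`csv_text` and its values are hypothetical, not the contents of example.csv) and passes it to np.loadtxt through io.StringIO instead of a file name.

```python
import io

import numpy as np

# Hypothetical stand-in for example.csv: two comma-separated columns (x, y).
csv_text = "1,2\n2,4\n3,5\n4,4\n5,7\n"

# np.loadtxt accepts a file-like object as well as a file name.
array = np.loadtxt(io.StringIO(csv_text), delimiter=",")

array_x = array[:, 0]  # every row, first column
array_y = array[:, 1]  # every row, second column

print(array.shape)
print(array_x)
print(array_y)
```

The resulting array has one row per CSV line and one column per field, so `array[:, 0]` and `array[:, 1]` give the x and y values that are passed to plt.scatter.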

np.cov(array_x, array_y, bias=True)  # bias=True divides by n
(Out   [[6.72727273 3.54545455]
        [3.54545455 6.        ]])
# The covariance result is a 2 × 2 matrix. The diagonal components are the variances of x and y, respectively. The off-diagonal components are the covariance of x and y.
np.corrcoef(array_x, array_y)
(Out   [[1.         0.55805471]
        [0.55805471 1.        ]])
# Correlation coefficient: here too, the correlation coefficient of x and y is the off-diagonal component.
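
To make the link between the two matrices concrete, here is a minimal sketch that computes the covariance and the correlation coefficient by hand and compares them with np.cov and np.corrcoef. The arrays x and y are made-up values for illustration, not the contents of example.csv.

```python
import numpy as np

# Hypothetical data, not the contents of example.csv.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 7.0])

# Covariance dividing by n: the mean of the products of the deviations.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation coefficient: covariance divided by both standard deviations.
r = cov_xy / (np.std(x) * np.std(y))

cov_matrix = np.cov(x, y, bias=True)
corr_matrix = np.corrcoef(x, y)

# Each comparison is expected to print True.
print(np.isclose(cov_xy, cov_matrix[0, 1]))     # off-diagonal = covariance
print(np.isclose(np.var(x), cov_matrix[0, 0]))  # diagonal = variance of x
print(np.isclose(r, corr_matrix[0, 1]))         # off-diagonal = correlation
```

The correlation coefficient is the covariance rescaled by the two standard deviations, which is why it always falls between -1 and 1 and why the diagonal of the np.corrcoef result is 1.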

For a more detailed summary of 2D data, see https://note.com/karaage_love/n/n992a7fdf9b1f
