Data analysis with Python

Notes I wrote as an output of my own study.

Contents

・ Overview of the basic libraries used in data analysis
・ Elementary code

Library

The following three libraries are commonly used in data analysis. The names in parentheses are the customary import aliases.
・ pandas (pd)
・ NumPy (np)
・ pyplot (plt), a module of matplotlib

pandas

pandas is a library for reading data, checking basic information about it, rearranging it, finding and removing missing values, and aggregating it.
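As a rough illustration of those operations, the sketch below reads a small CSV held in a string instead of a file, checks its size and statistics, and aggregates it with groupby. The data and the column names city and temp are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file; pd.read_csv accepts
# any file-like object, so io.StringIO lets the example run without a file.
csv_text = """city,temp
Tokyo,21.5
Osaka,23.0
Tokyo,19.5
Osaka,25.0
"""

df = pd.read_csv(io.StringIO(csv_text))  # read the data
print(df.shape)       # (rows, columns) -> (4, 2)
print(df.head(2))     # first two rows
print(df.describe())  # summary statistics of the numeric column "temp"

# Aggregate: mean temperature per city
city_means = df.groupby("city")["temp"].mean()
print(city_means)
```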

numpy

NumPy is a library for numerical computation in Python. Its array operations run in compiled code, so they are typically much faster than the same calculation written with plain Python loops.
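A minimal sketch of the difference, using a sum of squares as an assumed example calculation: both versions give the same result, but the NumPy version works on the whole array at once instead of element by element. The actual speed-up depends on the machine and the data size, so no timing is claimed here.

```python
import numpy as np

values = list(range(1_000_000))

# Plain Python: one interpreted multiplication and addition per element
loop_total = sum(v * v for v in values)

# NumPy: squaring and summing run over the whole array in compiled code
arr = np.arange(1_000_000, dtype=np.int64)
np_total = int((arr * arr).sum())

print(loop_total == np_total)  # True
```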

matplotlib

matplotlib is a plotting library that supports both 2D and 3D graphs.
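A minimal sketch combining pandas plotting with pyplot: a histogram, a vertical line at the mean, axis labels, and saving to a file. The random data, the column name y, the output file name y_hist.png, and the non-interactive Agg backend are assumptions chosen so the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is needed
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical data: 200 normally distributed values in a column named "y"
df = pd.DataFrame({"y": rng.normal(loc=50, scale=10, size=200)})

# Histogram with 20 bins and grid lines; plot.hist returns the matplotlib Axes
ax = df["y"].plot.hist(bins=20, grid=True, title="Distribution of y")
ax.set_xlabel("y")
ax.set_ylabel("Frequency")

# Vertical line at the mean; pyplot draws it on the current Axes (the histogram)
plt.axvline(x=df["y"].mean(), color="red")

out_path = os.path.join(tempfile.gettempdir(), "y_hist.png")
plt.savefig(out_path)  # the file format follows the extension
plt.close()
```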

Elementary code

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
#Jupyter/IPython only: display graphs inline in the notebook
%matplotlib inline

df = pd.read_csv("file name") #Read a CSV file into a DataFrame (the first row is used as the column names)
df = pd.read_csv("file name",header=None) #header=None: the file has no heading row, so the first row is read as data and the columns are numbered 0, 1, 2, ...
df.head() #Show the first five rows
df.tail() #Show the last five rows
#Passing a number as the argument changes how many rows are shown.
df.head(10) #Show the first 10 rows
df.tail(10) #Show the last 10 rows
df.shape #Attribute (not a function): a (number of rows, number of columns) tuple
df.describe() #Summary statistics of the numeric columns: count, mean, standard deviation, minimum, quartiles, and maximum
df.info() #Data type of each column (e.g. object for strings, int64, float64), non-null counts, and memory usage
df["Column name"] #Extract a single column (returns a Series)
df[["Column name","Column name",...,"Column name"]] #Extract several columns (returns a DataFrame)
df[df["Column name"]Conditional expression] #Extract the rows that satisfy the condition, e.g. df[df["x"]>0]
df[df["y"]>=df["y"].mean()] #Extract the rows whose "y" value is at or above the mean of "y"
df.sort_values(by="y",ascending=False) #Sort all rows by "y" in descending order
df["y"].sort_values(ascending=False) #Sort only the "y" column (a Series takes no by argument)
df["Column name"][df["Column name"]Conditional expression] #Extract the values of the left column for the rows where the condition in the brackets holds
df["Column name"].plot() #Line graph with the row index on the x-axis and the values of the specified column on the y-axis
df["Column name"].plot(figsize=(width,height)) #Set the figure size (in inches) with figsize
df["Column name"].plot(figsize=(width,height),title="Title name") #Set the title
ax = df["Column name"].plot(figsize=(width,height),title="Title name") #plot() returns the matplotlib Axes
ax.set_xlabel("Label name") #Set the x-axis label
ax.set_ylabel("Label name") #Set the y-axis label
df["Column"].plot.hist() #Histogram: divides the column's values into bins (classes) and counts the frequency of each
df["Column"].plot.hist(grid=True) #Add grid lines
plt.axvline(x=Numerical value,color="color") #Draw a vertical line at x = Numerical value

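plt.axvline(x=df["y"].mean(),color="red") #x must be a single number, e.g. the mean of "y"
df["y"].plot.hist()                #Drawn in the same cell, the line and the histogram overlay on one graph

plt.axvline(x=df["y"].mean(),color="red")
df["y"].plot.hist()
plt.savefig("file name.extension") #Save the graph, e.g. "hist.png"; the file format follows the extension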

df[["Column name 1","Column name 2"]].boxplot(by="Column name 1") #Box plot showing the spread of "Column name 2" for each value of the column given in by
df.isnull() #DataFrame of the same shape with True where a value is null (missing)
df.isnull().any() #For each column, True if it contains at least one null
df.isnull().sum() #Count the nulls in each column
df["Column name"].value_counts() #Count how many times each distinct value appears
df.fillna(0) #Replace every null with a concrete value (here 0); returns a new DataFrame
df.dropna(subset=["Column name"]) #Delete the rows where the specified column is null
df[["Column name 1","Column name 2"]].corr() #Correlation matrix (Pearson by default) of the two columns
df.plot.scatter(x="Column name",y="Column name",figsize=(5,5)) #Scatter plot of one column against another
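
The selection, sorting, counting, and correlation lines above can be tried without a CSV file on a small, made-up DataFrame. The columns name, group, x, and y and their values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "group": ["p", "q", "p", "p"],
    "x": [1, 2, 3, 4],
    "y": [10, 40, 20, 30],
})

above_mean = df[df["y"] >= df["y"].mean()]    # mean of y is 25 -> rows "b" and "d"
top = df.sort_values(by="y", ascending=False)  # whole DataFrame, y descending
names_of_big_x = df["name"][df["x"] > 2]       # "name" values where x > 2
counts = df["group"].value_counts()            # occurrences of each group
r = df[["x", "y"]].corr().loc["x", "y"]        # Pearson correlation of x and y

print(above_mean)
print(top)
print(r)  # 0.4
```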

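Likewise, here is a sketch of the missing-value functions on hypothetical data containing NaN. fillna and dropna return new DataFrames and leave df unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x has one missing value, y has two, z has none
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 4.0],
    "y": [np.nan, np.nan, 6.0, 8.0],
    "z": [1, 2, 3, 4],
})

has_null = df.isnull().any()     # per column: True if any value is missing
null_counts = df.isnull().sum()  # per column: number of missing values

filled = df.fillna(0)               # every NaN replaced by 0
filled_mean = df.fillna(df.mean())  # NaN replaced by that column's mean
dropped = df.dropna(subset=["x"])   # rows where x is missing are removed

print(null_counts)
print(dropped)
```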