I'm neither a software engineer nor a data scientist, but I extract data from MySQL, BigQuery, and similar sources in my daily work, and I've become interested in how to describe and visualize that data statistically. Jupyter Notebook seems well suited to this, so I've recently been writing Python in it. This article collects my notes on using Jupyter, covering everything from reading CSV data with pandas to checking basic statistics and making simple visualizations.
What is pandas? A library that supports data analysis. In particular, it provides data structures and operations for manipulating numerical tables and time-series data.
What is NumPy? A library for numerical computation.
What is pyplot? The plotting interface of the matplotlib visualization library.
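The following small sketch shows how the roles split. It uses made-up values, and the column name `sales` is hypothetical. pandas holds the table, including a time-series index, and NumPy does the numeric work on a column's values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three days of made-up sales figures
df = pd.DataFrame(
    {"sales": [120, 95, 130]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),  # time-series index
)

# pandas: label-based access to a table (row by date, column by name)
print(df.loc["2020-01-02", "sales"])  # 95

# NumPy: numerical functions operate on the column's underlying array
print(np.mean(df["sales"].to_numpy()))  # 115.0
```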
test.ipynb
# 1. Import the libraries needed for data analysis
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
# 2. Read the data. This assumes the first row of the CSV is a header row; if it is not, pass header=None.
# Use head() or tail() to check the first or last few rows, which is handy when the data has many rows.
data = pd.read_csv("hogehoge.csv")
# 3. Check the dimensions of the data
data.shape  # -> returns (number of rows, number of columns)
# 4. Check the basic statistics (mean, standard deviation, maximum, minimum, etc.) and the data types.
# Functions such as mean() return a single statistic, e.g. only the mean.
data.describe()
data.info()
# To select a single column (roughly like naming a column in SQL's SELECT clause):
data["hoge"]
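hogehoge.csv is not available here, so the following sketch substitutes a small hypothetical CSV held in memory via `io.StringIO`. Its made-up columns are `date` and `hoge`, and the sketch runs the steps above on it, including the `header=None` case. Results are wrapped in `print()` because a notebook cell only displays the value of its last expression automatically. The last lines contrast column selection with boolean row filtering, which is the closer pandas counterpart of a SQL WHERE clause.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "hogehoge.csv"
csv_with_header = "date,hoge\n2020-01-01,3\n2020-01-02,5\n2020-01-03,4\n"
csv_without_header = "2020-01-01,3\n2020-01-02,5\n2020-01-03,4\n"

data = pd.read_csv(io.StringIO(csv_with_header))
# No header row: header=None stops the first data row being used as column names;
# names=[...] supplies the column names instead of the default 0, 1, ...
raw = pd.read_csv(io.StringIO(csv_without_header), header=None, names=["date", "hoge"])

print(data.head(2))  # first 2 rows
print(data.tail(1))  # last row
print(data.shape)    # (3, 2)

stats = data.describe()     # count, mean, std, min, quartiles, max of numeric columns
print(stats)
data.info()                 # column dtypes and non-null counts (printed directly)
print(data["hoge"].mean())  # 4.0

# Column selection (like SELECT hoge) vs. row filtering (like WHERE hoge > 3)
column = data["hoge"]
filtered = data[data["hoge"] > 3]
print(filtered)
```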
test.ipynb
# Draw a line graph.
# The figure size can be set with the figsize=(width, height) option,
# and a title with the title="hoge" option.
data["hoge"].plot()
# Set the names of the x-axis and y-axis. plot() returns a matplotlib Axes object.
ax = data["hoge"].plot(figsize=(15, 5), title="test")
ax.set_xlabel("hogehoge")
ax.set_ylabel("hogehoge")
# Similarly, variable.plot.hist() draws a histogram, and data.boxplot(column="hoge", by="column for the x-axis") draws a box plot.
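Here is a self-contained sketch of these plotting steps. It assumes a small made-up DataFrame, and the columns `date`, `group`, and `hoge` are hypothetical. The sketch selects the non-interactive Agg backend so it also runs as a plain script. In Jupyter with `%matplotlib inline` you can drop those two lines. The histogram gets its own axes because a Series `.plot()` call otherwise tends to draw onto the current axes, so consecutive plots in one cell can overlap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")  # start clean so the first plot opens a new figure

# Hypothetical sample data
data = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=6, freq="D"),
    "group": ["a", "a", "a", "b", "b", "b"],
    "hoge": [3, 5, 4, 6, 2, 7],
})

# Line graph: figsize=(width, height) and title go straight to plot(),
# which returns a matplotlib Axes for further customization
ax = data.set_index("date")["hoge"].plot(figsize=(15, 5), title="test")
ax.set_xlabel("date")
ax.set_ylabel("hoge")

# Histogram on its own axes (bins=5 is an arbitrary choice)
fig_hist, ax_hist = plt.subplots()
data["hoge"].plot.hist(bins=5, ax=ax_hist, title="hoge histogram")

# Box plot of "hoge" with one box per value of "group" along the x-axis;
# DataFrame.boxplot opens its own figure when by= is given
data.boxplot(column="hoge", by="group")
```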