[PYTHON] Visualization memo by pandas, seaborn

Data set visualization

A memo from practicing visualization with pandas and seaborn, using `iris.csv` as the sample data set. Since this is a memo for myself, the choice of figure types and columns is somewhat arbitrary, but please bear with me _(._.)_

data: https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv

Histogram drawing

`iris.csv` consists of four numeric columns, `sepal_length`, `sepal_width`, `petal_length`, and `petal_width`, and one categorical column, `species`. The visualizations below keep the classification by the categorical `species` column in mind.

qiita_iris.jpg
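Before plotting, it can help to confirm which columns are numeric and how many rows each category has. The following is a minimal sketch; the helper name `summarize_columns` is made up for this memo and is not part of pandas.

```python
import pandas as pd


def summarize_columns(df, category="species"):
    """Return the numeric column names and the row count per category value."""
    # select_dtypes(include="number") keeps only the numeric columns
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    # value_counts() counts how many rows belong to each category value
    counts = df[category].value_counts().to_dict()
    return numeric_cols, counts
```

With `df = pd.read_csv("iris.csv")`, calling `summarize_columns(df)` should list the four measurement columns and show the number of rows for each species.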


First, check the distribution of one column.

・ Distribution of sepal_length

hist_iris1.py


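import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative when iris.csv is not available locally

# distplot is deprecated since seaborn 0.11; histplot with kde=True draws a comparable histogram + KDE
sns.histplot(df["sepal_length"], kde=True, stat="density")
plt.show()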

qiita_1.png


Next, I drew the distributions of the four columns on four separate graphs. Passing layout=(2,2) to the plot() method of DataFrame conveniently outputs the four graphs in a 2 x 2 grid. However, with kind="hist" I could not find a way to display the density function from kernel density estimation at the same time, so this example draws the KDE only.

・ Distribution of sepal_length, sepal_width, petal_length, petal_width

hist_iris2.py


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative when iris.csv is not available locally

df.plot(kind="kde", subplots=True, layout=(2, 2))  # use kind="hist" for histograms instead
plt.show()

qiita_4.png


・ Distribution of sepal_length by category

Check how the distribution of sepal_length differs between setosa and versicolor.

hist_iris.py


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
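# df = sns.load_dataset("iris")  # alternative when iris.csv is not available locally

# Keep only the two species being compared
subset = df[df["species"].isin(["setosa", "versicolor"])]
# distplot is deprecated since seaborn 0.11; histplot + rugplot draws a similar picture
sns.histplot(data=subset, x="sepal_length", hue="species", kde=True, stat="density", common_norm=False)
sns.rugplot(data=subset, x="sepal_length", hue="species", legend=False)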
plt.show()

qiita_2.png


Drawing a scatterplot matrix

A scatterplot matrix is a useful way to get an overview of the data (I think). In seaborn, you can draw one easily with pairplot(). In the following example, hue="species" is passed to pairplot(), which color-codes the iris data by the value of the categorical species column. Setting diag_kind="kde" draws the density function from kernel density estimation on the diagonal; setting diag_kind="hist" draws plain histograms there instead. If diag_kind is not specified, seaborn picks the diagonal plot itself (a histogram when no hue is given).

・ Scatterplot matrix colored by species

hist_iris.py


import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative when iris.csv is not available locally

# pairplot: draw a scatterplot matrix
g = sns.pairplot(df, hue="species", diag_kind="kde")
plt.show()

qiita_3.png
