Data set visualization

A memo from practicing visualization with pandas and seaborn, using `iris.csv` as the sample data set. Since it is a memo for myself, some choices (the type of figure, which columns to plot) are arbitrary, so please bear with me _(._.)_

data: https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv

Drawing histograms

`iris.csv` has four numeric columns and one categorical column: `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, and `species`. The plots below keep the categorical column `species` (the iris variety: setosa, versicolor or virginica) in mind.
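Before plotting, the split between numeric and categorical columns can be checked in code. This is a minimal sketch that assumes the column names listed above. `describe_columns` is a hypothetical helper, not a pandas function.

```python
from __future__ import annotations

import pandas as pd


def describe_columns(df: pd.DataFrame) -> tuple[list[str], pd.Series]:
    """Hypothetical helper: return the numeric column names and the row count per species."""
    # select_dtypes keeps only numeric columns, so the string column "species" is excluded
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    # value_counts counts rows per category value; sort_index gives a stable order
    counts = df["species"].value_counts().sort_index()
    return numeric_cols, counts
```

Usage: `numeric_cols, counts = describe_columns(pd.read_csv("iris.csv"))`. For the full iris data, each of the three species should have 50 rows.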

First, check the distribution of one column.

・ Distribution of `sepal_length`

`hist_iris1.py`


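```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative if iris.csv is not at hand (downloads seaborn's example copy)

# distplot is deprecated since seaborn 0.11; histplot(kde=True) draws the histogram plus a KDE curve
sns.histplot(df["sepal_length"], kde=True)
plt.show()
```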
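The bars of a histogram are simply counts of values per bin. As a numeric counterpart to the plot above, `numpy.histogram` returns those counts together with the bin edges. This is a minimal sketch. `column_bins` is a hypothetical helper, and 10 equal-width bins is an arbitrary default. Seaborn's `histplot` chooses its own bin count, so its bars may not match exactly.

```python
import numpy as np
import pandas as pd


def column_bins(df: pd.DataFrame, column: str = "sepal_length", bins: int = 10):
    """Hypothetical helper: count how many rows fall into each equal-width bin."""
    # dropna() so that missing values do not break np.histogram
    values = df[column].dropna().to_numpy()
    # counts[i] is the number of values in [edges[i], edges[i+1]); the last bin includes its right edge
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges
```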

Next, the distributions of the four columns are drawn in four separate graphs. Calling the DataFrame's `plot()` method with `subplots=True` and `layout=(2, 2)` conveniently arranges the four graphs in a 2×2 grid. However, with `kind="hist"` I don't know how to overlay the kernel density estimate on each histogram at the same time, so the example below plots only the KDE curves.

・ Distribution of `sepal_length`, `sepal_width`, `petal_length`, `petal_width`

`hist_iris2.py`


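```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative if iris.csv is not at hand (downloads seaborn's example copy)

# pandas skips the non-numeric "species" column; kind="kde" requires SciPy
df.plot(kind="kde", subplots=True, layout=(2, 2))  # kind="hist" draws histograms instead
plt.show()
```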

・ Distribution of `sepal_length` by category

Check how the distribution of sepal_length differs between setosa and versicolor.

`hist_iris.py`


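```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative if iris.csv is not at hand (downloads seaborn's example copy)

# histplot + rugplot replace the deprecated distplot(kde=True, rug=True)
for name, color in [("setosa", "tab:blue"), ("versicolor", "tab:orange")]:
    values = df.loc[df["species"] == name, "sepal_length"]
    sns.histplot(values, kde=True, color=color, label=name)
    sns.rugplot(values, color=color)
plt.legend()
plt.show()
```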

Drawing a scatterplot matrix

A scatterplot matrix is a useful way to get an overview of the data (I think). In seaborn, `pairplot()` draws one easily. The example below passes `hue="species"` to `pairplot()`, which colors the points by the category value `species`. Setting `diag_kind="kde"` draws a kernel density estimate on the diagonal, and `diag_kind="hist"` draws histograms there. If `diag_kind` is left at its default (`"auto"`), seaborn picks a KDE when `hue` is set and a histogram otherwise.

・ Scatterplot matrix colored by `species`

`hist_iris.py`


```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # alternative if iris.csv is not at hand (downloads seaborn's example copy)

# pairplot: draw a scatterplot matrix colored by species, with KDE curves on the diagonal
g = sns.pairplot(df, hue="species", diag_kind="kde")
plt.show()
```

[PYTHON] Visualization memo by pandas, seaborn
