This is a memo from practicing visualization with pandas and seaborn, using `iris.csv` as a sample data set. Since it is a memo for myself, some choices, such as the types of figures and which columns are selected, are arbitrary. Please bear with me _(._.)_
data: https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv
`iris.csv` consists of 4 numeric columns, `sepal_length`, `sepal_width`, `petal_length`, and `petal_width`, and 1 category value, `species`. The visualizations below keep the classification by the category value `species` in mind.
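Before plotting, it can help to confirm that the file really has this shape. The following is a minimal sketch, not part of the original memo; `summarize_iris` is a hypothetical helper name, and it assumes the category column is called `species`.

```python
import pandas as pd


def summarize_iris(df: pd.DataFrame, category: str = "species") -> dict:
    """Return the shape, the numeric column names, and the row count per category."""
    # Numeric columns are the measurement columns; the category column is excluded
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    return {
        "shape": df.shape,
        "numeric_columns": numeric_cols,
        "rows_per_category": df[category].value_counts().to_dict(),
    }
```

For example, `summarize_iris(pd.read_csv("iris.csv"))` returns a dict that can be printed to check the column names and the number of rows for each species.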
First, check the distribution of one column.
sepal_length
hist_iris1.py
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
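# df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand
# Note: distplot is deprecated since seaborn 0.11; histplot/displot are its replacements
sns.distplot(df.sepal_length, kde=True)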
plt.show()
Next, I drew the distributions of the four columns on four separate graphs. The `plot()` method of DataFrame is convenient here: specifying `subplots=True` and `layout=(2,2)` outputs the 4 graphs in a 2 × 2 grid. However, I don't know how to overlay the density function from kernel density estimation on a histogram with this method.
sepal_length, sepal_width, petal_length, petal_width
hist_iris2.py
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand
df.plot(kind="kde", subplots=True, layout=(2, 2))  # use kind="hist" for histograms
plt.show()
sepal_length by category
Check how the distribution of `sepal_length` differs between `setosa` and `versicolor`.
hist_iris.py
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
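# df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand
# Overlay the two species on the same axes (distplot is deprecated since seaborn 0.11)
sns.distplot(df[df["species"] == "setosa"].sepal_length, kde=True, rug=True)
sns.distplot(df[df["species"] == "versicolor"].sepal_length, kde=True, rug=True)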
plt.show()
Scatterplot matrices are a useful way to get an overview of the data (I think). In seaborn, you can easily draw one using `pairplot()`.
In the following example, `hue="species"` is passed as an argument of `pairplot()`. This color-codes the points of the iris dataset by the value of the category `species`. If `diag_kind="kde"` is set, the density function from kernel density estimation is drawn in the diagonal panels. If nothing is specified, a histogram is displayed there (recent seaborn versions default to `diag_kind="auto"`, which picks a KDE when `hue` is given and a histogram otherwise).
hist_iris.py
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("iris.csv")
# df = sns.load_dataset("iris")  # use this instead if iris.csv is not at hand
# pairplot: draw a scatterplot matrix
g = sns.pairplot(df, hue="species", diag_kind="kde")
plt.show()