There are several Python libraries for visualizing data, but Pandas alone is pretty good. Visualization with Pandas can be done within a method chain, which helps reduce the clutter of temporary variables. In this article, I will introduce visualization recipes, focusing on the ones that I often use in practice.

Preparation

Environment

Python 3.6.8
jupyter==1.0.0
pandas==0.25.3

Data

This time, I will borrow the following two datasets.

Load them into the DataFrames titanic and crime, respectively.

import pandas as pd
import zipfile

with zipfile.ZipFile('titanic.zip') as myzip:
    with myzip.open('train.csv') as myfile:
        titanic = pd.read_csv(myfile)

with zipfile.ZipFile('crimes-in-boston.zip') as myzip:
    with myzip.open('crime.csv') as myfile:
        crime = pd.read_csv(myfile, encoding='latin-1', parse_dates=['OCCURRED_ON_DATE'])

Visualization recipe

Histogram

This is the quickest way to see the distribution of numerical data. Bar charts may be more appropriate when there are few unique values.

titanic['Age'].plot.hist()

Box Plot

Used to look at quartiles. Points more than 1.5 times the box length (the interquartile range) beyond the edges of the box are marked as outliers. Violin plots cannot be drawn with Pandas alone, so give up and use Seaborn.

titanic['Age'].plot.box()
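To make the outlier rule concrete, here is a minimal sketch that computes the 1.5 x IQR limits by hand. The ages Series is made-up illustration data, not the Titanic column.

```python
import pandas as pd

# Made-up ages for illustration (not the Titanic data).
ages = pd.Series([2, 21, 22, 24, 25, 27, 29, 30, 31, 80], name='Age')

# Quartiles and the interquartile range (the length of the box).
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1

# Points farther than 1.5 * IQR from the box edges count as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]
print(outliers.tolist())
```

Listing the outliers like this is handy when you want to inspect the rows behind the dots in the box plot.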

Kernel Density Estimation

This estimates the probability density function (PDF) from data, but for one-dimensional data a histogram may be enough. It uses SciPy, so if it is not installed, install it with pip install scipy.

titanic['Age'].plot.kde()

Scatter Plot

Used to see the relationship between two numerical variables. If the points overlap too much, you cannot tell how dense they are, so I think it is standard to make them semi-transparent. If either variable is categorical or has few unique values, it is better to use the grouped histograms and box plots described below.

titanic.plot.scatter(x='Age', y='Fare', alpha=0.3)

Hexagonal Binning Plot

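I have never used it in practice, but I will introduce it for completeness. It counts the points that fall into hexagonal bins, which can help when a scatter plot is too crowded.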

titanic.plot.hexbin(x='Age', y='Fare', gridsize=30)

Bar Plot

It is often used to see aggregated values for each category.

titanic['Embarked'].value_counts(dropna=False).plot.bar()
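The dropna=False argument matters when the column has missing values. The following small sketch, using a made-up Series rather than the real Embarked column, shows the difference.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; two of them are missing.
embarked = pd.Series(['S', 'C', 'S', np.nan, 'Q', 'S', np.nan], name='Embarked')

# With dropna=False, NaN gets its own row (and its own bar in the plot).
with_nan = embarked.value_counts(dropna=False)

# By default, missing values are silently excluded.
without_nan = embarked.value_counts()

print(with_nan)
print(without_nan)
```

Keeping the NaN bar makes it obvious at a glance how much of the column is missing.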

Horizontal Bar Plot

This is the bar plot laid on its side.

titanic['Embarked'].value_counts(dropna=False).plot.barh()

Horizontal Bar Plot with DataFrame Styling

You can make the DataFrame itself look like a bar graph. I use it a lot because the output is still text, so I can search it.

titanic['Embarked'].value_counts(dropna=False).to_frame().style.bar(vmin=0)

Line Plot

Often used to see changes over a series, such as a time series.

crime['OCCURRED_ON_DATE'].dt.date.value_counts().plot.line(figsize=(16, 4))

Area Plot

As with the line plot, this shows changes over a series, but it emphasizes the magnitude from zero. If the series is too fine-grained, the valleys become hard to see, so it is better to aggregate it into somewhat coarser bins.

crime['OCCURRED_ON_DATE'].dt.date.value_counts().plot.area(figsize=(16, 4), linewidth=0)
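One way to make the series coarser is to count per month instead of per day. This is only a sketch with made-up timestamps; the monthly granularity is my own choice for illustration, not something taken from the crime data.

```python
import pandas as pd

# Made-up timestamps standing in for crime['OCCURRED_ON_DATE'].
timestamps = pd.Series(pd.to_datetime([
    '2018-01-03 10:00', '2018-01-20 22:15',
    '2018-02-01 08:30', '2018-02-14 12:00', '2018-02-28 23:59',
    '2018-03-05 07:45',
]), name='OCCURRED_ON_DATE')

# Truncate each timestamp to its month, count, then restore time order
# (value_counts sorts by count, not by date).
monthly = timestamps.dt.to_period('M').value_counts().sort_index()
print(monthly)

# The result can then be plotted the same way, e.g.
# monthly.plot.area(figsize=(16, 4), linewidth=0)
```

Passing 'W' instead of 'M' should give weekly bins if monthly is too coarse.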

Pie Plot

I don't use pie charts because they are difficult to read, but I will introduce them for completeness. The reasons why pie charts are difficult to read are summarized in the following article.

- Do you still use pie charts? - Data Visualization Ideabook

titanic['Embarked'].value_counts(dropna=False).plot.pie()

Grouped Histogram

Often used to compare distributions between two groups (more than two groups also works).

titanic.groupby('Survived')['Age'].plot.hist(alpha=0.5, legend=True)

titanic['Age'].groupby(titanic['Survived']).plot.hist(alpha=0.5, legend=True)

The two forms above draw the same plot. With the latter, you can also group by an external Series that is not a column of the DataFrame.
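As a small sketch of grouping by an external Series, the key below is derived on the fly and never stored in the DataFrame. The data and the Expensive key (Fare above 30) are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration (not the Titanic data).
df = pd.DataFrame({
    'Age': [22, 38, 26, 35, 53, 3],
    'Fare': [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
})

# A boolean Series that is not a column of df; the name shows up in the legend/index.
expensive = (df['Fare'] > 30).rename('Expensive')

# Any Series aligned with df can serve as the grouping key.
mean_age = df['Age'].groupby(expensive).mean()
print(mean_age)

# The same key works for plotting, e.g.
# df['Age'].groupby(expensive).plot.hist(alpha=0.5, legend=True)
```

This avoids adding a throwaway column just to draw one plot.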

Grouped Box Plot

It doesn't work with groupby, so write as follows.

titanic.boxplot(column='Age', by='Survived')

Grouped Kernel Density Estimation

Like the histogram, it can be used to compare distributions between groups.

titanic['Age'].groupby(titanic['Survived']).plot.kde(legend=True)

Grouped Scatter Plot

I use this often, but I don't know a neat way to write it. With groupby, one plot per group is returned rather than a single combined plot.

titanic.groupby('Survived').plot.scatter(x='Age', y='Fare', alpha=0.3)

The following only works when the key is numerical, but it draws a single scatter plot with a different color for each group.

titanic.plot.scatter(x='Age', y='Fare', c='Survived', cmap='viridis', alpha=0.3)
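If the key is text rather than a number, one possible workaround is to convert it to integer codes first. This is a sketch on a made-up frame; the SexCode column name is hypothetical.

```python
import pandas as pd

# Made-up rows for illustration; 'Sex' is a text key.
df = pd.DataFrame({
    'Age': [22, 38, 26, 35],
    'Fare': [7.25, 71.28, 7.92, 53.10],
    'Sex': ['male', 'female', 'female', 'male'],
})

# Categories are sorted alphabetically, so female -> 0 and male -> 1.
df['SexCode'] = df['Sex'].astype('category').cat.codes
print(df[['Sex', 'SexCode']])

# The numeric codes can then be passed as the color key, e.g.
# df.plot.scatter(x='Age', y='Fare', c='SexCode', cmap='viridis', alpha=0.3)
```

The drawback is that the color bar shows the codes, not the original labels.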

The official Pandas documentation shows how to draw two plots on a shared Axes.

ax = titanic[titanic['Survived'] == 0].plot.scatter(x='Age', y='Fare', label=0, alpha=0.3)
titanic[titanic['Survived'] == 1].plot.scatter(x='Age', y='Fare', c='tab:orange', label=1, alpha=0.3, ax=ax)
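The same idea can be written as a loop so it scales to more than two groups. This is only a sketch with made-up data; the color list is my own choice, and the Agg backend line is there just so it runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration (not the Titanic data).
df = pd.DataFrame({
    'Age': [22, 38, 26, 35, 54, 2],
    'Fare': [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    'Survived': [0, 1, 1, 1, 0, 0],
})

# Create one Axes up front and draw every group onto it.
fig, ax = plt.subplots()
colors = ['tab:blue', 'tab:orange']
for (key, group), color in zip(df.groupby('Survived'), colors):
    group.plot.scatter(x='Age', y='Fare', c=color, label=str(key), alpha=0.3, ax=ax)
```

It is no longer a single method chain, but each group gets its own color and legend entry.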

Grouped Hexagonal Binning Plot

titanic.groupby('Survived').plot.hexbin(x='Age', y='Fare', gridsize=30)

Grouped Bar Plot

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack().plot.bar()

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).plot.bar()

Grouped Horizontal Bar Plot

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack().plot.barh()

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).plot.barh()

Grouped Horizontal Bar Plot with DataFrame Styling

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).style.bar(vmin=0, axis=None)

Grouped Line Plot

crime['OCCURRED_ON_DATE'].dt.date.groupby(crime['DISTRICT']).value_counts().unstack(0).plot.line(figsize=(16, 4), alpha=0.5)

crime['OCCURRED_ON_DATE'].dt.date.groupby(crime['DISTRICT']).value_counts().unstack(0).iloc[:, :4].plot.line(figsize=(16, 4), alpha=0.5)

Stacked Area Plot

crime['OCCURRED_ON_DATE'].dt.date.groupby(crime['DISTRICT']).value_counts().unstack(0).plot.area(figsize=(16, 4), linewidth=0)

crime['OCCURRED_ON_DATE'].dt.date.groupby(crime['DISTRICT']).value_counts().unstack(0).iloc[:, :4].plot.area(figsize=(16, 4), linewidth=0)

Grouped Pie Plot

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).plot.pie(subplots=True)

Stacked Bar Plot

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack().plot.bar(stacked=True)

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).plot.bar(stacked=True)

Stacked Horizontal Bar Plot

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack().plot.barh(stacked=True)

titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0).plot.barh(stacked=True)

Percent Stacked Bar Plot

To draw a 100% stacked bar chart, you have to calculate the percentage.

(titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack()
 .div(titanic['Survived'].value_counts(dropna=False), axis=0)
 .plot.bar(stacked=True))

(titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0)
 .div(titanic['Embarked'].value_counts(dropna=False), axis=0)
 .plot.bar(stacked=True))
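As an alternative sketch for the percentage step, pd.crosstab can normalize each row for you. The data below is made up, and note one difference I am assuming from how crosstab behaves: unlike value_counts(dropna=False), rows with a missing key are left out.

```python
import pandas as pd

# Made-up rows for illustration (not the Titanic data).
df = pd.DataFrame({
    'Survived': [0, 0, 0, 1, 1, 1, 1, 0],
    'Embarked': ['S', 'S', 'C', 'S', 'C', 'C', 'Q', 'Q'],
})

# normalize='index' divides each row by its total, so every row sums to 1.
share = pd.crosstab(df['Survived'], df['Embarked'], normalize='index')
print(share)

# The table can be plotted directly, e.g.
# share.plot.bar(stacked=True)
```

Use normalize='columns' and transpose if you want Embarked on the axis instead.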

Percent Stacked Horizontal Bar Plot

(titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack()
 .div(titanic['Survived'].value_counts(dropna=False), axis=0)
 .plot.barh(stacked=True))

(titanic['Embarked'].groupby(titanic['Survived']).value_counts(dropna=False).unstack(0)
 .div(titanic['Embarked'].value_counts(dropna=False), axis=0)
 .plot.barh(stacked=True))

Overlay Plots

Overlay the histogram and the kernel density estimation graph.

titanic['Age'].groupby(titanic['Survived']).plot.hist(alpha=0.5, legend=True)
titanic['Age'].groupby(titanic['Survived']).plot.kde(legend=True, secondary_y=True)

Grouped Bar Plot with Error Bars

You have to calculate the size of the error bars yourself. The code below uses the standard deviation; replace std() with sem() if you want the standard error.

yerr = titanic.groupby(['Survived', 'Pclass'])['Fare'].std().unstack(0)
titanic.groupby(['Survived', 'Pclass'])['Fare'].mean().unstack(0).plot.bar(yerr=yerr)
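To show how the standard deviation and the standard error relate, here is a minimal sketch on made-up fares. It checks that sem() equals std() divided by the square root of the group size.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration (not the Titanic data).
df = pd.DataFrame({
    'Survived': [0, 0, 0, 1, 1, 1],
    'Pclass':   [1, 1, 1, 1, 1, 1],
    'Fare':     [10.0, 20.0, 30.0, 40.0, 60.0, 80.0],
})

grouped = df.groupby(['Survived', 'Pclass'])['Fare']
std = grouped.std()   # spread of the individual fares
sem = grouped.sem()   # uncertainty of the group mean

# The standard error is the standard deviation divided by sqrt(n).
manual = std / np.sqrt(grouped.count())
print(pd.DataFrame({'std': std, 'sem': sem, 'manual': manual}))

# Either one can be passed as yerr after unstacking, e.g.
# grouped.mean().unstack(0).plot.bar(yerr=sem.unstack(0))
```

Which one to draw depends on whether you want to show the spread of the data or the reliability of the mean.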

Heat Map with DataFrame Styling

(pd.crosstab(crime['DAY_OF_WEEK'], crime['HOUR'].div(3).map(int).mul(3), normalize=True)
 .reindex(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
 .style.background_gradient(axis=None).format('{:.3%}'))

If you change the color map to green, it looks like a lawn.

(pd.crosstab(crime['DAY_OF_WEEK'], crime['MONTH'], normalize=True)
 .reindex(['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
 .style.background_gradient(axis=None, cmap='YlGn').format('{:.3%}'))
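The 3-hour binning used in the first heat map above may be easier to follow on a tiny example. This sketch uses made-up hours and weekdays, and shows that floor division gives the same bins.

```python
import pandas as pd

# Made-up values standing in for crime['HOUR'] and crime['DAY_OF_WEEK'].
hours = pd.Series([0, 1, 2, 3, 11, 12, 22, 23], name='HOUR')
days = pd.Series(['Monday', 'Monday', 'Tuesday', 'Tuesday',
                  'Monday', 'Tuesday', 'Monday', 'Tuesday'], name='DAY_OF_WEEK')

# Divide by 3, truncate to an integer, multiply back: 0-2 -> 0, 3-5 -> 3, ...
binned = hours.div(3).map(int).mul(3)

# Floor division is a shorter way to get the same result.
same = hours // 3 * 3

# normalize=True divides every cell by the grand total.
table = pd.crosstab(days, binned, normalize=True)
print(table)
```

Each cell is then a share of all records, which is why the heat map is formatted as a percentage.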

Correlation Heat Map with DataFrame Styling

I describe this in the following article.

- [One line] Heatmap the correlation matrix with Pandas only

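corr = titanic.select_dtypes(include='number').corr()  # numeric columns only; newer pandas versions raise an error on text columns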
low = (1 + corr.values.min()) / (1 - corr.values.min())
corr.style.background_gradient(axis=None, cmap='viridis', low=low).format('{:.6f}')
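The low value is the least obvious part, so here is a sketch of the arithmetic on a made-up frame. It assumes, based on my reading of the Styler documentation, that low extends the color range below the minimum by low times the data range; under that assumption the formula moves the lower color limit to exactly -1.

```python
import pandas as pd

# Made-up numeric columns for illustration (not the Titanic data).
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 1, 4, 3, 5],
    'c': [5, 3, 4, 1, 2],
})
corr = df.corr()

cmin, cmax = corr.values.min(), corr.values.max()  # cmax is 1 (the diagonal)
low = (1 + cmin) / (1 - cmin)

# Assumption: the lower color limit becomes cmin - (cmax - cmin) * low.
lower_limit = cmin - (cmax - cmin) * low
print(lower_limit)
```

With the limits fixed at -1 and 1, the same color always means the same correlation, regardless of the data.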

Conclusion

I introduced the ones that seem relatively easy to use. If you know of other useful ones, please let me know. If you want to draw more elaborate graphs, the following pages will be helpful.

- Visualization - pandas documentation
- Styling - pandas documentation
- Python pandas mastering the plot function - StatsFragments
- [Explanation of all arguments of Pandas plot | Self-consideration journey](https://own-search-and-study.xyz/2016/08/03/pandas%E3%81%AEplot%E3%81%AE%E5%85%A8%E5%BC%95%E6%95%B0%E3%82%92%E4%BD%BF%E3%81%84%E3%81%93%E3%81%AA%E3%81%99/)

[PYTHON] Quickly visualize with Pandas