[PYTHON] How to unify the bin width when displaying multiple histograms on top of each other (matplotlib)

Background

When I overlaid several histograms in a for loop, each dataset was drawn with a different bin width, which made them hard to compare unless the bins were specified explicitly. So I looked into how to make every histogram use the same bin width.

Apologies if this is hard to read; it is a memo I wrote for myself.

Imports / dataset used

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine

wine = load_wine()
df_wine = pd.DataFrame(data=wine.data, columns=wine.feature_names)
df_wine['target'] = wine.target

This memo uses the scikit-learn wine dataset. The target column holds a label indicating the type of wine.

Method

If you pass a list as the bins argument of plt.hist(), the histogram is drawn with the values in the list as bin edges. (With bins = [0, 1, 2, 3, 4], four bars are drawn, one for each of the intervals 0 to 1, 1 to 2, 2 to 3, and 3 to 4.)
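The edge behavior described above can be checked without drawing anything. The following is a minimal sketch using np.histogram, which plt.hist() relies on for binning; the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values, used only to illustrate how edges are interpreted.
values = np.array([0.2, 0.5, 1.1, 2.0, 2.7, 3.3, 4.0])

# Five edges define four intervals: [0, 1), [1, 2), [2, 3), [3, 4].
# Only the last interval includes its right edge.
edges = [0, 1, 2, 3, 4]
counts, returned_edges = np.histogram(values, bins=edges)

print(counts)       # one count per interval
print(len(counts))  # number of intervals = number of edges - 1
```

Note that a value exactly on an inner edge (2.0 here) is counted in the interval that starts at that edge.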

Using this, build a list of edges with np.linspace(minimum value, maximum value, number of edges) and pass it as the bins argument of plt.hist() for every label, so that all labels share the same bins. Note that N edges produce N - 1 bins.
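As a small sketch of the relationship between the number of edges and the number of bins, the snippet below builds edges with np.linspace and inspects the resulting widths. The range 0.0 to 3.0 is a made-up example, not taken from the wine data.

```python
import numpy as np

n_bin = 15               # desired number of bins
x_min, x_max = 0.0, 3.0  # hypothetical data range, for illustration only

# n_bin bins require n_bin + 1 edges.
edges = np.linspace(x_min, x_max, n_bin + 1)

# Width of each bin: difference between consecutive edges.
widths = np.diff(edges)

print(len(edges))   # number of edges
print(len(widths))  # number of bins
print(widths[0])    # every bin has the same width, (x_max - x_min) / n_bin
```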

↓ Without bin width adjustment

feature_name = 'hue'
target_names = df_wine['target'].unique()

for target in target_names:
    plt.hist(df_wine[df_wine.target == target][feature_name], alpha=0.6, label=target)

plt.title(feature_name)
plt.legend()

↓ With bin width adjustment

feature_name = 'hue'
target_names = df_wine['target'].unique()

# Split the range between the minimum and maximum values into n_bin equal-width bins (the same bins are used for every target)
n_bin = 15
x_max = df_wine[feature_name].max()
x_min = df_wine[feature_name].min()
# n_bin bins need n_bin + 1 edges
bins = np.linspace(x_min, x_max, n_bin + 1)

for target in target_names:
    plt.hist(df_wine[df_wine.target == target][feature_name], bins=bins, alpha=0.6, label=target)

plt.title(feature_name)
plt.legend()
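To confirm that shared edges really make the groups comparable, here is a rough sketch that bins several groups numerically instead of plotting them. The groups are random stand-ins for the per-label subsets, and np.histogram_bin_edges is used as an alternative way to get the same equal-width edges as np.linspace; neither appears in the original memo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the per-label subsets of a single feature.
groups = {
    0: rng.normal(1.0, 0.1, size=50),
    1: rng.normal(1.1, 0.2, size=60),
    2: rng.normal(0.7, 0.1, size=40),
}

n_bin = 15
all_values = np.concatenate(list(groups.values()))

# Shared edges are computed from the full data range, not from each group.
shared_edges = np.histogram_bin_edges(all_values, bins=n_bin)

per_group = {}
for label, values in groups.items():
    # Passing the same edge array gives every group identical bins.
    counts, edges = np.histogram(values, bins=shared_edges)
    per_group[label] = (counts, edges)
    print(label, counts)
```

Because every group is binned with the same edges, the counts at a given index refer to the same interval, which is what makes the overlaid bars line up.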

Referenced articles

Adjust the bin width quickly and neatly with the histogram of matplotlib and seaborn - Qiita