[PYTHON] Set the vertical axis of the histogram to relative frequency (total height of columns = 1) and relative frequency density (area of the entire histogram = 1) with matplotlib.

What to do in this article

Using matplotlib, this article draws histograms whose vertical axis is each of the following:

-- Frequency (the matplotlib default)
-- Relative frequency
-- Relative frequency density

Reference page (Thank you)

normed of matplotlib.hist is strange behavior of matplotlib: histogram normed Statistics (2) Use python to learn the probability density function (normal distribution, standard normal distribution)!

Terminology

The definitions are as follows:

-- Frequency density = frequency / class width
-- Relative frequency density = relative frequency / class width

Experimenting with the Python code below suggests that passing "density = True" to the hist function makes the vertical axis the relative frequency density. The hist function also has a normed option (deprecated). According to the reference pages above, normed behaved strangely, and the density option appears to fix that bug.
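As a small illustration of the two definitions above, the sketch below computes frequency, relative frequency, and relative frequency density by hand and compares the result with density=True. The toy data and class boundaries are made up for this example, and np.histogram is used in place of the hist function only to get the numbers without drawing.

```python
import numpy as np

# Hypothetical toy data and class boundaries (not from the article)
data = np.array([1, 2, 3, 6, 7, 8, 9, 9.5])
edges = np.array([0.0, 5.0, 10.0])  # two classes, each of width 5

# Frequency: number of values in each class
frequency, _ = np.histogram(data, bins=edges)
class_width = np.diff(edges)

# Relative frequency = frequency / number of data
relative_frequency = frequency / frequency.sum()

# Relative frequency density = relative frequency / class width
relative_frequency_density = relative_frequency / class_width

# The same quantity as computed with density=True
density, _ = np.histogram(data, bins=edges, density=True)

print("frequency:", frequency)
print("relative frequency:", relative_frequency)
print("relative frequency density:", relative_frequency_density)
print("matches density=True:", np.allclose(density, relative_frequency_density))
```

The relative frequencies add up to 1, while the relative frequency densities multiplied by the class width add up to 1.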

Source code (Jupyter Notebook) and drawing results

Draw three histograms from 10,000 normally distributed random numbers with a mean of 50 and a standard deviation of 10.


#%%

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt


#%%

# Create the data: 10,000 normal random numbers (mean μ, standard deviation σ)
μ = 50
σ = 10
data = np.random.normal(μ, σ, 10000)


#%%

# Number of classes
num_bins = 20

# Class width (hist splits the range from min to max into equal-width classes)
bin_width = (max(data) - min(data)) / num_bins
print(f"Class width = about {bin_width}")

#Graph drawing
fig = plt.figure(figsize=(8, 24))

# (1)Histogram with frequency on the vertical axis
ax1 = fig.add_subplot(311)
ax1.title.set_text("(1) frequency")
ax1.grid(True)
ax1.hist(data, bins=num_bins)

# (2)Histogram with relative frequency on the vertical axis
ax2 = fig.add_subplot(312)
ax2.title.set_text("(2) relative frequency")
ax2.grid(True)
ax2.set_xlim(ax1.get_xlim())
weights = np.ones_like(data) / len(data)
ax2.hist(data, bins=num_bins, weights=weights)

# (3)Histogram with relative frequency density on the vertical axis(Blue)& Normal distribution probability density function(Red)
ax3 = fig.add_subplot(313)
ax3.title.set_text("(3) density")
ax3.grid(True)
ax3.set_xlim(ax1.get_xlim())
ax3.hist(data, bins=num_bins, density=True, color="blue", alpha=0.5)

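# Probability density function of the normal distribution
x = np.arange(0, 100, 1)
y = norm.pdf(x, μ, σ)
ax3.fill_between(x, y, color="red", alpha=0.5)
# Specify the line color only once, through the color keyword
ax3.plot(x, y, linewidth=3, color="red", alpha=0.5)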

Running the code printed "class width = about 3.718313197105561" and drew the histograms below. The three panels show (1) frequency on the vertical axis, (2) relative frequency on the vertical axis, and (3) relative frequency density on the vertical axis (blue) with the probability density function of the normal distribution (red) superimposed. The "density = True" histogram and the probability density function overlap, and the area under a probability density function is 1, so the area of the whole histogram appears to be 1 when "density = True" is set.

1.png
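The overlap seen in panel (3) can also be checked numerically. The following is a minimal sketch, not part of the original notebook: it assumes a fixed seed (0) for reproducibility, uses np.histogram instead of drawing, and writes out the normal density formula with NumPy so that SciPy is not required.

```python
import numpy as np

mu, sigma = 50, 10
rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
data = rng.normal(mu, sigma, 10000)

# Bar heights of the density=True histogram and the class boundaries
density, edges = np.histogram(data, bins=20, density=True)

# Midpoint of each class, where the bar is compared with the curve
centers = (edges[:-1] + edges[1:]) / 2

# Normal probability density function evaluated at the class midpoints
pdf = np.exp(-((centers - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Largest vertical gap between a bar and the curve
max_gap = np.abs(density - pdf).max()
print(f"largest gap between bar height and pdf: {max_gap:.4f}")
```

A gap that is small compared with the peak height of the curve (about 0.04 for a standard deviation of 10) agrees with what the figure shows.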

Roughly verify from the formula that the vertical axis is the relative frequency density when "density = True"

Use the tallest bar of the histogram for the check.

Relative frequency

The tallest bar in (2) has a value between 0.14 and 0.15.

Class width

The code printed a class width of about 3.7.

Relative frequency density

Relative frequency / class width = 0.145 / 3.7 ≈ 0.039, which matches the height of the tallest bar in (3). (End of verification)
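The same check can be done without reading values off the plot. The sketch below is an illustration under assumptions: it uses np.histogram (which matplotlib's hist uses internally for binning) and a fixed seed, so the exact numbers will differ from the run shown in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
data = rng.normal(50, 10, 10000)
num_bins = 20

# Relative frequency, using the same weights as histogram (2)
weights = np.ones_like(data) / len(data)
rel_freq, edges = np.histogram(data, bins=num_bins, weights=weights)

# Bar heights with density=True, as in histogram (3)
density, _ = np.histogram(data, bins=num_bins, density=True)

# All classes have the same width, so the first one is representative
bin_width = edges[1] - edges[0]

# Compare the tallest bar: relative frequency / class width vs. density
tallest = np.argmax(rel_freq)
print("relative frequency / class width:", rel_freq[tallest] / bin_width)
print("density=True bar height:         ", density[tallest])

# The relation should hold for every bar, not only the tallest one
all_match = np.allclose(rel_freq / bin_width, density)
print("all bars match:", all_match)
```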

Verify that the total height of the bars is 1.0 in the relative frequency histogram

Reading the bar heights off the histogram above and adding them up is difficult, so set the number of classes (the num_bins variable) in the Python code to 1 and run it again.

2.png

The only bar in (2) now has a height of 1.0. Since 10,000 data points were used, the only bar in (1) has a height of 10000. In (3), the "density = True" histogram (blue) also appears to have the same area as the probability density function (red), that is, area = 1. (End of verification)
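As an alternative to setting num_bins to 1, the totals can be added up directly while keeping 20 classes. This is a minimal sketch that assumes np.histogram gives the same bar heights as the hist function and uses a fixed seed that is not in the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for reproducibility
data = rng.normal(50, 10, 10000)
num_bins = 20

# Frequency, relative frequency, and relative frequency density per class
frequency, edges = np.histogram(data, bins=num_bins)
weights = np.ones_like(data) / len(data)
rel_freq, _ = np.histogram(data, bins=num_bins, weights=weights)
density, _ = np.histogram(data, bins=num_bins, density=True)

# (1) The frequencies add up to the number of data
total_count = frequency.sum()

# (2) The relative frequencies add up to 1
total_height = rel_freq.sum()

# (3) Bar height times class width, summed over all classes, gives the area
total_area = (density * np.diff(edges)).sum()

print("total frequency:         ", total_count)
print("total relative frequency:", total_height)
print("total area (density):    ", total_area)
```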
