[PYTHON] Precautions when drawing the probability density function and the histogram on top of each other in matplotlib

When I tried to overlay a probability density function on a histogram, the two did not line up at first, so I'm leaving a note of how I solved it.

Cause

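By default, plt.hist plots the frequency (count) of each class, so the total area of a histogram of random numbers generated by numpy.random is roughly the number of samples times the class width. That is generally much larger than 1 and grows with the number of samples, while the area under a probability density function is always 1. As a result, the curve looks flat next to the bars.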
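The mismatch can be checked numerically by computing the area of the count histogram directly. The sketch below assumes the same settings as the code in this post (2,000 samples, range (-3, 3), 60 classes) and uses np.histogram, which bins data the same way plt.hist does without drawing anything; the fixed seed is an added assumption for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) so the result is reproducible
n = 2000                         # number of samples
data = rng.standard_normal(n)

# Count histogram: what plt.hist draws by default (density=False)
counts, edges = np.histogram(data, bins=60, range=(-3, 3))
widths = np.diff(edges)          # every class is 0.1 wide here

# Total area = (number of samples inside the range) * 0.1
area = np.sum(counts * widths)
print(counts.sum(), area)        # area is close to n * 0.1 = 200, far from 1
```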

Solution

There are two ways to fix this:

- Normalize the histogram so that its area is 1
- Scale the probability density function so that its area matches the histogram's

Normalize the histogram

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
n = 2000  # number of samples

data = np.random.randn(n)
plt.hist(data, range=(-3, 3), bins=60, alpha=0.5, density=True)
# Class width 0.1, 2000 samples; the vertical axis is relative frequency density
x = np.linspace(-3, 3, 61)
# Points from -3 to 3 in steps of 0.1
plt.plot(x, norm.pdf(x), c='r')
plt.show()

Passing density=True makes the vertical axis the relative frequency density instead of the frequency (count), which brings the area of the histogram to 1, the same as the probability density function.
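To confirm this numerically, np.histogram can return the same density values that plt.hist(..., density=True) plots, and summing height × width should give 1. A minimal sketch, assuming the same 60 classes over (-3, 3) and a fixed seed added for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility
n = 2000                         # number of samples
data = rng.standard_normal(n)

# density=True returns relative frequency density instead of counts
heights, edges = np.histogram(data, bins=60, range=(-3, 3), density=True)
area = np.sum(heights * np.diff(edges))
print(area)                      # 1.0 up to floating-point rounding
```

Note that the normalization only uses samples that fall inside range, so the few samples beyond ±3 are dropped rather than pushing the area below 1.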


Let w be the common class width and D_n the frequency (count) of class n. The original area S of the histogram is

S = \sum_{n} D_n w

The relative frequency of class n is

D_n^{R} = \frac{D_n}{\sum_{n} D_n}

and the relative frequency density is

\begin{aligned}
D_n' &= \frac{D_n^{R}}{w} \\
&= \frac{D_n}{\sum_{n} D_n} \times \frac{1}{w} \\
&= \frac{D_n}{S}
\end{aligned}

The area S' of the histogram drawn with the relative frequency density is therefore

\begin{aligned}
S' &= \sum_{n} D_n' w \\
&= \frac{\sum_{n} D_n w}{S} \\
&= \frac{S}{S} \\
&= 1
\end{aligned}
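The derivation can be followed step by step in code: compute D_n, S, D_n^R and D_n' by hand, then compare the result with what np.histogram(..., density=True) returns. This is a sketch under the same assumptions as above (60 classes over (-3, 3), fixed seed added for reproducibility); the variable names simply mirror the symbols in the equations.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption)
data = rng.standard_normal(2000)
w = 0.1                          # class width

D, edges = np.histogram(data, bins=60, range=(-3, 3))  # frequencies D_n
S = np.sum(D * w)                # original area S = sum_n D_n w
D_rel = D / D.sum()              # relative frequency D_n^R
D_dens = D_rel / w               # relative frequency density D_n' = D_n^R / w = D_n / S

S_prime = np.sum(D_dens * w)     # S' = sum_n D_n' w
print(S_prime)                   # 1.0 up to rounding

# The hand-computed densities agree with numpy's density=True output
ref, _ = np.histogram(data, bins=60, range=(-3, 3), density=True)
print(np.allclose(D_dens, ref))  # True
```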

Scale the probability density function to the area of the histogram

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
n = 2000  # number of samples
w = 0.1   # class width: range of 6 divided into 60 classes

data = np.random.randn(n)
plt.hist(data, range=(-3, 3), bins=60, alpha=0.5)
# Class width 0.1, 2000 samples; the vertical axis is frequency (count)
x = np.linspace(-3, 3, 61)
# Points from -3 to 3 in steps of 0.1
plt.plot(x, n * w * norm.pdf(x), c='r')
# Scale the PDF by the area of the histogram, n * w
plt.show()

This time the histogram keeps frequency (count) on the vertical axis, and the probability density function is instead multiplied by the area of the histogram, n × w, so that the two areas match.
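One way to see why n × w is the right factor: for a narrow class, the expected number of samples falling into it is about n × w × f(x) at the class centre, which is exactly the height of the scaled curve there. The sketch below compares the scaled curve at the class centres with the observed counts; the seed and the midpoint comparison are assumptions added for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)   # fixed seed (assumption)
n, w = 2000, 0.1                 # number of samples, class width
data = rng.standard_normal(n)

counts, edges = np.histogram(data, bins=60, range=(-3, 3))
centers = (edges[:-1] + edges[1:]) / 2

# Height of the scaled curve n * w * f(x) at each class centre
expected = n * w * norm.pdf(centers)
# Both totals are close to 2000 minus the ~0.27% of samples beyond ±3
print(counts.sum(), expected.sum())
```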

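Let f(x) be the probability density function, n the total number of data in the histogram, and w the class width. f(x) satisfies

\int_{-\infty}^{\infty} f(x)\,dx = 1

Multiplying both sides by n \times w gives

\int_{-\infty}^{\infty} n w f(x)\,dx = n w

and n \times w is the area of the count histogram, since each sample adds height 1 to a class of width w. (Strictly, samples outside range=(-3, 3) are not counted, so the histogram's area is slightly smaller than n \times w; about 0.27% of a standard normal sample lies beyond ±3.) Therefore the converted probability density function f'(x) is

f'(x) = n \times w \times f(x)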
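The result can be checked by integrating f'(x) numerically. A minimal sketch, assuming f is the standard normal PDF and n, w take the values used in this post; scipy.integrate.quad accepts infinite bounds directly.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

n, w = 2000, 0.1   # total number of data, class width

def f_scaled(x):
    """f'(x) = n * w * f(x), with f the standard normal PDF."""
    return n * w * norm.pdf(x)

# Integral of f'(x) over the whole real line
total, _ = quad(f_scaled, -np.inf, np.inf)
print(total)       # about 200.0 = n * w
```

Restricting the bounds to the plotted range, quad(f_scaled, -3, 3), gives about 199.5, which matches the count histogram more closely because it also ignores samples beyond ±3.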




