Normal distribution and its fitting

In linear regression (http://qiita.com/ynakayama/items/e41f592ad7fe02f23c1c), which has already appeared several times in this series, the least squares method was used to obtain the equation of the line that fits the distribution of the data. Many of the analyses covered so far, linear regression included, assume a normal distribution.

The least squares method is an indispensable technique for fitting: by fitting plotted data to a theoretical formula, you can extract information from the constants contained in that formula. For example, it is used to find the slope of a straight line that fits a set of points, or to estimate the statistics of data assumed to follow a normal distribution.
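As a small illustration of the straight-line case, the sketch below fits a line by least squares with `np.polyfit`. The data are hypothetical (points scattered around y = 2x + 1, chosen only for this example), and the seed is arbitrary.

```python
import numpy as np

# Hypothetical data: points scattered around y = 2x + 1 (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least-squares fit of a degree-1 polynomial.
# Coefficients are returned from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print(slope, intercept)
```

The recovered slope and intercept should land near the values used to generate the data, which is the sense in which fitting "reads off" constants from the theoretical formula.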

The **normal distribution**, also known as the **Gaussian distribution**, is a probability distribution for continuous variables in which the data cluster around the mean and form a peak there. It was also explained in past articles. Finding the function of the approximating curve (fitting curve) for a normal distribution is called Gaussian fitting. As usual, Gaussian fitting is easy to do with the powerful mathematical functions of SciPy.
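One way to do Gaussian fitting in the least squares sense is to fit a Gaussian curve to a histogram of the data with `scipy.optimize.curve_fit`. The sketch below is only an illustration under assumptions: the helper `gaussian`, the bin count of 20, the seed, and the initial guesses are choices made for this example, not part of the original article.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, sigma):
    """Gaussian curve (hypothetical helper used as the model function)."""
    return amplitude * np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


# Illustrative data: 500 draws with mean 100 and standard deviation 1
rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=1.0, size=500)

# Turn the data into (x, y) points: bin centers and normalized bin heights
counts, edges = np.histogram(data, bins=20, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Initial guesses taken from the data help the optimizer converge
p0 = [counts.max(), data.mean(), data.std()]
popt, pcov = curve_fit(gaussian, centers, counts, p0=p0)
amplitude, mean, sigma = popt

# sigma enters the model only as sigma**2, so report its absolute value
print(mean, abs(sigma))
```

The result depends on how the histogram is binned, so this approach is best treated as a curve-fitting exercise rather than the preferred way to estimate distribution parameters.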

Get a sample that approximates a normal distribution

First, generate a sample that approximates the normal distribution. The following draws 500 values with a mean of 100 and a standard deviation of 1:

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Get a sample that approximates a normal distribution
# The mean is 100, the standard deviation is 1, and the number of samples is 500.
sample = norm.rvs(loc=100, scale=1, size=500)

print(sample)  # =>
#[ 101.02975418   99.95689958  100.8338816    99.32725219  101.50090014
#   99.29039034  101.64895275  100.45222206  100.22525394   98.8036744
#  100.73576941   99.32705948  100.52278215  102.38383015   98.28409264
#   99.22632512  100.84625978   99.69653993  100.9957202    97.97846995
#   99.49731157  100.89595798  101.3705089   101.15367469  100.26415751
#   99.14143516  100.21385338   99.69883406   99.68494407  100.70380005
#  100.73544699  100.3434308    99.50291518   99.61483734  100.92201666
#  100.98639356  100.36362462   98.39298021   98.39137284  101.54821395
#  100.2748115   100.78672853   99.79335862   98.8123562   100.57942641
#  100.03497218   99.98368219  100.45979578   99.32342998   98.08908529
#  ...

Fitting and visualization

Fitting takes a single method call. norm.fit returns the estimated mean and standard deviation as a tuple.

param = norm.fit(sample)

print(param)
# => (99.92158820017579, 1.0339291481971331)
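It may help to see what `norm.fit` actually computes. For the normal distribution it returns maximum likelihood estimates, which coincide with the sample mean and the population-style standard deviation (`ddof=0`). The sketch below checks this on freshly generated data; the seed passed via `random_state` is an arbitrary choice for reproducibility.

```python
import numpy as np
from scipy.stats import norm

# Illustrative sample: mean 100, standard deviation 1, 500 values
data = norm.rvs(loc=100, scale=1, size=500, random_state=0)

# Maximum likelihood estimates of (mean, standard deviation)
mu, sigma = norm.fit(data)

# Compare with the direct formulas; np.std uses ddof=0 by default
same_mean = np.isclose(mu, data.mean())
same_std = np.isclose(sigma, data.std())

print(mu, sigma)
print(same_mean, same_std)
```

So for a plain normal fit, `norm.fit` is a convenient shorthand for these two statistics rather than an iterative curve fit.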

Now that you have the parameters, you can plot the fitted curve together with a histogram of the sample.

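x = np.linspace(95, 105, 100)
# PDF using the fitted parameters (mean, standard deviation)
pdf_fitted = norm.pdf(x, loc=param[0], scale=param[1])
# PDF of the distribution the sample was drawn from (mean 100, standard deviation 1);
# norm.pdf(x) alone would be the standard normal, which is almost zero on this range
pdf = norm.pdf(x, loc=100, scale=1)
plt.figure()
plt.title('Normal distribution')
plt.plot(x, pdf_fitted, 'r-', x, pdf, 'b-')
# density=True replaces the normed argument, which recent Matplotlib versions no longer accept
plt.hist(sample, density=True, alpha=.3)
# Save before show(); saving afterwards can produce an empty image
plt.savefig("image.png")
plt.show()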

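It went well: the estimated mean and standard deviation are close to the values of 100 and 1 used to generate the sample.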

Examples of engineering problems analyzed with the least squares method are too numerous to list. It is also important to have a good understanding of probability distributions and statistical tests, because an analysis built on a model that assumes a normal distribution breaks down if that assumption does not hold.
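One way to check the normality assumption before relying on it is a normality test. The sketch below uses the Shapiro-Wilk test (`scipy.stats.shapiro`) on two hypothetical samples, one normal and one deliberately skewed; the data, the seed, and the 0.05 threshold are illustrative choices, not a recommendation from the original article.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

# Hypothetical samples: one drawn from a normal distribution, one clearly skewed
normal_data = rng.normal(loc=100.0, scale=1.0, size=500)
skewed_data = rng.exponential(scale=1.0, size=500)

# shapiro returns the test statistic and the p-value
stat_normal, p_normal = shapiro(normal_data)
stat_skewed, p_skewed = shapiro(skewed_data)

# A small p-value is evidence against normality
alpha = 0.05
print("normal sample: p =", p_normal, "reject normality:", p_normal < alpha)
print("skewed sample: p =", p_skewed, "reject normality:", p_skewed < alpha)
```

A large p-value does not prove that the data are normal; it only means the test found no strong evidence against it, so a histogram or Q-Q plot is a useful companion check.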
