In linear regression, which has already appeared several times, the least squares method gave us the theoretical formula of the line that fits the distribution of the data. Many of the analyses covered so far, such as linear regression (http://qiita.com/ynakayama/items/e41f592ad7fe02f23c1c), assume a normal distribution.
The least squares method is indispensable for fitting: by fitting the plotted data to a theoretical formula, various information can be extracted from the constants contained in that formula. It is used in many situations, for example to find the slope of a straight line that fits a set of points, or to estimate the statistics of a distribution assumed to be normal.
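As a small sketch of the straight-line case mentioned above: `np.polyfit` performs a least squares fit of a polynomial, and with degree 1 it returns the slope and intercept of the best-fitting line. The data here are illustrative values, not from the article.

```python
import numpy as np

# Noisy points scattered around the line y = 2x + 1 (illustrative data)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Least squares fit of a degree-1 polynomial: returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # close to 2 and 1
```

The recovered slope and intercept approach the true values as the noise shrinks or the number of points grows.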
The **normal distribution**, also known as the **Gaussian distribution**, is a probability distribution for continuous variables in which the data pile up around a peak near the mean; it was also explained in a past article. Finding the function of the approximating curve (the fitting curve) for a normal distribution is called Gaussian fitting. As usual, Gaussian fitting can be achieved easily with SciPy's powerful mathematical functions.
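For reference, the curve being fitted is the standard Gaussian probability density function, with mean $\mu$ and standard deviation $\sigma$ as its parameters:

```math
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Gaussian fitting amounts to estimating $\mu$ and $\sigma$ from the data.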
First, generate a sample that approximates a normal distribution: 500 values with a mean of 100, as follows.
```python
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Draw a sample that approximates a normal distribution:
# mean 100, standard deviation 1, 500 samples
sample = norm.rvs(loc=100, scale=1, size=500)
print(sample)
# => [ 101.02975418   99.95689958  100.8338816    99.32725219  101.50090014
#       99.29039034  101.64895275  100.45222206  100.22525394   98.8036744
#      100.73576941   99.32705948  100.52278215  102.38383015   98.28409264
#       99.22632512  100.84625978   99.69653993  100.9957202    97.97846995
#       99.49731157  100.89595798  101.3705089   101.15367469  100.26415751
#       99.14143516  100.21385338   99.69883406   99.68494407  100.70380005
#      100.73544699  100.3434308    99.50291518   99.61483734  100.92201666
#      100.98639356  100.36362462   98.39298021   98.39137284  101.54821395
#      100.2748115   100.78672853   99.79335862   98.8123562   100.57942641
#      100.03497218   99.98368219  100.45979578   99.32342998   98.08908529
#      ...
```
Fitting takes just a single method call.
```python
param = norm.fit(sample)
print(param)
# => (99.92158820017579, 1.0339291481971331)
```
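As a quick sanity check (an aside, not from the article): for the normal distribution, the maximum likelihood estimates that `norm.fit` returns are simply the sample mean and the biased (`ddof=0`) sample standard deviation.

```python
import numpy as np
from scipy.stats import norm

# Fixed seed so the check is reproducible
sample = norm.rvs(loc=100, scale=1, size=500, random_state=0)
mu, sigma = norm.fit(sample)

# MLEs for a normal distribution coincide with mean and std (ddof=0)
print(np.isclose(mu, sample.mean()))   # True
print(np.isclose(sigma, sample.std()))  # True
```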
Now that you have the parameters, you can plot the fitted distribution.
```python
x = np.linspace(95, 105, 100)
pdf_fitted = norm.pdf(x, loc=param[0], scale=param[1])  # fitted distribution
pdf = norm.pdf(x)                                       # standard normal for comparison
plt.figure()
plt.title('Normal distribution')
plt.plot(x, pdf_fitted, 'r-', x, pdf, 'b-')
plt.hist(sample, density=True, alpha=.3)  # normed= was removed in newer Matplotlib
plt.savefig("image.png")  # save before show(), which clears the figure
plt.show()
```
It went well.
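To tie this back to the least squares method: an alternative to `norm.fit` is to fit the Gaussian pdf to a normalized histogram by nonlinear least squares with `scipy.optimize.curve_fit`. This is a sketch, not the article's method; the `gaussian` helper and initial guess `p0` are my own choices.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def gaussian(x, mu, sigma):
    # Normal probability density function
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

sample = norm.rvs(loc=100, scale=1, size=500, random_state=1)

# Normalized histogram and bin centers serve as the (x, y) data to fit
counts, edges = np.histogram(sample, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Least squares fit of the Gaussian pdf; p0 is a rough initial guess
popt, pcov = curve_fit(gaussian, centers, counts, p0=[centers.mean(), 1.0])
print(popt)  # roughly [100, 1]
```

Unlike the maximum likelihood estimate from `norm.fit`, this result depends on the binning, but the same idea extends to curves with no closed-form estimator.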
Examples of engineering problems analyzed with the least squares method are too numerous to list. It is also important to have a solid understanding of probability distributions and statistical tests: if the normality assumption does not actually hold, an analysis built on a model that assumes a normal distribution falls apart.