A model whose linear predictor contains only an intercept ($\log \lambda = \beta_1$) is too simple. The fit improves as the number of parameters is increased, e.g. $\log \lambda = \beta_1 + \beta_2 x + \dots + \beta_7 x^6$.
The source code for this polynomial model is shown below.
>>> import numpy
>>> import pandas
>>> import matplotlib.pyplot as plt
>>> import statsmodels.api as sm
>>> import statsmodels.formula.api as smf
>>> d = pandas.read_csv("data3a.csv")
>>> model = smf.glm('y ~ x + I(x**2) + I(x**3) + I(x**4) + I(x**5) + I(x**6)', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> result.summary()
>>> params = result.params
>>> fit = lambda x: numpy.exp(params[0] + params[1] * x + params[2] * x ** 2 + params[3] * x ** 3 + params[4] * x ** 4 + params[5] * x ** 5 + params[6] * x ** 6)
>>> xx = numpy.linspace(min(d.x), max(d.x), 100)
>>> plt.plot(xx, fit(xx))
>>> plt.plot(d.x[d.f == 'T'], d.y[d.f == 'T'], 'ro')
>>> plt.plot(d.x[d.f == 'C'], d.y[d.f == 'C'], 'bo')
>>> plt.show()
**AIC**: "A model that makes good **predictions** is a good model."
So far, four types of models have been applied to the seed data:

- Model with no effects (constant model)
- Model with the effect of body size x (x model)
- Model with the effect of fertilization f (f model)
- Model with the effects of both body size and fertilization (x + f model)
>>> plt.subplot(221)
>>> model = smf.glm('y ~ 1', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> plt.plot(xx, [numpy.exp(result.params[0]) for _ in range(100)])
>>> plt.subplot(222)
>>> model = smf.glm('y ~ f', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> plt.plot(xx, [numpy.exp(result.params[0] + result.params[1]) for _ in range(100)], 'r')
>>> plt.plot(xx, [numpy.exp(result.params[0]) for _ in range(100)], 'b')
>>> plt.subplot(223)
>>> model = smf.glm('y ~ x', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> fit = lambda x: numpy.exp(result.params[0] + result.params[1] * x)
>>> plt.plot(xx, fit(xx))
>>> plt.subplot(224)
>>> model = smf.glm('y ~ x + f', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> fit = lambda x: numpy.exp(result.params[0] + result.params[2] * x)
>>> plt.plot(xx, fit(xx), 'b')
>>> fit = lambda x: numpy.exp(result.params[0] + result.params[1] + result.params[2] * x)
>>> plt.plot(xx, fit(xx), 'r')
>>> plt.show()
The **deviance** $D$, a transformation of the maximum log-likelihood $\log L^\ast$ that measures goodness of fit, is defined as

$$D = -2 \log L^\ast.$$
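Since statsmodels exposes the maximum log-likelihood of a fitted model as `llf`, the deviance can be computed directly; as a small check, for the x + f model fitted last above:

>>> # deviance D = -2 * (maximum log-likelihood), here for the x + f model
>>> -2 * result.llf   # ≈ 470.6 (cf. the x + f row in the table below)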
The `Deviance` shown in the earlier `result.summary()` output is the **residual deviance** (= the model's deviance minus the minimum deviance).
The minimum deviance is the deviance of the full model. The full model is the model with $\lambda_i = y_i$ for every observation, so its maximum log-likelihood is

$$\log L = \sum_i \log \frac{y_i^{y_i} \exp(-y_i)}{y_i!}.$$
For the x model:

# x model minimum deviance (refit the x model so that `result` below refers to it)
>>> import scipy.stats as sct
>>> model = smf.glm('y ~ x', data=d, family=sm.families.Poisson())
>>> result = model.fit()
>>> dpois = lambda x: sct.poisson.logpmf(x, x)
>>> sum(dpois(d.y))
-192.8897525244958
>>> sum(dpois(d.y)) * (-2)
385.77950504899161
From the above, the residual deviance of this model is
>>> result.llf * (-2) - sum(dpois(d.y))*(-2)
84.992996490729922
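For reference, statsmodels also exposes the residual deviance directly on a fitted GLM result, which should match the value just computed:

>>> result.deviance   # ≈ 84.993, matching the Deviance line in result.summary() below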
>>> result.summary()
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 100
Model: GLM Df Residuals: 98
Model Family: Poisson Df Model: 1
Link Function: log Scale: 1.0
Method: IRLS Log-Likelihood: -235.39
Date: Fri, 05 Jun 2015 Deviance: 84.993
Time: 13:12:50 Pearson chi2: 83.8
No. Iterations: 7
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept 1.2917 0.364 3.552 0.000 0.579 2.005
x 0.0757 0.036 2.125 0.034 0.006 0.145
==============================================================================
The maximum deviance, at the opposite end from the minimum deviance, is the deviance of the Null model: the model whose linear predictor is only an intercept (the constant model).
In summary, the larger the residual deviance, the worse the fit; the smaller the residual deviance, the better the fit (although a very small residual deviance may simply indicate overfitting).
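As a small sketch (reusing the `dpois` helper defined above), the Null model's residual deviance, the upper end of this range, can be computed the same way:

>>> # Null model (intercept only): its residual deviance is the largest possible
>>> null_result = smf.glm('y ~ 1', data=d, family=sm.families.Poisson()).fit()
>>> -2 * null_result.llf - (-2) * sum(dpois(d.y))   # ≈ 89.5 (cf. the constant model row below)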
**AIC** (Akaike's Information Criterion) is a criterion that emphasizes predictive ability. When the number of parameters estimated by maximum likelihood is $k$,
\begin{eqnarray}
\mathrm{AIC} &=& -2 \{(\text{maximum log-likelihood}) - (\text{number of parameters estimated by maximum likelihood})\} \\
&=& -2 (\log L^\ast - k) \\
&=& D + 2k
\end{eqnarray}
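As a quick sketch of this formula, the AIC of the x model (the `result` refit above, with $k = 2$ parameters) can be computed by hand and compared with the value statsmodels reports via `aic`:

>>> # AIC = D + 2k = -2 * (log L* - k); the x model has k = 2 (intercept and x)
>>> k = len(result.params)
>>> -2 * (result.llf - k)   # ≈ 474.8 (cf. the x row in the table below)
>>> result.aic              # statsmodels reports the same value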
| model | number of parameters $k$ | $\log L^\ast$ | deviance ($-2 \log L^\ast$) | residual deviance | AIC |
|---|---|---|---|---|---|
| constant | 1 | -237.6 | 475.3 | 89.5 | 477.3 |
| f | 2 | -237.6 | 475.3 | 89.5 | 479.3 |
| x | 2 | -235.4 | 470.8 | 85.0 | 474.8 |
| x + f | 3 | -235.3 | 470.6 | 84.8 | 476.6 |
| full | 100 | -192.9 | 385.8 | 0.0 | 585.8 |
From the table, the x model is the statistical model with the smallest AIC.
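The table can be reproduced with a short loop over the four formulas (a sketch assuming `d`, `smf`, `sm`, and `dpois` from above; the full model row is just the minimum deviance with $k = 100$):

>>> # reproduce the comparison table: k, log L*, deviance, residual deviance, AIC
>>> min_dev = -2 * sum(dpois(d.y))   # deviance of the full model (minimum deviance)
>>> for formula in ['y ~ 1', 'y ~ f', 'y ~ x', 'y ~ x + f']:
...     r = smf.glm(formula, data=d, family=sm.families.Poisson()).fit()
...     k = len(r.params)
...     print(formula, k, round(r.llf, 1), round(-2 * r.llf, 1),
...           round(-2 * r.llf - min_dev, 1), round(r.aic, 1))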
In general, adding explanatory variables (increasing the number of parameters) improves the fit to the data used for estimation, but it does not necessarily improve prediction (for nested models, the bias of $\log L^\ast$ as an estimate of the mean expected log-likelihood grows by about 1 per added parameter). → If the added variable genuinely explains the data, the improvement in fit outweighs the increase in bias → the AIC decreases.
Put differently, starting from the constant model ($k = 1$), if the improvement in the maximum log-likelihood $\log L^\ast$ obtained by adding a parameter ($k = 1 \to 2$) exceeds 1 (the corresponding increase in the bias between $\log L^\ast$ and the mean expected log-likelihood), the AIC becomes smaller. (I don't quite follow the original Japanese here...)
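A quick check against the table above: going from the constant model ($k = 1$) to the x model ($k = 2$), $\log L^\ast$ improves by $-235.4 - (-237.6) = 2.2 > 1$, and the AIC indeed drops ($477.3 \to 474.8$); switching to the f model instead improves $\log L^\ast$ by essentially 0, which is less than 1, and the AIC rises ($477.3 \to 479.3$).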