Perform a test to compare the estimated statistical models. This chapter deals with the ** likelihood ratio test **.

Statistical test framework

procedure

Determine the data to use
Design an appropriate statistical model corresponding to the purpose and data structure, and perform maximum likelihood estimation of the parameters.

In the test, a model with few parameters and a model with many parameters (simple model and complex model) are tested. They are called ** null hypothesis ** and ** alternative hypothesis **.

Neyman-Pearson test framework

The statistical model test examines whether the proposition "the null hypothesis is correct" can be denied.

Treat the goodness of the model as ** test statistic **
Assume that the null hypothesis is a "true model"
Examine the variability (probability distribution) of the test statistic and determine the "probable range" of the test statistic value
When the size of the "common range" is 95%, it can be said that a significance level of 5% is set.
If the test statistic obtained by the alternative hypothesis model is out of the "common range", reject the null hypothesis and support the alternative hypothesis.

Likelihood ratio test example: Examine the difference in deviance

Statistical model to use:

-** Constant model : A model in which the average number of seeds $ \ lambda_i $ is constant and does not depend on body size $ x_i $ (slope $ \ beta_2 = 0 $; number of parameters $ k = 1 $) - x model **: A model in which the average number of seeds $ \ lambda_i $ depends on the body size $ x_i $ (slope $ \ beta_2 \ neq 0 $; number of parameters $ k = 2 $)

model	Number of parametersk	\log L ^ \ast	deviance（-2\log L ^ \ast）	residual deviance	AIC
Constant	1	-237.6	475.3	89.5	477.3
x	2	-235.4	470.8	85.0	474.8
full	100	-192.9	385.6	0.0	585.8

The likelihood ratio test uses the difference in deviance, which is the logarithm of the likelihood ratio multiplied by -2. In the case of this example,

\begin{eqnarray}
\frac{L_1 ^\ast}{L_2 ^\ast} &=& \frac{Maximum likelihood of a constant model:\exp(-237.6)}{Maximum likelihood of x model:\exp(-235.4)} \\
\Delta D_{1,2} &=& -2 * (\log L_1 ^ \ast - \log L_2 ^ \ast) \\
\Delta D_{1,2} &=& D_1 - D_2 \\
\Delta D_{1,2} &=& 4.5
\end{eqnarray}

It is evaluated that this deviance is improved by 4.5.

--Null hypothesis: constant model (number of parameters $ k = 1, \ beta_2 = 0 ) --Alternative hypothesis: x model ( k = 2, \ beta_2 \ neq 0 $)

The null hypothesis is	\Delta D_{1,2}Rarely difference (rejection)	\Delta D_{1,2}Is a common difference (cannot be rejected)
Is a true model	Type I error	(no problem)
Not a true model	(no problem)	Type II error

In the Neyman-Pearson test framework devoted solely to the examination of type I errors

As a procedure

Assume that a constant model that is a null hypothesis is correct (a true model)
When a certain model is applied to the observed data, it becomes $ \ hat {\ beta_1} = 2.06 $, so this is considered to be the same as the true model.
Create data from a true model and apply the x model each time to calculate $ \ Delta D_ {1, 2} $ and find the distribution of $ \ Delta D_ {1, 2} $
The probability P that the difference between the deviances of the constant model and the x model is $ \ Delta D_ {1,2} \ geq 4.5 $ is obtained.
If $ \ Delta D_ {1,2} = 4.5 $ is considered an impossible value, the null hypothesis is rejected and the alternative hypothesis is automatically adopted.

The probability $ P $ that the difference between the deviances of the constant model and the x model is $ \ Delta D_ {1,2} \ geq 4.5 $ is called the ** P value **.

--P value is "large": $ \ Delta D_ {1,2} = 4.5 $ is common → Null hypothesis cannot be rejected --P value is "small": $ \ Delta D_ {1,2} = 4.5 $ is very rare → Let's reject the null hypothesis, and the remaining x model is correct

Determine the ** significance level ** in advance (by yourself)

-$ P \ geq \ alpha : The null hypothesis cannot be rejected - P <\ alpha $: Null hypothesis can be rejected

Let.

Null hypothesis: Calculates the probability that $ \ Delta D_ {1,2}, which is a test statistic in the world where a constant model is a true model, will be greater than 4.5 (make a type I error).

Parametric bootstrap method

Above 3. A method in which the process of generating the data of is performed based on a simulation using random numbers. Suppose that the result of the constant model is stored in fit1 and the result of the x model is stored in fit2.

>>> model1 = smf.glm('y~1', data=d, family=sm.families.Poisson())
>>> model2 = smf.glm('y~x', data=d, family=sm.families.Poisson())
>>> fit1 = model1.fit()
>>> fit2 = model2.fit()

# fit1$deviance - fit2$deviance
>>> fit1.deviance - fit2.deviance
4.5139410788540459

From the average number of seeds estimated by a certain model $ \ lambda = \ exp (2.06) = 7.85 $ The data to be generated is "** 100 Poisson random numbers with an average of 7.85 **".

# d$y.rnd <- rpois(100, lambda=mean(d$y))
>>> d.y.rnd = sci.poisson.rvs(d.y.mean(), size=100)
# fit1 <- glm(y.rnd ~ 1, data=d, family=poisson)
>>> model1 = smf.glm('y.rnd ~ 1', data=d, family=sm.families.Poisson())
>>> fit1 = model1.fit()
# fit2 <- glm(y.rnd ~ x, data=d, family=poisson)
>>> model2 = smf.glm('y.rnd ~ x', data=d, family=sm.families.Poisson())
>>> fit2 = model2.fit()
# fit1$deviance - fit2$deviance
>>> fit1.deviance - fit2.deviance
0.63270346107955788

The difference in deviance is 1.92 for data whose mean value is a constant Poisson random number.

The x model with meaningless explanatory variables fits better than the constant model, which is the "true model"

By repeating this step about 1000 times, the ** distribution of test statistic ** and the “distribution of deviance difference $ \ Delta D_ {1, 2} $” in this example can be predicted.

def get_dd(d):
    sample_d = len(d)
    mean_y = d.y.mean()
    d.y.rnd = sci.poisson.rvs(mu=mean_y, size=sample_d)
    model1 = smf.glm('y.rnd ~ 1', data=d, family=sm.families.Poisson())
    model2 = smf.glm('y.rnd ~ x', data=d, family=sm.families.Poisson())
    fit1 = model1.fit()
    fit2 = model2.fit()
    return fit1.deviance - fit2.deviance


def pb(d, bootstrap=1000):
    return pandas.Series([get_dd(d) for _ in xrange(bootstrap)])

results = pb(d)
results.describe()

results[results >= 4.5].count()


results[results>= 4.5].count()
# 32

results.quantile(.95)
# 3.6070094508948025

results.hist()

From the above results, the "probability that the difference in deviance is greater than 4.5" is $ 32/1000 $, that is, $ P = 0.032 $. Also, if you find the difference in deviance $ \ Delta D_ {1,2} $ where $ P = 0.05 $. It becomes $ \ Delta D_ {1,2} \ leq 3.61 $.

From the above, There is a significant difference because "the P value of the deviance difference of 4.5 was 0.032, which is smaller than the significance level of 0.05". It is judged that "the null hypothesis (constant model) is rejected and the x model remains, so this is adopted".

Approximate calculation method using chi-square distribution

The probability distribution of the difference in deviance $ \ Delta D_ {1,2} $ can be approximated by the $ \ chi ^ 2 $ distribution with $ k_2 --k_1 = 2-1 = 1 $ degrees of freedom.

Apparently there is no ANOVA for GLM in stats models ...

The $ \ chi ^ 2 $ distribution approximation is ** an approximation calculation that is effective when the sample size is large **.

The PB method seems to be better for small samples with a small number of individuals investigated, or for data variations that are not Poisson distributions but homoscedastic normal distributions.

Summary

Both the likelihood ratio test and the model selection by AIC focus on the statistic of deviance.

--AIC: The purpose is to select a "model that makes good predictions" --Likelihood ratio test: The purpose is to safely reject the null hypothesis

It seems that there is no choice but to choose according to the purpose.

[PYTHON] Introduction to Statistical Modeling for Data Analysis GLM Likelihood-Ratio Test and Test Asymmetry