StatsModels

StatsModels is a Python package that provides a range of statistical models and tools, including linear regression, logistic regression, generalized linear models, ARIMA models, and estimation of the autocorrelation function.

API list: https://www.statsmodels.org/stable/api.html

Install

Install with pip. See https://www.statsmodels.org/stable/install.html for details.

`terminal`


pip install statsmodels
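
To confirm that the installation worked, a minimal check is to import the package and print its version. This is only a sanity check. Some arguments used in this article, such as `as_pandas` on the dataset loaders, may not be accepted by every release, so knowing the installed version helps when an example fails.

```python
import statsmodels

# Print the installed version string; the exact value depends on your environment.
print(statsmodels.__version__)
```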

2. Linear regression

You can fit a linear regression model by ordinary least squares with statsmodels.api.OLS(). If x contains only the explanatory variables, the model has no intercept. If you add a constant column to x with statsmodels.api.add_constant(), the model is fitted with an intercept.

`python`


import numpy as np

import statsmodels.api as sm

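# Load the Spector and Mazzeo (1980) dataset as NumPy arrays
spector_data = sm.datasets.spector.load(as_pandas=False)
x = spector_data.exog
# prepend=False appends the constant as the last column
xc = sm.add_constant(x, prepend=False)
y = spector_data.endog
print(xc.shape, y.shape)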

# Fit and summarize OLS model
model = sm.OLS(y, xc)
res = model.fit()

print(res.summary())

You can also retrieve individual values from the fitted result.

`python`


>>> res.params  # coefficients (the constant is last because prepend=False)
array([ 0.46385168,  0.01049512,  0.37855479, -1.49801712])

>>> res.pvalues  # p-values
array([0.00784052, 0.59436148, 0.01108768, 0.00792932])

>>> res.aic, res.bic  # Akaike information criterion, Bayesian information criterion
(33.95649234217083, 39.81943595336974)

>>> res.bse  # standard errors
array([0.16195635, 0.01948285, 0.13917274, 0.52388862])

>>> res.resid  # residuals
array([ 0.05426921, -0.07340692, -0.27529932,  0.01762875,  0.42221284,
       -0.00701576,  0.03936941, -0.05363477, -0.16983152,  0.37535999,
        0.06818476, -0.28335827, -0.39932119,  0.72348259, -0.41225249,
        0.0276562 , -0.03995305, -0.01409045, -0.56914272,  0.39131297,
       -0.06696482,  0.14645583, -0.36800073, -0.78153024,  0.22554445,
        0.52339378,  0.36858806, -0.37090458,  0.20600614,  0.0226678 ,
       -0.53887544,  0.8114495 ])

Use predict() to compute predicted values.

`python`


res.predict(xc)

`result`


array([-0.05426921,  0.07340692,  0.27529932, -0.01762875,  0.57778716,
        0.00701576, -0.03936941,  0.05363477,  0.16983152,  0.62464001,
       -0.06818476,  0.28335827,  0.39932119,  0.27651741,  0.41225249,
       -0.0276562 ,  0.03995305,  0.01409045,  0.56914272,  0.60868703,
        0.06696482,  0.85354417,  0.36800073,  0.78153024,  0.77445555,
        0.47660622,  0.63141194,  0.37090458,  0.79399386,  0.9773322 ,
        0.53887544,  0.1885505 ])

3. Logistic regression

`python`


import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the data from Spector and Mazzeo (1980)
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)

y = spector_data.endog
x = spector_data.exog

# Fit the model, following the statsmodels IPython notebook example
model = sm.Logit(y, x)
res = model.fit(disp=0)

print(res.summary())

As with OLS, you can retrieve individual values from the fitted result.

`python`


>>> res.params
array([-13.02134686,   2.82611259,   0.09515766,   2.37868766])

>>> res.pvalues
array([0.00827746, 0.02523911, 0.50143424, 0.0254552 ])

>>> res.aic, res.bic
(33.779268444262826, 39.642212055461734)

>>> res.bse
array([4.93132421, 1.26294108, 0.14155421, 1.06456425])

>>> res.resid_dev
array([-0.23211021, -0.35027122, -0.64396264, -0.22909819,  1.06047795,
       -0.26638437, -0.23178275, -0.32537884, -0.48538752,  0.85555565,
       -0.22259715, -0.64918082, -0.88199929,  1.81326864, -0.94639849,
       -0.24758297, -0.3320177 , -0.28054444, -1.33513084,  0.91030269,
       -0.35592175,  0.44718924, -0.74400503, -1.95507406,  0.59395382,
        1.20963752,  0.95233204, -0.85678568,  0.58707192,  0.33529199,
       -1.22731092,  2.09663887])

`python`


>>> res.predict(x)
array([0.02657799, 0.05950125, 0.18725993, 0.02590164, 0.56989295,
       0.03485827, 0.02650406, 0.051559  , 0.11112666, 0.69351131,
       0.02447037, 0.18999744, 0.32223955, 0.19321116, 0.36098992,
       0.03018375, 0.05362641, 0.03858834, 0.58987249, 0.66078584,
       0.06137585, 0.90484727, 0.24177245, 0.85209089, 0.83829051,
       0.48113304, 0.63542059, 0.30721866, 0.84170413, 0.94534025,
       0.5291172 , 0.11103084])

4. Generalized linear model

Choose a distribution (family) and a link function from the combinations that statsmodels supports. Details on the available distributions and link functions are summarized here: https://www.statsmodels.org/stable/glm.html#families

The family=sm.families.Gamma() argument of sm.GLM() specifies the distribution and the link function. The example below does not specify a link function, so the gamma family's default inverse link is used. To use a log link instead, pass one to the family, for example sm.families.Gamma(sm.families.links.Log()) (older releases also accept the lowercase spelling sm.families.links.log).

`python`


import statsmodels.api as sm
data = sm.datasets.scotland.load(as_pandas=False)
x = sm.add_constant(data.exog)
y = data.endog

model = sm.GLM(y, x, family=sm.families.Gamma())
res = model.fit()
print(res.summary())

`python`


>>> res.params
[-1.77652703e-02  4.96176830e-05  2.03442259e-03 -7.18142874e-05
  1.11852013e-04 -1.46751504e-07 -5.18683112e-04 -2.42717498e-06]

>>> res.scale
0.003584283173493321

>>> res.deviance
0.08738851641699877

>>> res.pearson_chi2
0.08602279616383915

>>> res.llf
-83.01720216107174

`python`


>>> res.predict(x)
array([57.80431482, 53.2733447 , 50.56347993, 58.33003783, 70.46562169,
       56.88801284, 66.81878401, 66.03410393, 57.92937473, 63.23216907,
       53.9914785 , 61.28993391, 64.81036393, 63.47546816, 60.69696114,
       74.83508176, 56.56991106, 72.01804172, 64.35676519, 52.02445881,
       64.24933079, 71.15070332, 45.73479688, 54.93318588, 66.98031261,
       52.02479973, 56.18413736, 58.12267471, 67.37947398, 60.49162862,
       73.82609217, 69.61515621])
