StatsModels
It is a package that can use various statistical models such as linear regression, logistic regression, generalized linear model, ARIMA model, and calculation of autocorrelation function.
API list https://www.statsmodels.org/stable/api.html
Enter with pip https://www.statsmodels.org/stable/install.html
terminal
pip install statsmodels
You can create a linear regression model with ordinary least squares with statsmodels.api.OLS (). If x is composed only of variables, there is no y-intercept, and if a constant sequence is added to x with statsmodels.api.add_constant (), the regression is with y-intercept.
python
import numpy as np
import statsmodels.api as sm
spector_data = sm.datasets.spector.load(as_pandas=False)
x = spector_data.exog
xc = sm.add_constant(x, prepend=False)
y = spector_data.endog
print(xc.shape, y.shape)
# Fit and summarize OLS model
model = sm.OLS(y, xc)
res = model.fit()
print(res.summary())
You can also retrieve each value
python
>>> res.params #coefficient
array([ 0.46385168, 0.01049512, 0.37855479, -1.49801712])
>>> res.pvalues #P value
array([0.00784052, 0.59436148, 0.01108768, 0.00792932])
>>> res.aic, res.bic #Akaike Information Criterion, Bayesian Information Criterion
(33.95649234217083, 39.81943595336974)
>>> res.bse #Standard error
array([0.16195635, 0.01948285, 0.13917274, 0.52388862])
>>> res.resid #Residual error
array([ 0.05426921, -0.07340692, -0.27529932, 0.01762875, 0.42221284,
-0.00701576, 0.03936941, -0.05363477, -0.16983152, 0.37535999,
0.06818476, -0.28335827, -0.39932119, 0.72348259, -0.41225249,
0.0276562 , -0.03995305, -0.01409045, -0.56914272, 0.39131297,
-0.06696482, 0.14645583, -0.36800073, -0.78153024, 0.22554445,
0.52339378, 0.36858806, -0.37090458, 0.20600614, 0.0226678 ,
-0.53887544, 0.8114495 ])
Estimate is predict ()
python
result.predict(xc)
result
array([-0.05426921, 0.07340692, 0.27529932, -0.01762875, 0.57778716,
0.00701576, -0.03936941, 0.05363477, 0.16983152, 0.62464001,
-0.06818476, 0.28335827, 0.39932119, 0.27651741, 0.41225249,
-0.0276562 , 0.03995305, 0.01409045, 0.56914272, 0.60868703,
0.06696482, 0.85354417, 0.36800073, 0.78153024, 0.77445555,
0.47660622, 0.63141194, 0.37090458, 0.79399386, 0.9773322 ,
0.53887544, 0.1885505 ])
python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Load the data from Spector and Mazzeo (1980)
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)
y = spector_data.endog
x = spector_data.exog
# Follow statsmodles ipython notebook
model = sm.Logit(y, x)
res = model.fit(disp=0)
print(res.summary())
You can get various values as well.
python
>>> res.params
array([-13.02134686, 2.82611259, 0.09515766, 2.37868766])
>>> res.pvalues
array([0.00827746, 0.02523911, 0.50143424, 0.0254552 ])
>>> res.aic, res.bic
(33.779268444262826, 39.642212055461734)
>>> res.bse
array([4.93132421, 1.26294108, 0.14155421, 1.06456425])
>>> res.resid_dev
array([-0.23211021, -0.35027122, -0.64396264, -0.22909819, 1.06047795,
-0.26638437, -0.23178275, -0.32537884, -0.48538752, 0.85555565,
-0.22259715, -0.64918082, -0.88199929, 1.81326864, -0.94639849,
-0.24758297, -0.3320177 , -0.28054444, -1.33513084, 0.91030269,
-0.35592175, 0.44718924, -0.74400503, -1.95507406, 0.59395382,
1.20963752, 0.95233204, -0.85678568, 0.58707192, 0.33529199,
-1.22731092, 2.09663887])
python
>>> res.predict(x)
array([0.02657799, 0.05950125, 0.18725993, 0.02590164, 0.56989295,
0.03485827, 0.02650406, 0.051559 , 0.11112666, 0.69351131,
0.02447037, 0.18999744, 0.32223955, 0.19321116, 0.36098992,
0.03018375, 0.05362641, 0.03858834, 0.58987249, 0.66078584,
0.06137585, 0.90484727, 0.24177245, 0.85209089, 0.83829051,
0.48113304, 0.63542059, 0.30721866, 0.84170413, 0.94534025,
0.5291172 , 0.11103084])
Select the distribution and link function from the following combinations Also, the details about the distribution and the link function are summarized below. https://www.statsmodels.org/stable/glm.html#families
The family = sm.families.Gamma () part of sm.GLM () is the part that specifies the distribution and link function. In the following, the default inverse is used because the link function is not specified in the gamma distribution, but when using log, it should be sm.families.Gaussian (sm.families.links.log).
python
import statsmodels.api as sm
data = sm.datasets.scotland.load(as_pandas=False)
x = sm.add_constant(data.exog)
y = data.endog
model = sm.GLM(y, x, family=sm.families.Gamma())
res = model.fit()
res.summary()
python
>>> res.params
[-1.77652703e-02 4.96176830e-05 2.03442259e-03 -7.18142874e-05
1.11852013e-04 -1.46751504e-07 -5.18683112e-04 -2.42717498e-06]
>>> res.scale
0.003584283173493321
>>> res.deviance
0.08738851641699877
>>> res.pearson_chi2
0.08602279616383915
>>> res.llf
-83.01720216107174
python
>>> res.predict(x)
array([57.80431482, 53.2733447 , 50.56347993, 58.33003783, 70.46562169,
56.88801284, 66.81878401, 66.03410393, 57.92937473, 63.23216907,
53.9914785 , 61.28993391, 64.81036393, 63.47546816, 60.69696114,
74.83508176, 56.56991106, 72.01804172, 64.35676519, 52.02445881,
64.24933079, 71.15070332, 45.73479688, 54.93318588, 66.98031261,
52.02479973, 56.18413736, 58.12267471, 67.37947398, 60.49162862,
73.82609217, 69.61515621])
Recommended Posts