A generalized linear model (GLM) is an umbrella term for statistical models, such as linear regression, Poisson regression, and logistic regression, that explain a response variable (y) in terms of explanatory variables (x). More specifically, a GLM is specified by three components: a probability distribution, a linear predictor, and a link function.
Probability distribution: the distribution that the response variable follows. The binomial distribution and the Poisson distribution are often used for discrete data such as count data. The normal distribution and the gamma distribution are often used for continuous data, that is, continuous quantities such as stock prices.
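To get a feel for the difference between the discrete and the continuous distributions, here is a minimal sketch that draws a few random values from each of the four with NumPy. The parameter values (n=10, p=0.3, and so on) are numbers I made up for illustration and have nothing to do with any real data.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the draws are reproducible

# Parameter values below are arbitrary, chosen only for illustration
samples = {
    "binomial": rng.binomial(n=10, p=0.3, size=5),        # counts out of 10 trials
    "poisson": rng.poisson(lam=4.0, size=5),              # non-negative counts
    "normal": rng.normal(loc=100.0, scale=15.0, size=5),  # any real value
    "gamma": rng.gamma(shape=2.0, scale=3.0, size=5),     # positive real values
}

for name, values in samples.items():
    print(f"{name:>8}: dtype={values.dtype}, values={values}")
```

The binomial and Poisson draws come back as integers, while the normal and gamma draws are floating-point numbers, which matches the discrete/continuous split described above.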
Linear predictor: a model expression given by a linear combination of the explanatory variables. You choose which explanatory variables to include and which interaction terms (terms given by the product of explanatory variables) to include.
z = β_0 + β_{1}x_{1} + β_{2}x_{2}
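As a small sketch of what this expression computes, the function below evaluates the linear predictor for two explanatory variables, with an optional interaction term. The function name `linear_predictor` and the coefficient values are hypothetical, made up just for this example.

```python
def linear_predictor(x1, x2, beta, interaction=None):
    """Return z = b0 + b1*x1 + b2*x2, plus b12*x1*x2 if an interaction coefficient is given."""
    b0, b1, b2 = beta
    z = b0 + b1 * x1 + b2 * x2
    if interaction is not None:
        # Interaction term: the product of the two explanatory variables
        z += interaction * x1 * x2
    return z


# Made-up coefficients, for illustration only
z_main = linear_predictor(2.0, 3.0, beta=(1.0, 0.5, -0.25))
z_inter = linear_predictor(2.0, 3.0, beta=(1.0, 0.5, -0.25), interaction=0.1)
print(z_main, z_inter)
```

Note that the model stays "linear" even with the interaction term, because it is still linear in the coefficients.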
Link function: a function that transforms the expected value of the response variable so that it corresponds to the linear predictor. Thanks to the link function, even a probability, which can only take values between 0 and 1, can be related to the linear predictor. The appropriate link function is determined to some extent by the distribution; if you want to know more, please refer to the book linked in the references at the end of the article.
log(λ) = β_0 + β_{1}x_{1} + β_{2}x_{2}  (log link, where λ is the mean of y)
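To see why the link function matters, the sketch below applies the inverses of two common links to a few values of the linear predictor z: the log link (typically paired with the Poisson distribution) and the logit link (typically paired with the binomial distribution). The function names are my own, not part of any library.

```python
import math


def inverse_log_link(z):
    """Inverse of the log link: maps any real z to a positive mean."""
    return math.exp(z)


def inverse_logit_link(z):
    """Inverse of the logit link: maps any real z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  mean (log link)={inverse_log_link(z):.4f}  "
          f"probability (logit link)={inverse_logit_link(z):.4f}")
```

Whatever value the linear predictor takes, the log link keeps the mean positive and the logit link keeps the probability within 0 to 1.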
Generalized linear models can be fitted easily with R's glm function.
However, I think many people would prefer to do this in Python, so I will try it with Python.
When I looked around, I found statsmodels, a module that provides something like R's glm function.
$ pip install statsmodels
$ pip install patsy  # statsmodels reported on import that patsy is required, so install it
$ pip install pandas  # Install for data processing
import pandas as pd
import statsmodels.api as sm

# Read the data from the reference URL below
data3a = pd.read_csv("http://hosho.ees.hokudai.ac.jp/~kubo/stat/iwanamibook/fig/poisson/data3a.csv")

# Build the design matrix for the linear predictor: a constant term plus the variable x
# (kept as a separate variable; assigning data3a.x_c would not create a DataFrame column)
x_c = sm.add_constant(data3a["x"])

# Define a GLM with a Poisson distribution and a log link function
# (the log link is the default for the Poisson family)
model = sm.GLM(data3a["y"], x_c, family=sm.families.Poisson())
result = model.fit()

# Show the fitted result
print(result.summary())
statsmodels is convenient!
Generalized linear models extend statistical models such as the linear model, but it is still difficult to capture real-world phenomena with such a simple model. The book below also covers techniques such as the generalized linear mixed model, a further development of the generalized linear model, so please refer to it.
http://hosho.ees.hokudai.ac.jp/~kubo/ce/IwanamiBook.html
http://statsmodels.sourceforge.net/devel/glm.html