[PYTHON] Estimating a probit model, a binary response model

What is the Binary Response model?

One purpose of econometrics is to correctly identify the effect of a variable. Binary response models, such as the probit model, are a valid approach when the outcome is binary, taking only the values 0 and 1.

Compared with ordinary linear regression analysis, these models are said to be superior in the following points.

- The predicted values fall within the interval [0,1].
- The variance of the error term is uniform.

The model is certainly superior in these respects, but it creates a serious problem: unlike in linear regression analysis, the estimated parameters cannot be read directly as effects. This is because the model is estimated through a non-linear function. The mathematical background is outlined below.

Identification of effects in linear regression analysis

Consider the following simple regression analysis model.

$ Y = \beta_0 + \beta_1 X_1 + u $

By the definition of the marginal effect, $ \frac{\partial Y}{\partial X_1} = \beta_1 $. Thus, in linear regression analysis, the estimated parameter is itself the marginal effect.
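To make this concrete, here is a minimal sketch that checks the claim numerically. The coefficients `beta0 = 1.0` and `beta1 = 2.0` are made up for illustration and do not come from any data; the point is only that the numerical derivative is the same at every value of `X_1`.

```python
def linear_model(x1, beta0=1.0, beta1=2.0):
    """Deterministic part of Y = beta0 + beta1 * X1 (illustrative coefficients)."""
    return beta0 + beta1 * x1


def finite_difference(f, x, h=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Evaluate the marginal effect at three very different points
effects = [finite_difference(linear_model, x) for x in (-3.0, 0.0, 5.0)]
print(effects)  # each value is approximately beta1, wherever it is evaluated
```

Because the derivative does not depend on where it is evaluated, reporting the single number `beta1` is enough in the linear case.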

Probit model

The probit model makes the following assumption:

- The error term follows a standard normal distribution. (Incidentally, this is not restrictive as far as the mean and variance are concerned, because a normally distributed error can always be standardized.)

$ P(Y_i=1|X)=\int^{X\beta}_{-\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz $

The likelihood function based on this result is

$ L(\beta) = \prod^n_{i=1} (P(y_i=1|X))^{y_i}(P(y_i=0|X))^{1-y_i} $

This is difficult to maximize by hand, so the optimization is left to the computer. Because the formulas become cumbersome, from here on the cumulative distribution function of the standard normal distribution is written $ \Theta(X\beta) $ and its probability density function $ \theta(X\beta) $. The estimated model is then

$ P(Y=1|X)=\Theta(X\hat{\beta}) $
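As a rough illustration of what "leaving the optimization to the computer" involves, the sketch below maximizes the log of $ L(\beta) $ with Newton-Raphson steps, using only the Python standard library. Everything here is an assumption made for the example: the data are simulated, the model has one regressor plus a constant, `TRUE_BETA` is invented, and the helper names `log_likelihood` and `newton_step` are hypothetical. It is not how statsmodels is implemented internally.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def log_likelihood(beta, xs, ys):
    """Log of L(beta) for P(y=1|x) = cdf(beta[0] + beta[1] * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p1 = STD_NORMAL.cdf(beta[0] + beta[1] * x)
        p1 = min(max(p1, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += math.log(p1) if y == 1 else math.log(1.0 - p1)
    return total


def newton_step(beta, xs, ys):
    """One Newton-Raphson update using the analytic gradient and Hessian."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        xb = beta[0] + beta[1] * x
        q = 2 * y - 1  # +1 when y == 1, -1 when y == 0
        lam = q * STD_NORMAL.pdf(q * xb) / max(STD_NORMAL.cdf(q * xb), 1e-12)
        w = lam * (lam + xb)
        g0 += lam           # gradient, constant term
        g1 += lam * x       # gradient, slope term
        h00 -= w            # Hessian entries (negative definite)
        h01 -= w * x
        h11 -= w * x * x
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det  # first component of H^{-1} g
    step1 = (h00 * g1 - h01 * g0) / det  # second component of H^{-1} g
    return [beta[0] - step0, beta[1] - step1]


# Simulated data (illustrative only): y = 1 when 0.5 + 1.0 * x + u > 0, u ~ N(0, 1)
random.seed(0)
TRUE_BETA = [0.5, 1.0]
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1 if TRUE_BETA[0] + TRUE_BETA[1] * x + random.gauss(0.0, 1.0) > 0 else 0
      for x in xs]

beta_hat = [0.0, 0.0]
for _ in range(25):
    beta_hat = newton_step(beta_hat, xs, ys)

print("estimate:", beta_hat)
print("log-likelihood at estimate:", log_likelihood(beta_hat, xs, ys))
```

The estimate should land near the values used to simulate the data, although it will not match them exactly because of sampling noise.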

Differentiating $ P(Y=1|X) $ with respect to $ X_i $ gives:

$\frac{\partial P(Y=1|X)}{\partial X_i}=\hat{\beta_i}\times \theta(X\hat{\beta}) $

This is the marginal effect in the probit model. One problem remains, however: the marginal effect depends on $ X $, so it is not a single number. The usual solution is to average it over all $ n $ observations, which gives the average marginal effect:

$ \frac{1}{n}\sum^n_{j=1} \frac{\partial P(Y=1|X_j)}{\partial X_i}=\frac{1}{n}\sum^n_{j=1} \hat{\beta_i}\times \theta(X_j\hat{\beta}) $

Only then can the effect of a variable in the probit model be identified as a single value.
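The difference between the pointwise and the averaged marginal effect can be seen in a small sketch. The coefficients in `beta_hat` and the observations in `xs` are hypothetical values chosen for illustration, not estimates from real data, and the model is assumed to have one regressor plus a constant.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

beta_hat = [-0.5, 0.8]                  # hypothetical [constant, slope]
xs = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]   # hypothetical observations of X


def marginal_effect(x, beta):
    """Slope coefficient times the normal density evaluated at X * beta."""
    return beta[1] * STD_NORMAL.pdf(beta[0] + beta[1] * x)


# The effect differs from observation to observation ...
pointwise = [marginal_effect(x, beta_hat) for x in xs]
# ... so it is averaged over the sample to get one number
ame = sum(pointwise) / len(pointwise)

for x, effect in zip(xs, pointwise):
    print(f"x = {x:5.1f}  marginal effect = {effect:.4f}")
print(f"average marginal effect = {ame:.4f}")
```

Since the normal density never exceeds about 0.399, every pointwise effect here is smaller in magnitude than the coefficient itself, which is one reason the raw coefficient should not be read as an effect.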

Calculation in Python

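import pandas as pd
import statsmodels.api as sm

# Read the data (replace the file and column names with your own)
data = pd.read_csv("___.csv")
target = data.loc[:, "name"]        # binary (0/1) outcome
explain = data.loc[:, ["names"]]    # explanatory variable(s)

# statsmodels does not add an intercept automatically, so add a constant column
explain = sm.add_constant(explain)

model = sm.Probit(target, explain)
result = model.fit()

# The summary reports the estimated beta, not the marginal effects
print(result.summary())

# Average marginal effects: at="overall" averages over all observations
print(result.get_margeff(at="overall").summary())
# Other values of `at`, such as "mean", "median" or "zero", change where the effect is evaluated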

For details on result.get_margeff(at="overall"), see https://www.statsmodels.org/stable/generated/statsmodels.discrete.discrete_model.ProbitResults.get_margeff.html
