StepAIC in Python


Well, Python doesn't have an equivalent of R's stepAIC.

When I looked into it, I found a post about this on Stack Overflow.

Using the link given by one of the answerers as a reference (Forward Selection with statsmodels; it supports only linear regression, and its criterion is the coefficient of determination rather than AIC), I decided to write stepAIC myself.


import numpy as np


def step_aic(model, exog, endog, **kwargs):
    """
    Select the best exogenous variables by AIC.
    Both exog and endog can be either a str or a list.
    (An endog list is for the Binomial family.)

    Note: This implements only "forward" selection.

    Args:
        model: model from statsmodels.formula.api
        exog (str or list): exogenous variables
        endog (str or list): endogenous variables
        kwargs: extra keyword arguments for model (e.g., data, family)

    Returns:
        model: a fitted model that seems to have the smallest AIC
    """

    # Force exog and endog into list form
    exog = np.r_[[exog]].flatten()
    endog = np.r_[[endog]].flatten()
    remaining = set(exog)
    selected = []  # Factors confirmed for adoption

    # Calculate AIC with the constant term only
    formula_head = ' + '.join(endog) + ' ~ '
    formula = formula_head + '1'
    aic = model(formula=formula, **kwargs).fit().aic
    print('AIC: {}, formula: {}'.format(round(aic, 3), formula))

    current_score = best_new_score = aic

    # Stop once every factor has been adopted, or when adding any remaining factor no longer lowers AIC.
    while remaining and current_score == best_new_score:
        scores_with_candidates = []
        for candidate in remaining:

            # Calculate AIC when adding each remaining factor, one at a time
            formula_tail = ' + '.join(selected + [candidate])
            formula = formula_head + formula_tail
            aic = model(formula=formula, **kwargs).fit().aic
            print('AIC: {}, formula: {}'.format(round(aic, 3), formula))

            scores_with_candidates.append((aic, candidate))

        # The candidate with the smallest AIC is the best_candidate
        best_new_score, best_candidate = min(scores_with_candidates)

        # If adding the candidate lowers AIC, adopt it as a confirmed factor;
        # otherwise stop, because no remaining factor improves the model.
        if best_new_score < current_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
        else:
            break

    # Use the constant-only formula if no factor was adopted
    formula = formula_head + (' + '.join(selected) if selected else '1')
    print('The best formula: {}'.format(formula))
    return model(formula=formula, **kwargs).fit()
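
To make the greedy logic easier to follow without fitting any model, here is a minimal sketch of forward selection on its own. The names forward_select, toy_aic and toy_score are hypothetical, and the AIC values are made up purely for illustration; they do not come from any fitted model or from the article.

```python
def forward_select(candidates, score):
    """Greedy forward selection.

    `score` is any callable that maps a list of terms to a number,
    where lower is better (AIC plays this role in step_aic).
    """
    remaining = set(candidates)
    selected = []
    current = score(selected)
    while remaining:
        # Score every one-term extension of the current selection.
        trials = [(score(selected + [c]), c) for c in sorted(remaining)]
        best_score, best = min(trials)
        if best_score >= current:
            break  # no candidate lowers the score, so stop
        remaining.remove(best)
        selected.append(best)
        current = best_score
    return selected, current


# Made-up AIC values, for illustration only.
toy_aic = {
    (): 650.0,
    ('x',): 640.0,
    ('f',): 645.0,
    ('f', 'x'): 635.0,
}


def toy_score(terms):
    # The order of terms does not matter, so look up by sorted tuple.
    return toy_aic[tuple(sorted(terms))]


selected, best = forward_select(['x', 'f'], toy_score)
print(selected, best)  # ['x', 'f'] 635.0
```

Because the search is greedy, it only ever compares one-term extensions of the current selection, so it is not guaranteed to find the subset with the globally smallest AIC.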

How to use

If the explanatory variables are x and f, pass them as exog (for example ['x', 'f']) and pass the response variable as endog.
(You can pass 'y' instead of ['y'].)
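
As a rough illustration of why a str and a list are interchangeable, the sketch below reproduces only the formula-building step of step_aic in plain Python. The helpers as_list and build_formula are hypothetical names, and the column names y, N, x and f are assumptions chosen for the example.

```python
def as_list(value):
    """Accept either a single name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def build_formula(endog, terms):
    """Join the response and explanatory terms into a formula string."""
    head = ' + '.join(as_list(endog)) + ' ~ '
    tail = ' + '.join(as_list(terms)) or '1'  # constant-only if no terms
    return head + tail


print(build_formula('y', ['x', 'f']))          # y ~ x + f
print(build_formula(['y'], ['x', 'f']))        # y ~ x + f
print(build_formula(['y', 'I(N - y)'], 'x'))   # y + I(N - y) ~ x
print(build_formula('y', []))                  # y ~ 1

# With statsmodels installed, a call could then look like this. The
# DataFrame df and its columns x, f, y, N are assumed, not taken from
# the article:
#   import statsmodels.api as sm
#   import statsmodels.formula.api as smf
#   step_aic(smf.glm, ['x', 'f'], ['y', 'I(N - y)'], data=df,
#            family=sm.families.Binomial())
```

The two-element endog list is what produces the two-column response (successes and failures) that the Binomial family expects in a formula.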



Is this correct...?

The results exactly matched those in the binomial distribution and logistic regression chapter of Midoribon (Introduction to Statistical Modeling for Data Analysis).

But if I have made a mistake, please let me know.
