Well, Python doesn't have stepAIC.
If you search around, you can find questions like this on Stack Overflow.
The link provided in the answers (Forward Selection with statsmodels) is a good reference, but it only supports linear regression and uses the coefficient of determination rather than AIC as the criterion, so I decided to write stepAIC myself.
step_aic
import numpy as np


def step_aic(model, exog, endog, **kwargs):
    """
    Select the exogenous variables that give the smallest AIC.
    Both exog and endog can be either str or list.
    (An endog list is meant for the Binomial family.)
    Note: only "forward" selection is implemented.
    Args:
        model: model class from statsmodels.formula.api
        exog (str or list): exogenous (explanatory) variables
        endog (str or list): endogenous (response) variables
        kwargs: extra keyword arguments for the model (e.g., data, family)
    Returns:
        model: the fitted model that seems to have the smallest AIC
    """
    # Forcibly convert exog and endog to list format
    exog = np.r_[[exog]].flatten()
    endog = np.r_[[endog]].flatten()
    remaining = set(exog)
    selected = []  # factors confirmed for adoption
    # Calculate AIC with the constant term only
    formula_head = ' + '.join(endog) + ' ~ '
    formula = formula_head + '1'
    aic = model(formula=formula, **kwargs).fit().aic
    print('AIC: {}, formula: {}'.format(round(aic, 3), formula))
    current_score, best_new_score = np.ones(2) * aic
    # Stop when all factors have been adopted, or when AIC does not decrease no matter which factor is added
    while remaining and current_score == best_new_score:
        scores_with_candidates = []
        for candidate in remaining:
            # Calculate AIC when adding each of the remaining factors one by one
            formula_tail = ' + '.join(selected + [candidate])
            formula = formula_head + formula_tail
            aic = model(formula=formula, **kwargs).fit().aic
            print('AIC: {}, formula: {}'.format(round(aic, 3), formula))
            scores_with_candidates.append((aic, candidate))
        # The factor with the smallest AIC becomes best_candidate
        scores_with_candidates.sort()
        scores_with_candidates.reverse()
        best_new_score, best_candidate = scores_with_candidates.pop()
        # If adding the candidate factor decreases AIC, adopt it as a confirmed factor
        if best_new_score < current_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
    formula = formula_head + ' + '.join(selected)
    print('The best formula: {}'.format(formula))
    return model(formula, **kwargs).fit()
If the explanatory variables are x and f, the call looks like the sketch below.
(You can use 'y' instead of ['y'].)
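A minimal sketch, assuming a Poisson GLM and a pandas DataFrame d with columns y, x, and f (the names d, x, f, and y are placeholders, not from the original post):

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumes a DataFrame `d` with a count response y and
# candidate explanatory variables x and f (hypothetical names)
best_model = step_aic(smf.glm, ['x', 'f'], ['y'],
                      data=d, family=sm.families.Poisson())
print(best_model.summary())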
Is this right? ... I hope it's right.
The results matched the binomial distribution and logistic regression chapters of Midoribon (Introduction to Statistical Modeling for Data Analysis) exactly.
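For the binomial case, the endog list is what lets you build an R-style cbind(y, N - y) left-hand side, since the function joins the endog terms with ' + ' in the formula. A rough sketch, assuming a DataFrame d with a success count y, a trial count N, and explanatory variables x and f (all hypothetical names):

import statsmodels.api as sm
import statsmodels.formula.api as smf

# The two endog terms are joined into the formula "y + I(N - y) ~ ...",
# which statsmodels fits as (successes, failures) with the Binomial family
# (d, y, N, x, f are placeholder names for illustration)
best_model = step_aic(smf.glm, ['x', 'f'], ['y', 'I(N - y)'],
                      data=d, family=sm.families.Binomial())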
If I've made a mistake somewhere, please let me know.