2. Multivariate analysis spelled out in Python 5-3. Logistic regression analysis (statsmodels)

Odds

             Effective (P)   No effect (1 - P)   Odds P / (1 - P)
Chemical A   0.2             0.8                 0.250
Chemical B   0.05            0.95                0.053

Odds ratio
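
The odds ratio is the ratio of two odds. As a minimal sketch, the odds in the table and the odds ratio of Chemical A relative to Chemical B can be computed directly from the definition odds = P / (1 - P). The variable names here are illustrative and are not part of the original article.

```python
# Probability of being effective, taken from the table above
p_effective = {"Chemical A": 0.2, "Chemical B": 0.05}

# Odds = P / (1 - P)
odds = {name: p / (1 - p) for name, p in p_effective.items()}

for name, value in odds.items():
    print(name, round(value, 3))

# Odds ratio of Chemical A relative to Chemical B
odds_ratio = odds["Chemical A"] / odds["Chemical B"]
print("Odds ratio (A vs. B):", round(odds_ratio, 2))
```

An odds ratio above 1 means the first group has higher odds of the outcome than the second.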

Relationship between odds ratio and regression coefficient in logistic regression

To examine this relationship, consider a model that predicts whether a student passes or fails a test (1 for pass, 0 for fail) from the number of study hours.
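
The relationship can be written out explicitly. The sketch below assumes the usual single-predictor logistic model, with intercept \beta_0 and coefficient \beta_1 for study hours x; these symbols are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
\log\frac{p(x)}{1-p(x)} &= \beta_0 + \beta_1 x \\
\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)}
  &= \frac{e^{\beta_0+\beta_1 (x+1)}}{e^{\beta_0+\beta_1 x}} = e^{\beta_1} \\
\log\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)} &= \beta_1
\end{aligned}
```

Under this assumption, the regression coefficient is the log odds ratio for a one-hour increase in study time, and its exponential is the odds ratio.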

⑴ Import library

#Library used for numerical calculation
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats

#Library for drawing graphs
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

#Library for estimating statistical models
import statsmodels.formula.api as smf
import statsmodels.api as sm

#Specifying the number of display digits
%precision 3

⑵ Data reading and confirmation

#Data acquisition
url = 'https://raw.githubusercontent.com/yumi-ito/sample_data/master/6-3-1-logistic-regression.csv'

#Data reading
df = pd.read_csv(url)

#Output the first 5 rows of the data
df.head() 

2_5_3_01.PNG

#Output basic statistics of data
df.describe().apply(lambda s: s.apply(lambda x: format(x, 'g')))

2_5_3_02.PNG

#Draw bar chart
sns.set()
sns.barplot(x = "hours", y = "result", data = df, palette='summer_r')

2_5_3_03.PNG

#Calculate the pass rate for each number of study hours
pass_rate = df.groupby("hours").mean()
pass_rate

2_5_3_04.PNG

⑶ Model estimation and confirmation of results

#Estimate the model
mod_glm = smf.glm(formula = "result ~ hours",
                  data = df,
                  family = sm.families.Binomial()).fit()
#Output summary of estimation results
mod_glm.summary()

2_5_3_05.PNG

#Draw a regression curve
sns.lmplot(x = "hours", y = "result",
           data = df,
           logistic = True,
           scatter_kws = {"color": "green"},
           line_kws = {"color": "black"},
           x_jitter = 0.1, y_jitter = 0.02)

2_5_3_06.PNG

#Create a DataFrame with a column named hours holding the arithmetic sequence 0 to 9
predicted_value = pd.DataFrame({"hours": np.arange(0, 10, 1)})

#Calculate the predicted pass rate
pred = mod_glm.predict(predicted_value)

2_5_3_07.PNG
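
For a binomial GLM with the default logit link, the predicted pass rate is the inverse logit of the linear predictor. The following is a minimal sketch of that calculation in plain Python. The coefficient values are hypothetical, chosen only for illustration, and are not the values estimated by the model above; the function name is also an assumption.

```python
import math

def predicted_pass_rate(hours, intercept, slope):
    """Inverse logit of the linear predictor intercept + slope * hours."""
    linear_predictor = intercept + slope * hours
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients for illustration only; NOT the fitted values
intercept, slope = -4.5, 0.9

# Predicted pass rate for 0 to 9 study hours
rates = [predicted_pass_rate(h, intercept, slope) for h in range(10)]
for h, r in zip(range(10), rates):
    print(h, round(r, 3))
```

With a positive slope the predicted rate rises monotonically with study hours and always stays between 0 and 1.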

⑷ Find the log odds ratio and compare it with the coefficient

#Get 1 hour and 2 hour pass rates
pred_1 = pred[1]
pred_2 = pred[2]

#Calculate the odds for each
odds_1 = pred_1 / (1 - pred_1)
odds_2 = pred_2 / (1 - pred_2)

#Calculate log odds ratio (np.log replaces the removed sp.log alias)
print("Log odds ratio:", round(np.log(odds_2 / odds_1), 3))

#Calculate the coefficients of the model
value = mod_glm.params["hours"]
print("Model coefficients:", round(value, 3))

2_5_3_08.PNG

#Take the exponential of the regression coefficient (np.exp replaces the removed sp.exp alias)
exp_coef = np.exp(mod_glm.params["hours"])
print("Coefficient exp:", round(exp_coef, 3))

#Calculate odds ratio
odds_ratio = odds_2 / odds_1
print("Odds ratio:", round(odds_ratio, 3))

2_5_3_09.PNG
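
The comparison above uses 1 and 2 study hours, but under a logistic model the same odds ratio holds for any one-hour increase. The sketch below checks this in plain Python. The coefficients are hypothetical values for illustration, not the fitted ones, and `odds_at` is a helper name introduced here.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted values
intercept, slope = -4.5, 0.9

def odds_at(hours):
    """Odds p / (1 - p) implied by the logistic model at the given hours."""
    p = 1.0 / (1.0 + math.exp(-(intercept + slope * hours)))
    return p / (1.0 - p)

# Odds ratio for each one-hour step: 0 -> 1, 1 -> 2, ..., 8 -> 9
ratios = [odds_at(h + 1) / odds_at(h) for h in range(9)]

print("Odds ratios:", [round(r, 3) for r in ratios])
print("exp(slope): ", round(math.exp(slope), 3))
```

Every ratio matches the exponential of the slope up to floating-point error, which is why the coefficient can be read as a log odds ratio regardless of the starting number of hours.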
