Introduction to Effectiveness Verification Chapter 2 Written in Python

Introduction

This article reproduces in Python the source code from the book "Introduction to Effectiveness Verification: Causal Reasoning for Correct Comparison / Basics of Econometrics".

An excellent implementation by a predecessor already exists, but I am leaving this here as a memo for my own study.

This article covers Chapter 2. The code is also posted on GitHub. Variable names and processing steps basically follow the book.

Regression analysis

You can run a linear regression with either scikit-learn or statsmodels. statsmodels reports per-variable statistics (standard errors, t-values, p-values) out of the box, so it is the more convenient choice here.

scikit-learn

Regression analysis with scikit-learn


from sklearn.linear_model import LinearRegression

# Fit the model (biased_data is the DataFrame prepared as in the book)
X = biased_data[['treatment', 'history']]
y = biased_data['spend']
model = LinearRegression(fit_intercept=True).fit(X, y)

# Print the results
print(f'R^2: {model.score(X, y)}')
print(f'intercept: {model.intercept_}')
print(f'coefficients: {model.coef_}')

scikit-learn reports less information than statsmodels, which is described next. You could compute each statistic yourself from the training data, but there seems to be little merit in going that far just to use scikit-learn.
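To make the point about computing the statistics by hand concrete, here is a minimal sketch of how standard errors and t-values could be derived from a scikit-learn fit using the usual OLS formula, sigma^2 (X'X)^-1. The data is synthetic and the variable names are placeholders for illustration, not the book's biased_data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical; not the book's biased_data)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

model = LinearRegression().fit(X, y)

# Design matrix with an explicit intercept column, and the matching coefficient vector
X_design = np.column_stack([np.ones(n), X])
beta = np.concatenate([[model.intercept_], model.coef_])

# Residual variance using n - p degrees of freedom
residuals = y - X_design @ beta
dof = n - X_design.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors are the square roots of the diagonal of sigma^2 (X'X)^-1
cov = sigma2 * np.linalg.inv(X_design.T @ X_design)
std_err = np.sqrt(np.diag(cov))
t_values = beta / std_err

print(std_err)
print(t_values)
```

This is exactly the bookkeeping that statsmodels does for you, which is why the rest of the article uses it.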

statsmodels

statsmodels offers several ways to fit a model; the R-like formula interface looks like this.

Regression analysis with statsmodels


from statsmodels.formula.api import ols

# Fit the model using an R-style formula
model = ols('spend ~ treatment + history', data=biased_data).fit()

# Display the results (rendered as a table in a notebook)
model.summary()

Incidentally, the summary is made up of several tables, and writing model.summary().tables[1] picks out a single one of them (index 1 is the coefficient table).

To get the estimated coefficients, use model.params. It is a pandas Series indexed by term name, so you can look up a value much like in a dictionary.
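As a small illustration of the two access patterns above, the following sketch fits the same formula on synthetic data and then reads out the coefficient table and a single estimate. The DataFrame here is made up for the example; only the column names mirror the book's data.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (hypothetical values; only the column names follow the book)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    'treatment': rng.integers(0, 2, size=n),
    'history': rng.normal(100, 20, size=n),
})
df['spend'] = (
    0.5 + 1.0 * df['treatment'] + 0.01 * df['history']
    + rng.normal(scale=0.5, size=n)
)

model = ols('spend ~ treatment + history', data=df).fit()

# tables[1] is the coefficient table of the summary
coef_table = model.summary().tables[1]
print(coef_table)

# params is a pandas Series indexed by term name
treatment_effect = model.params['treatment']
print(treatment_effect)

# conf_int() returns a DataFrame whose columns 0 and 1 are the lower and upper bounds
conf_int = model.conf_int()
print(conf_int.loc['treatment'])
```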

Read RData

To read R-format data, use the rdata package. For details on how to use the module, see the article "Read RData format dataset in Python" listed at the end of this article.

Read RData


import rdata

parsed = rdata.parser.parse_file('./vouchers.rda')
converted = rdata.conversion.convert(parsed)
vouchers = converted['vouchers']

Regression analysis collectively

The book fits models for multiple objective variables in one go, but doing the same in Python seems difficult, so here the formulas are fitted one by one in a loop.

Regression analysis collectively


from io import StringIO

import pandas as pd
from statsmodels.formula.api import ols

# Terms of the regression formulas
formula_x_base = ['VOUCH0']
formula_x_covariate = [
    'SVY', 'HSVISIT', 'AGE', 'STRATA1', 'STRATA2', 'STRATA3', 'STRATA4', 'STRATA5', 'STRATA6', 'STRATAMS',
    'D1993', 'D1995', 'D1997', 'DMONTH1', 'DMONTH2', 'DMONTH3', 'DMONTH4', 'DMONTH5', 'DMONTH6',
    'DMONTH7', 'DMONTH8', 'DMONTH9', 'DMONTH10', 'DMONTH11', 'DMONTH12', 'SEX2',
]
formula_ys = [
    "TOTSCYRS","INSCHL","PRSCH_C","USNGSCH","PRSCHA_1","FINISH6","FINISH7","FINISH8","REPT6",
    "REPT","NREPT","MARRIED","HASCHILD","HOURSUM","WORKING3",
]

# Fit one model and return its coefficient table as a DataFrame
def get_regression_result(formula, data):
    model = ols(formula, data=data).fit()
    # Wrap the HTML in StringIO; recent pandas versions deprecate passing a literal HTML string
    html = model.summary().tables[1].as_html()
    result = pd.read_html(StringIO(html), header=0)[0]
    result.columns = ['term', 'estimate', 'std.err', 'statistic', 'p.value', '0.025', '0.975']
    return result

# Fit a base model and a covariate model for every objective variable
# (regression_data is the analysis DataFrame prepared from vouchers; its preparation is not shown here)
results = []
for formula_y in formula_ys:
    base_reg_formula = f'{formula_y} ~ {" + ".join(formula_x_base)}'
    base_reg_model_index = f'{formula_y}_base'
    covariate_reg_formula = f'{formula_y} ~ {" + ".join(formula_x_base + formula_x_covariate)}'
    covariate_reg_model_index = f'{formula_y}_covariate'

    base_reg_result = get_regression_result(base_reg_formula, regression_data)
    base_reg_result['model_index'] = base_reg_model_index
    results.append(base_reg_result)

    covariate_reg_result = get_regression_result(covariate_reg_formula, regression_data)
    covariate_reg_result['model_index'] = covariate_reg_model_index
    results.append(covariate_reg_result)

# Stack all coefficient tables and put model_index first
df_results = pd.concat(results).reset_index(drop=True)
df_results = df_results[['model_index', 'term', 'estimate', 'std.err', 'statistic', 'p.value', '0.025', '0.975']]
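As an alternative to parsing the summary's HTML, the same table could probably be assembled directly from the fitted result's attributes (params, bse, tvalues, pvalues, conf_int()), which keeps full floating-point precision. The sketch below is one possible variant, not the approach used in this article; tidy_regression_result, the synthetic data, and the column names treat, age, y1, y2 are all hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols


def tidy_regression_result(formula, data):
    """Fit OLS and return a coefficient table built from the result attributes."""
    model = ols(formula, data=data).fit()
    conf_int = model.conf_int()  # columns 0 (lower) and 1 (upper), 95% by default
    return pd.DataFrame({
        'term': model.params.index,
        'estimate': model.params.values,
        'std.err': model.bse.values,
        'statistic': model.tvalues.values,
        'p.value': model.pvalues.values,
        '0.025': conf_int[0].values,
        '0.975': conf_int[1].values,
    })


# Synthetic stand-in data (hypothetical; not the vouchers dataset)
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    'treat': rng.integers(0, 2, size=n),
    'age': rng.integers(12, 16, size=n),
})
data['y1'] = 0.3 * data['treat'] + 0.1 * data['age'] + rng.normal(size=n)
data['y2'] = -0.2 * data['treat'] + rng.normal(size=n)

# Fit one model per objective variable and stack the tables
results = []
for y_name in ['y1', 'y2']:
    result = tidy_regression_result(f'{y_name} ~ treat + age', data)
    result.insert(0, 'model_index', f'{y_name}_covariate')
    results.append(result)

df_results = pd.concat(results, ignore_index=True)
print(df_results)
```

Because nothing is rounded for display, confidence-interval widths computed from this table match the model exactly.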

Plot of relationship between intervention and objective variable

Plotting with error bars is probably easiest in matplotlib. The key point is that the length of each error bar, which represents the confidence interval, is the difference between the estimate and the confidence-interval bound.

The data used for plotting (its preparation is not shown here) is taken from the model summary, so the values may be slightly off because the summary table keeps only a few significant figures.

plot


import matplotlib.pyplot as plt

estimate = going_private_results['estimate']
# Error-bar length: distance from the estimate to the lower bound of the 95% confidence interval
estimate_error = going_private_results['estimate'] - going_private_results['0.025']

xmin = 0
xmax = going_private_results.shape[0] - 1

plt.errorbar(range(xmax + 1), estimate, estimate_error, fmt='o')
plt.hlines(y=0, xmin=xmin, xmax=xmax, colors='k', linestyles='dashed')
plt.xlabel('model_index')
plt.ylabel('estimate')
plt.xticks(range(going_private_results.shape[0]), going_private_results['model_index'], rotation=45)
plt.show()
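For reference, here is a self-contained sketch of the same idea that passes the lower and upper distances to errorbar separately. For OLS the interval is symmetric around the estimate, so a single length as above is enough; the two-row form simply makes each side explicit. The table below contains made-up numbers and hypothetical model names, not results from the book.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical coefficient table (made-up numbers for illustration only)
table = pd.DataFrame({
    'model_index': ['y1_base', 'y1_covariate', 'y2_base'],
    'estimate': [0.10, 0.08, -0.02],
    '0.025': [0.04, 0.03, -0.07],
    '0.975': [0.16, 0.13, 0.03],
})

# Distances from the estimate down to the lower bound and up to the upper bound
lower_error = table['estimate'] - table['0.025']
upper_error = table['0.975'] - table['estimate']
yerr = np.vstack([lower_error, upper_error])  # shape (2, n): row 0 below, row 1 above

positions = np.arange(len(table))
fig, ax = plt.subplots()
ax.errorbar(positions, table['estimate'], yerr=yerr, fmt='o')
ax.axhline(0, color='k', linestyle='dashed')  # reference line at zero effect
ax.set_xticks(positions)
ax.set_xticklabels(table['model_index'], rotation=45)
ax.set_xlabel('model_index')
ax.set_ylabel('estimate')
fig.tight_layout()
```

To view the figure, call fig.savefig with a file name of your choice, or drop the Agg line and use plt.show() in an interactive session.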

Related articles

- I wrote "Introduction to Effect Verification" in Python
- Read RData format dataset in Python
