Predictive Statistics (Practice Edition Multiple Regression) Python

There may be mistakes, since I am writing this as part of my own study, so please feel free to point them out in the comments. This time, continuing from Predictive Statistics (Practice Edition Simple Regression) Python, we will try to improve the prediction accuracy further.

Contents

・ What is multiple regression analysis?
・ What is a dummy variable?
・ How to model multiple regression

What is multiple regression analysis?

The only difference between multiple regression analysis and simple regression analysis is the number of explanatory variables: simple regression uses one, while multiple regression uses two or more. Because more factors (explanatory variables) are used to explain the objective variable, multiple regression can improve prediction accuracy. However, if there are too many explanatory variables, the model may overfit the training data, and its accuracy on new data can drop. This is why choosing the explanatory variables is very important in multiple regression.

What is a dummy variable?

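A dummy variable represents something that cannot be expressed directly as a number, such as the weather, as columns of 0s and 1s (for example, a "sunny" column that is 1 on sunny days and 0 otherwise). Dummy variables are very important in multiple regression, because the regression model can only work with numerical inputs. pandas provides a function that converts such columns into dummy variables, and you can convert them to numbers as follows.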

pd.get_dummies(DataFrame containing the columns you want to make into dummy variables)
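As a small illustration, assume a hypothetical DataFrame with a numeric `temperature` column and a categorical `weather` column. `pd.get_dummies` leaves the numeric column as it is and replaces `weather` with one 0/1 column per category, in sorted order. `dtype=int` is passed so the new columns are 0/1 integers rather than the booleans that recent pandas versions return by default.

```python
import pandas as pd

# Hypothetical data: "weather" is categorical, "temperature" is numeric
df = pd.DataFrame({
    "temperature": [25, 18, 21, 27],
    "weather": ["sunny", "rainy", "cloudy", "sunny"],
})

# Only the non-numeric column is converted
dummies = pd.get_dummies(df, dtype=int)
print(dummies)
# columns: temperature, weather_cloudy, weather_rainy, weather_sunny

# drop_first=True drops the first category (here "cloudy")
dropped = pd.get_dummies(df, drop_first=True, dtype=int)
print(dropped.columns.tolist())
```

Because the dummy columns of one variable always sum to 1, together they duplicate the intercept of a linear model; `drop_first=True` removes one of them to avoid this. `LinearRegression` still fits without it, but the individual dummy coefficients are then not uniquely determined.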

How to model multiple regression

Modeling with multiple regression is almost the same as with simple regression; the only change is that more explanatory variables are passed to the model. The amount of code may even be less than for simple regression. Let's look at the actual code. (Missing values have already been handled.)

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
#Display plots inline in the notebook (IPython/Jupyter only)
%matplotlib inline
from sklearn.linear_model import LinearRegression as LR

#Past data (training data, including the objective variable)
train = pd.read_csv("train.csv")
#Data to forecast (test data)
test = pd.read_csv("test.csv")
#Template file into which the predicted values are entered
sample = pd.read_csv("sample.csv")

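#Explanatory variables
#get_dummies converts only non-numeric (categorical) columns into 0/1 dummy columns; numeric columns pass through unchanged
trainX = pd.get_dummies(train[["Column name 1","Column name 2",...,"Column name n"]])
testX = pd.get_dummies(test[["Column name 1","Column name 2",...,"Column name n"]])
#Give testX exactly the same columns as trainX (a category missing from the test data would otherwise leave out a column)
testX = testX.reindex(columns=trainX.columns, fill_value=0)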

#Objective variable (the column to predict)
y = train["Column name"]

#Create the linear regression model
model = LR()

#Fit the multiple regression model
model.fit(trainX,y)
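The code above stops after fitting. A minimal end-to-end sketch of the remaining steps, predicting the test data and writing the results into the sample file, might look like the following. Since the real files are not available here, small hypothetical DataFrames stand in for train.csv, test.csv and sample.csv, and the column names (`temperature`, `weather`, `sales`, `id`) are made up. The test data deliberately lacks the `cloudy` category, which is where aligning the columns with `reindex` matters.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression as LR

# Hypothetical stand-ins for train.csv, test.csv and sample.csv
train = pd.DataFrame({
    "temperature": [20, 25, 30, 22, 28, 18],
    "weather": ["sunny", "rainy", "sunny", "cloudy", "cloudy", "rainy"],
    # Made-up rule: sales = 100 + 10 * temperature + (50 sunny, 20 cloudy, 0 rainy)
    "sales": [350, 350, 450, 340, 400, 280],
})
test = pd.DataFrame({
    "temperature": [24, 26],
    "weather": ["sunny", "rainy"],  # "cloudy" never appears in the test data
})
sample = pd.DataFrame({"id": [1, 2]})

features = ["temperature", "weather"]
trainX = pd.get_dummies(train[features], dtype=int)
testX = pd.get_dummies(test[features], dtype=int)
# testX has no weather_cloudy column: add it (filled with 0) and match trainX's column order
testX = testX.reindex(columns=trainX.columns, fill_value=0)

y = train["sales"]
model = LR()
model.fit(trainX, y)

# Write the predicted values into the sample (submission) DataFrame
sample["sales"] = model.predict(testX)
print(sample)
```

Without the `reindex` line, `testX` would have one column fewer than `trainX`, and `model.predict` would raise an error because the features no longer match those seen during fitting.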
