Predictive Statistics (Practice: Simple Regression) with Python

I am writing this to consolidate what I have studied, so it may contain mistakes. Feel free to point them out in the comments.

In the previous article I wrote "Predictive Statistics (Theory)", but some things cannot be understood from theory alone, so this time I will practice with Python libraries. For the prediction we will use a method called regression.

Contents

・ Modeling flow
・ Let's do a simple regression analysis

Modeling flow

① Determine the explanatory variable (x-axis) and objective variable (y-axis) in advance
② Read the data
③ Prepare a variable to store the model
④ Create the model
⑤ Predict from the model
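The five steps above can be sketched end to end on made-up data. This is only a minimal sketch, not the code used later in this article: the column names `x` and `y`, the underlying relationship y = 2x + 1, and the in-memory DataFrames standing in for CSV files are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Step 1: explanatory variable "x", objective variable "y" (hypothetical names)
# Step 2: "read" the data (built in memory here instead of pd.read_csv)
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
train = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 0.1, size=20)})
test = pd.DataFrame({"x": [20.0, 21.0, 22.0]})

# Step 3: prepare a variable that holds the model
model = LinearRegression()

# Step 4: create the model (double brackets keep X two-dimensional)
model.fit(train[["x"]], train["y"])

# Step 5: predict from the model
pred = model.predict(test[["x"]])

print(model.coef_[0], model.intercept_)  # should be close to 2 and 1
print(pred)
```

Because the noise is small, the fitted slope and intercept should land near the values used to generate the data, which is a quick way to confirm that the flow works before switching to real files.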

Let's do a simple regression analysis

It's a simple analysis, but the actual code flow looks like this:

import pandas as pd
import numpy as np
from matplotlib import pyplot
# Show plots inline in the notebook (Jupyter/IPython only; keep this comment on its own line)
%matplotlib inline
from sklearn.linear_model import LinearRegression as LR  # Class for linear regression

train = pd.read_csv("train.csv")  # Step ②: read the training data
test = pd.read_csv("test.csv")  # Step ②: read the evaluation data
sample = pd.read_csv("sample.csv")  # Step ②: read the sample file that the predictions will be written into

trainX = train["Column name"]  # Explanatory variable for training
testX = test["Column name"]  # Explanatory variable for evaluation (same column as in train)
y = train["Column name"]  # Objective variable (the column you want to predict)

trainX = trainX.values.reshape(-1, 1)  # scikit-learn expects 2-D input, so reshape the single column to (n_samples, 1)
testX = testX.values.reshape(-1, 1)

model = LR()  # Step ③: prepare a variable that holds the regression model

model.fit(trainX, y)  # Step ④: create (fit) the simple regression model

pred = model.predict(testX)  # Step ⑤: predict y for each row of testX

sample["Column name you want to assign"] = pred  # Put the predictions into the sample file

sample.to_csv("file name.csv")  # Write out the data file
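The reshape step deserves a short illustration. Selecting one column from a DataFrame gives a 1-D Series, while scikit-learn estimators expect X as a 2-D array of shape (n_samples, n_features). A small sketch, using a made-up column named `x`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})  # hypothetical data

col = df["x"]                      # 1-D Series
as_2d = col.values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows

print(col.shape)        # (3,)
print(as_2d.shape)      # (3, 1)
print(df[["x"]].shape)  # (3, 1): double brackets also give a 2-D object
```

Either form can be passed to fit and predict; the point is only that X has one row per sample and one column per explanatory variable.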

Note: the explanatory variable must be the same column in both train and test; otherwise the prediction is meaningless.
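One way to catch a mismatched column early is to check that the chosen column exists in both tables before fitting. The following is a minimal sketch; the helper name `check_feature` and the column names `temperature` and `sales` are hypothetical and are not part of pandas or scikit-learn.

```python
import pandas as pd

def check_feature(train, test, column):
    """Raise KeyError if `column` is missing from either DataFrame."""
    for name, df in (("train", train), ("test", test)):
        if column not in df.columns:
            raise KeyError(f"{column!r} is missing from {name}")

# Made-up tables: test has no objective variable, as is usual for evaluation data
train = pd.DataFrame({"temperature": [10.0, 20.0], "sales": [100, 200]})
test = pd.DataFrame({"temperature": [15.0]})

check_feature(train, test, "temperature")  # passes silently
```

Calling it with a column that exists only in train, such as `sales` here, raises a KeyError instead of letting the mistake surface later at prediction time.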
