I am writing this as a record of what I have studied, so it may contain mistakes. Feel free to comment.
In the previous article I wrote about "Predictive Statistics (Theory)", but some things cannot be understood from theory alone, so this time I will actually practice using a Python library. For the prediction, we will use a method called regression.
・ Modeling flow
・ Let's do simple regression analysis
① Determine the explanatory variable (x-axis) and objective variable (y-axis) in advance.
② Read the data
③ Prepare variables to store the model
④ Model creation
⑤ Predict from the model
It's a simple analysis, but the actual code flow looks like this:
import pandas as pd
import numpy as np
from matplotlib import pyplot
# IPython/Jupyter magic: display plots inline in the notebook
%matplotlib inline
from sklearn.linear_model import LinearRegression as LR  # class for linear regression

# Step ②: read the data
train = pd.read_csv("train.csv")    # training data
test = pd.read_csv("test.csv")      # evaluation data
sample = pd.read_csv("sample.csv")  # sample file that the predictions will be written into

# Step ①: choose the explanatory variable (x) and the objective variable (y)
trainX = train["Column name"]  # explanatory variable (training)
testX = test["Column name"]    # explanatory variable (evaluation), same column as in train
y = train["Column name"]       # objective variable

# scikit-learn expects a 2-D feature array, so reshape the single column to shape (n, 1)
trainX = trainX.values.reshape(-1, 1)
testX = testX.values.reshape(-1, 1)

model = LR()          # Step ③: prepare a variable that holds the regression model
model.fit(trainX, y)  # Step ④: fit the simple regression model

pred = model.predict(testX)  # Step ⑤: predict y for each row of testX

sample["Column name you want to assign"] = pred  # insert the predictions into the sample file
sample.to_csv("file name.csv")  # write out the result (pass index=False to omit the row index)
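Since the code above depends on CSV files and column names that are left as placeholders, here is a minimal sketch of the same flow that can be run as-is. The toy data (a column `x` and a target `y` that follow y = 2x + 1 exactly) is made up purely for illustration and is not from any real dataset. It also shows what `reshape(-1, 1)` does to the shape of a single column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption for illustration): y = 2x + 1 exactly.
train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 5.0, 7.0, 9.0]})
test = pd.DataFrame({"x": [5.0, 6.0]})

# A single column comes out as a 1-D array of shape (4,).
print(train["x"].values.shape)

# reshape(-1, 1) turns it into a 2-D array with one feature per row: shape (4, 1).
trainX = train["x"].values.reshape(-1, 1)
testX = test["x"].values.reshape(-1, 1)
print(trainX.shape)

# Prepare the model and fit it on the training data.
model = LinearRegression()
model.fit(trainX, train["y"])

# Predict y for the evaluation rows.
pred = model.predict(testX)

# The slope and intercept should come out close to 2 and 1,
# and the predictions close to 11 and 13.
print(model.coef_[0], model.intercept_)
print(pred)
```

Because the toy data lies exactly on a line, the fitted slope and intercept recover that line up to floating-point error. With real data the fit will only be approximate.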
* Note: the explanatory variable must be the same column in both train and test. If the column names differ, the prediction is meaningless.
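One way to catch this early is to check that the column exists in both tables before fitting. The following is only a sketch: `check_feature_column` is a hypothetical helper written for this article, not part of pandas or scikit-learn, and the column names `temperature`, `temp`, and `sales` are made up.

```python
import pandas as pd

def check_feature_column(train, test, column):
    """Raise a ValueError if `column` is missing from train or test (hypothetical helper)."""
    missing = [
        name
        for name, df in (("train", train), ("test", test))
        if column not in df.columns
    ]
    if missing:
        raise ValueError(f"column {column!r} is missing from: {', '.join(missing)}")

# Made-up example tables.
train = pd.DataFrame({"temperature": [20, 25], "sales": [100, 130]})
test_ok = pd.DataFrame({"temperature": [22]})
test_bad = pd.DataFrame({"temp": [22]})  # the column name differs from train

check_feature_column(train, test_ok, "temperature")  # passes silently

try:
    check_feature_column(train, test_bad, "temperature")
except ValueError as err:
    print(err)  # column 'temperature' is missing from: test
```

Note that this only checks the names. It cannot tell whether two columns with the same name actually hold the same kind of values.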