[Python] Taking on Kaggle's House Prices competition

Overview

I took on the Kaggle House Prices competition, which asks you to predict the sale price of a house from features such as lot area and building age.

Importing libraries

import pandas as pd

Reading training data and test data

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

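Splitting into explanatory variables and the objective variable

Separate the training data into the objective variable "SalePrice" and the remaining explanatory variables. The "Id" column is only a row identifier, so drop it from both the training and the test features.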

train_x = train.drop(['Id', 'SalePrice'], axis=1)
train_y = train['SalePrice']
test_x = test.drop(['Id'], axis=1)

Replace categorical variables with numbers

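# Factorize train and test together so each category gets the same code in both,
# and encode only the non-numeric columns so numeric features keep their values.
all_x = pd.concat([train_x, test_x], keys=['train', 'test'])
for column in all_x.select_dtypes(exclude='number').columns:
    labels, uniques = pd.factorize(all_x[column])
    all_x[column] = labels  # missing values are encoded as -1
# LinearRegression cannot handle NaN, so fill numeric gaps with the training median.
all_x = all_x.fillna(all_x.loc['train'].median())
train_x, test_x = all_x.loc['train'], all_x.loc['test']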
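
One thing to watch with pd.factorize is that it assigns codes in order of first appearance, so the same category can receive different numbers in two separately factorized data sets. The following is a minimal sketch with toy data (the series names and values are made up for illustration, not taken from the competition files) that compares separate and shared encoding.

```python
import pandas as pd

# Toy data (hypothetical values): the same two categories, in a different order.
toy_train = pd.Series(['Pave', 'Grvl', 'Pave'])
toy_test = pd.Series(['Grvl', 'Pave'])

# Factorizing separately: codes follow the order of first appearance in each series,
# so 'Pave' becomes 0 in toy_train but 1 in toy_test.
separate_train, _ = pd.factorize(toy_train)
separate_test, _ = pd.factorize(toy_test)

# Factorizing the concatenation produces one mapping shared by both parts.
codes, uniques = pd.factorize(pd.concat([toy_train, toy_test], ignore_index=True))
shared_train = codes[:len(toy_train)]
shared_test = codes[len(toy_train):]

print(list(separate_train), list(separate_test))
print(list(shared_train), list(shared_test))
```

With the shared mapping, a model fitted on the training codes sees the same meaning for each code at prediction time.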

Fitting with linear regression

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(train_x, train_y)
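
The fit above gives no indication of how accurate the model is. One way to estimate this before submitting is cross-validation. The competition scores submissions by the RMSE between the logarithm of the predicted price and the logarithm of the actual price, so the sketch below measures RMSE on log prices. Note that it is a variation, not the approach used in this article: it fits the regression on the log of the target, and it runs on synthetic data (demo_x and demo_y are hypothetical stand-ins for train_x and train_y) so that it works without the competition files.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for train_x / train_y (hypothetical data, not the Kaggle set).
rng = np.random.default_rng(0)
demo_x = rng.normal(size=(200, 3))
noise = rng.normal(scale=0.1, size=200)
demo_y = np.exp(12 + demo_x @ np.array([0.3, -0.2, 0.1]) + noise)

# Score with 5-fold cross-validation on log prices.
# scikit-learn returns the negated RMSE, so flip the sign afterwards.
scores = cross_val_score(LinearRegression(), demo_x, np.log(demo_y),
                         scoring='neg_root_mean_squared_error', cv=5)
rmse_log = -scores.mean()
print(f'5-fold RMSE on log prices: {rmse_log:.3f}')
```

If a model is fitted on log prices in this way, its predictions have to be converted back with np.exp before they are written to the submission file.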

Prediction

pred_y = regressor.predict(test_x)

Creating the submission CSV

submission = pd.DataFrame({'Id':test['Id'], 'SalePrice':pred_y})
submission.to_csv('submission.csv', index=False)
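
Before uploading, it may be worth running a few quick checks on the submission frame, since a plain linear regression can produce negative prices. The helper below is a hypothetical sketch (check_submission is not part of pandas or Kaggle, and the toy values are made up) that looks for unexpected columns, mismatched Ids, missing predictions, and non-positive predictions.

```python
import numpy as np
import pandas as pd

def check_submission(submission, expected_ids):
    """Return a list of problems found in a submission frame (empty if none)."""
    if list(submission.columns) != ['Id', 'SalePrice']:
        return ['unexpected columns']
    problems = []
    if list(submission['Id']) != list(expected_ids):
        problems.append('Id mismatch')
    if submission['SalePrice'].isna().any():
        problems.append('missing predictions')
    if (submission['SalePrice'] <= 0).any():
        problems.append('non-positive predictions')
    return problems

# Toy example (hypothetical values): one negative and one missing prediction.
toy_ids = [1461, 1462, 1463]
toy_bad = pd.DataFrame({'Id': toy_ids, 'SalePrice': [120000.0, -5000.0, np.nan]})
toy_good = pd.DataFrame({'Id': toy_ids, 'SalePrice': [120000.0, 95000.0, 210000.0]})

print(check_submission(toy_bad, toy_ids))
print(check_submission(toy_good, toy_ids))
```

On the real data, the call would be check_submission(submission, test['Id']).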
