[PYTHON] [Hands-on for beginners] Reading Kaggle's "Forecasting Home Prices" line by line (Part 1: Reading data)

Theme

My job involves operating a real estate system as a service, and at work the point came up that hands-on practice at the practitioner level could only help. So we decided to take on Kaggle's well-known "House Prices" problem together. I am posting my line-by-line reading notes to Qiita because writing them down properly will probably be useful later. This is more of a memo than a commentary, but I hope it helps someone somewhere.

Today's work

Library preparation

I will explain each library when it is first used in the work, so for now I simply copied this block as-is, like a magic spell.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import (
    LinearRegression,
    Ridge,
    Lasso
)
# Jupyter/IPython magic: draw plots inside the notebook (not valid in a plain .py script)
%matplotlib inline

Reading the data

The actual work starts here. First, read the CSV files to be used and put them into shape. For the time being, copy the code below as-is; each step is explained one by one afterwards.

# Read the data
train = pd.read_csv('train.csv')  # training data
test = pd.read_csv('test.csv')  # test data
# Label every row with its source so the two sets can be told apart after merging
train['WhatIsData'] = 'Train'
test['WhatIsData'] = 'Test'
# test.csv has no SalePrice column, so fill it with a placeholder value
test['SalePrice'] = 9999999999
# Merge training data and test data, renumbering the rows from 0
alldata = pd.concat([train, test], axis=0).reset_index(drop=True)
print('The size of train is : ' + str(train.shape))
print('The size of test is : ' + str(test.shape))

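Read the CSV files

pd.read_csv loads each CSV file into a pandas DataFrame: train.csv goes into train and test.csv goes into test.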
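As a small illustration of what pd.read_csv returns, the sketch below reads CSV text from memory instead of from train.csv. The column names are borrowed from the competition data, but the three rows and their values are made up for this example, so treat it as a toy stand-in rather than the real file.

```python
import io
import pandas as pd

# Made-up stand-in for train.csv (assumption: the real file has many
# more rows and columns; these values are invented for illustration).
csv_text = """Id,LotArea,SalePrice
1,8000,100000
2,9500,150000
3,11000,200000
"""

# pd.read_csv accepts a file path or any file-like object, so StringIO
# behaves here the way a path such as 'train.csv' would.
train = pd.read_csv(io.StringIO(csv_text))

print(type(train))   # the result is a pandas DataFrame
print(train.head())  # the header line became the column names
```

With the real files, only the argument changes: the path string replaces the StringIO object.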

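Store the same value in every row of a new column

Assigning a single value to a new column, as in train['WhatIsData'] = 'Train', fills every row of that column with the same value. test gets 'Test' in the same way, and its missing SalePrice column is filled with the placeholder 9999999999.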
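To see what assigning one value to a whole column does, here is a minimal sketch on two tiny DataFrames. The Id and SalePrice values are invented; only the column names and the placeholder 9999999999 come from the code above.

```python
import pandas as pd

# Toy frames standing in for train and test (values are invented).
train = pd.DataFrame({'Id': [1, 2, 3],
                      'SalePrice': [100000, 150000, 200000]})
test = pd.DataFrame({'Id': [4, 5]})  # like test.csv, no SalePrice yet

# A scalar on the right-hand side is repeated for every row.
train['WhatIsData'] = 'Train'
test['WhatIsData'] = 'Test'
test['SalePrice'] = 9999999999  # placeholder, not a real price

print(train)
print(test)
```

After this step both frames have the same set of columns, which keeps the later merge free of missing values.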

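Join the training data and the test data together

pd.concat with axis=0 stacks the two DataFrames vertically, and reset_index(drop=True) renumbers the rows from 0 while discarding the old index.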
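The sketch below, again on invented toy data, shows the merge in miniature: what the index looks like with and without reset_index(drop=True), and how the WhatIsData label could be used to split the rows apart again later. The names stacked, train_again, and test_again are hypothetical and do not appear in the original code.

```python
import pandas as pd

# Toy frames that already carry the source label (values are invented).
train = pd.DataFrame({'Id': [1, 2, 3],
                      'SalePrice': [100000, 150000, 200000],
                      'WhatIsData': 'Train'})
test = pd.DataFrame({'Id': [4, 5],
                     'WhatIsData': 'Test',
                     'SalePrice': 9999999999})

# Without reset_index, each frame keeps its own row labels.
stacked = pd.concat([train, test], axis=0)
print(stacked.index.tolist())   # [0, 1, 2, 0, 1]

# With reset_index(drop=True), rows are renumbered and the old index is dropped.
alldata = pd.concat([train, test], axis=0).reset_index(drop=True)
print(alldata.index.tolist())   # [0, 1, 2, 3, 4]

# The label column makes it possible to separate the rows again.
train_again = alldata[alldata['WhatIsData'] == 'Train']
test_again = alldata[alldata['WhatIsData'] == 'Test']
print(len(train_again), len(test_again))
```

Columns are matched by name rather than by position, so it does not matter that the two frames list their columns in a different order.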

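Display the size of the loaded data

train.shape and test.shape are (rows, columns) tuples. str() converts them to text so that they can be joined to the message with +.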
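As a quick check of what the print lines produce, this sketch builds the same message for a toy DataFrame with invented values. The variable msg and the f-string variant are additions for illustration only.

```python
import pandas as pd

# Toy frame with 3 rows and 2 columns (values are invented).
train = pd.DataFrame({'Id': [1, 2, 3],
                      'SalePrice': [100000, 150000, 200000]})

# shape is a (rows, columns) tuple; str() is needed because
# a string and a tuple cannot be joined with + directly.
msg = 'The size of train is : ' + str(train.shape)
print(msg)  # The size of train is : (3, 2)

# An f-string gives the same text without the explicit str() call.
print(f'The size of train is : {train.shape}')
```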

That's it.

That's all for today. I can only spend about an hour a week putting this series together, so progress will be at a turtle's pace, but thank you for reading along.
