[PYTHON] ■ Kaggle Practice for Beginners: House Sale Price (I Tried Using PyCaret) with Google Colaboratory

  1. Introduction

I'd like to show how easy PyCaret is to use by walking through the House Sale Price competition. This introduction covers only the most basic flow, so if you want to improve your score on Kaggle you will need to add further steps, such as preprocessing and modeling techniques.

  2. Setting Up Google Colaboratory

Set up the Google Colaboratory environment with the commands below to enable the connection to Kaggle. When the upload dialog appears, select your kaggle.json file.

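# Upload kaggle.json (your Kaggle API token) from your local machine
from google.colab import files
files.upload()

# Install the Kaggle CLI and put the token where the CLI looks for it
!pip install kaggle
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
# Make the token readable by the owner only
!chmod 600 ~/.kaggle/kaggle.json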
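
If you prefer to stay in Python, the mkdir, mv and chmod commands above can be sketched as a small helper. This is only an illustration: the function name `install_kaggle_token` and the dummy token contents are my own assumptions, not part of the Kaggle tooling.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_token(token_path, home_dir):
    """Move kaggle.json into <home_dir>/.kaggle and restrict it to the owner."""
    kaggle_dir = Path(home_dir) / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)   # like: mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.move(str(token_path), str(target))       # like: mv kaggle.json ~/.kaggle/
    target.chmod(0o600)                             # like: chmod 600 ~/.kaggle/kaggle.json
    return target


# Demo in a throwaway directory with a dummy (hypothetical) token
with tempfile.TemporaryDirectory() as tmp:
    dummy = Path(tmp) / "kaggle.json"
    dummy.write_text('{"username": "example", "key": "dummy"}')
    installed = install_kaggle_token(dummy, Path(tmp) / "home")
    print(installed.name, oct(stat.S_IMODE(installed.stat().st_mode)))
```

On Colab you would call it as `install_kaggle_token("kaggle.json", Path.home())` right after the upload.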

Copy and paste the API command for the dataset you want to download from Kaggle. Don't forget to put a "!" in front of the command.

# Example: the House Prices competition
!kaggle competitions download -c house-prices-advanced-regression-techniques
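
Depending on the version of the Kaggle CLI, the download may arrive as a single zip archive rather than as separate CSV files. Below is a minimal sketch for unpacking it. Both the archive name (assumed here to match the competition name) and the helper `extract_if_zipped` are assumptions, so check what the download command actually produced.

```python
import zipfile
from pathlib import Path


def extract_if_zipped(archive_path, dest_dir):
    """Extract archive_path into dest_dir if it exists; return the extracted names."""
    archive = Path(archive_path)
    if not archive.is_file():
        return []  # nothing to unpack: the files were probably downloaded unzipped
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())


# Assumed archive name and location; adjust to your own environment
names = extract_if_zipped(
    "/content/house-prices-advanced-regression-techniques.zip", "/content"
)
print(names)
```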

  3. Install PyCaret

!pip install pycaret

  4. Import Dataset

import numpy as np
import pandas as pd

# Load the competition files downloaded in step 2
train = pd.read_csv("/content/train.csv")
test = pd.read_csv("/content/test.csv")
train.head()

  5. Set Up PyCaret

from pycaret.regression import *

# session_id fixes the random seed so the experiment can be reproduced
reg = setup(train, target='SalePrice', session_id=0)

session_id: int, default = None. If None, a random seed is generated and returned in the Information grid. The unique number is then distributed as a seed in all functions used during the experiment. This can be used for later reproducibility of the entire experiment.
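
The effect of a fixed seed can be seen without PyCaret. The sketch below uses only the standard library and is not PyCaret's internal code: the helper `split_indices` is a hypothetical stand-in for any step that shuffles rows before splitting them.

```python
import random


def split_indices(n_rows, train_fraction, seed):
    """Shuffle row indices with a fixed seed and split them into train and hold-out."""
    rng = random.Random(seed)      # private generator, so global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


first = split_indices(10, 0.7, seed=0)
second = split_indices(10, 0.7, seed=0)
print(first == second)  # the same seed gives the same split
```

Passing `session_id=0` to `setup` plays the same role: rerunning the notebook reuses the same seed instead of drawing a new one.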

  6. Compare Models

compare_models(blacklist = None, fold = 10, round = 4, sort = 'R2', turbo = True)

This function trains every model in the model library and scores each one using K-fold cross-validation. The output is a score grid that shows MAE, MSE, RMSE, R2, RMSLE and MAPE across folds (10 folds by default) for all the available models in the model library.
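
To make the columns of the score grid concrete, here is a minimal sketch of the six metrics using their usual textbook definitions. It is not PyCaret's implementation, which may differ in details such as how MAPE is scaled, and the prices are made-up numbers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RMSLE and MAPE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    # RMSLE compares log(1 + value), so every value must be greater than -1
    rmsle = math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )
    # MAPE is returned as a fraction here, not as a percentage
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RMSLE": rmsle, "MAPE": mape}


# Made-up sale prices, purely for illustration
actual = [200000.0, 150000.0, 320000.0, 180000.0]
predicted = [210000.0, 140000.0, 300000.0, 185000.0]
for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```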

The Theil-Sen Regressor ('tr') required a long training time, so I excluded it this time. Least Angle Regression ('lar') produced an extremely large MAE, so I omitted it as well.

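# Skip the two models discussed above. `blacklist` is the PyCaret 1.x argument name; later releases call it `exclude`
compare_models(blacklist=['tr', 'lar'])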
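
The K-fold cross-validation behind these scores can be pictured with a few lines of plain Python. This is a simplified sketch that assumes contiguous, unshuffled folds, and the helper name `k_fold_indices` is hypothetical. PyCaret's own splitting may differ.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous K-fold splits."""
    indices = list(range(n_rows))
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional row so every row is used once
        size = base + (1 if fold < extra else 0)
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


# Each row lands in the validation part exactly once
for train_idx, valid_idx in k_fold_indices(10, 5):
    print("validate on", valid_idx, "train on", len(train_idx), "rows")
```

Each model is fitted once per fold on the training part and scored on the validation part, and the per-fold scores are then combined.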

  7. Create Models

# 'llar' is PyCaret's ID for Lasso Least Angle Regression
llar1 = create_model('llar', verbose=False)

  8. Predictions

# Score the hold-out set that setup() reserved from the training data
predict_model(llar1)

# Predict on the competition test set; this PyCaret version puts predictions in the 'Label' column
predictions_llar1 = predict_model(llar1, data=test)

# Build the submission file from the test IDs and the predicted prices
my_submission = pd.DataFrame({
    "Id": test['Id'],
    "SalePrice": predictions_llar1['Label'],
})
my_submission.to_csv('submission_llar.csv', index=False)
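
Before submitting, it can be worth checking that the file has the expected shape. The following is a small sketch with the standard csv module. The helper `check_submission` and the sample rows are my own assumptions for illustration and are not part of Kaggle or PyCaret.

```python
import csv
import os
import tempfile


def check_submission(path, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        rows = list(reader)
    if header != ["Id", "SalePrice"]:
        problems.append(f"unexpected header: {header}")
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for row in rows:
        try:
            float(row[1])
        except (IndexError, ValueError):
            problems.append(f"non-numeric SalePrice in row: {row}")
            break  # one example is enough to flag the file
    return problems


# Demo on a tiny made-up file
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "submission_sample.csv")
    with open(sample, "w", newline="") as handle:
        csv.writer(handle).writerows(
            [["Id", "SalePrice"], [1461, 120000.0], [1462, 155000.5]]
        )
    print(check_submission(sample, expected_rows=2))
```

In the notebook the call would be `check_submission('submission_llar.csv', len(test))`.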

  9. Submission

Copy and paste the API command shown on the submission page of the competition you want to submit to. Don't forget to put a "!" in front of the command.

# Submission to Kaggle
!kaggle competitions submit -c house-prices-advanced-regression-techniques -f submission_llar.csv -m "Message" 
