[PYTHON] [Hands-on for beginners] Reading Kaggle's "House Prices" line by line (Part 8: Building a Prediction Model)
Theme
This is the 8th installment of my notes from a hands-on session where everyone takes on Kaggle's well-known "House Prices" problem. It is more of a memo than a commentary, but I hope it helps someone somewhere. Preparation was finished last time, and I keep regretting that I did not study more statistics before the analysis stage.
- Original theme: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
- Referenced article: https://yolo-kiyoshi.com/2018/12/17/post-1003/
Today's work
Building a predictive model
StandardScaler() # scaling
For now, I read the following article, and today I will just write down my impressions.
- About scale conversion: https://aizine.ai/preprocessing0614/
What I found:
- Scaling adjusts the values and ranges that features can take so that they all fall on a comparable scale.
- Example: suppose "Number of visitors" and "Temperature" are both used as features for predicting sales. "Number of visitors" varies over a much larger numeric range, so without scaling it can influence the model far more than its actual importance warrants.
- There are several scale conversion methods. Of those described in the article, the one I use this time is standardization with StandardScaler, which seems to be commonly used (?).
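To make the visitors and temperature example concrete, here is a minimal sketch of standardization with scikit-learn's StandardScaler. The numbers are made-up toy values, not data from the competition.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (not from the Kaggle dataset):
# column 0 = number of visitors, column 1 = temperature in degrees Celsius
X = np.array([
    [1200.0, 18.0],
    [3400.0, 25.0],
    [2100.0, 30.0],
    [800.0, 12.0],
])

# StandardScaler subtracts each column's mean and divides by its
# standard deviation, so every feature ends up with mean 0 and variance 1.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("column means before:", X.mean(axis=0))
print("column means after: ", X_scaled.mean(axis=0))
print("column stds after:  ", X_scaled.std(axis=0))
```

After the transform both columns are on the same footing, so the large raw range of the visitor count no longer dominates.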
Lasso regression model
For now, I read the following articles, and today I will just write down my impressions (second time).
- Lasso regression model: https://aizine.ai/ridge-lasso-elasticnet/
- Related word "Linear regression": https://aizine.ai/glossary-liner-regression/
- Related word "overfitting": https://aizine.ai/overfitting0206/
- Related word "Cost function / Mean squared error": https://aizine.ai/glossary-mse/
What I found:
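- This time the data is property information with many explanatory variables, so Lasso regression is adopted. Its L1 penalty can shrink the coefficients of less useful variables to exactly zero, which helps curb overfitting.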
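As a rough illustration of why Lasso suits data with many explanatory variables, the sketch below fits scikit-learn's Lasso to synthetic data in which only 3 of 20 features actually matter. The data, the true coefficients, and alpha=0.1 are all assumptions chosen for illustration, not values from the hands-on.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 20 features,
# of which only the first 3 actually influence the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [5.0, -3.0, 2.0]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Lasso minimizes (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1.
# The L1 term pushes coefficients of uninformative features toward zero.
model = Lasso(alpha=0.1)
model.fit(X, y)

n_zero = int(np.sum(model.coef_ == 0.0))
print("first three coefficients:", model.coef_[:3])
print("coefficients set exactly to zero:", n_zero, "of", n_features)
```

A larger alpha removes more variables, and a smaller one moves the fit closer to ordinary linear regression.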
Pipeline construction
For now, I read the following article, and today I will just write down my impressions (third time).
- About the pipeline: https://qiita.com/colako/items/b4f4159b77c0a87e978f
What I found:
- A pipeline chains steps such as scaling and creating the model into a single flow. (I admit this is a sentence written by someone who does not fully understand it yet ...)
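My current understanding of a pipeline can be sketched as follows: scaling and the Lasso model are registered as named steps, and one fit call runs them in order. The step names, the synthetic data, and alpha=0.1 are assumptions for illustration. The referenced article may set things up differently.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 5 unscaled features,
# where the target depends on the first two.
rng = np.random.default_rng(42)
X = rng.normal(loc=100.0, scale=50.0, size=(100, 5))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=1.0, size=100)

# Each step is a (name, estimator) pair. fit() scales the data first,
# then trains Lasso on the scaled output.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("lasso", Lasso(alpha=0.1)),
])
pipe.fit(X, y)

# predict() applies the fitted scaler to new rows before calling Lasso.
preds = pipe.predict(X[:3])
print("predictions:", preds)
print("R^2 on training data:", pipe.score(X, y))
```

Because the scaler lives inside the pipeline, the same transformation learned at fit time is applied automatically at prediction time.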
That's it.
Today I only read, without writing any code, but I think knowing the background is important. I feel it will really sink in if I study this systematically after finishing the hands-on.