[Python] Kaggle Kernel Method Summary [Tabular / Time Series Data]

In this article, I summarize what could be called the standard approaches in Kaggle kernels for each data type. I also hope it can serve as a hint whenever your accuracy will not improve, whatever the competition.


This time I look at the following competitions and datasets. The competition pages have no "Kernels" section, so I cover Notebooks as well.
Predict Future Sales
Avocado Prices

1. Predict Future Sales

Type: Regression
Summary: Predict the number of items sold in the next month from the store name, item name, etc.

1.1 Time series Basics : Exploring traditional TS

Analysis performed | Result
Visualization of feature histograms | Found a bias in the category distribution
Plots of features in chronological order | Trends change from season to season
Stationarity checks with the ADF, KPSS, and PP (Phillips-Perron) tests | Determine whether the series is stationary
Trying AR, MA, ARMA models, etc. | -
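As a rough illustration of the last two rows, here is a minimal NumPy sketch under my own assumptions, not the kernel's code: a plain Dickey-Fuller unit-root test (the ADF test without its lagged-difference terms) and an AR(p) model fitted by ordinary least squares. The helper names `dickey_fuller_tstat`, `fit_ar` and `forecast_ar` are hypothetical; in practice you would use statsmodels (`adfuller` and `kpss` in `statsmodels.tsa.stattools`).

```python
import numpy as np

# Approximate asymptotic 5% critical value of the Dickey-Fuller test
# when the regression includes a constant (no trend term).
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_tstat(series):
    """t-statistic of gamma in  diff(y)_t = alpha + gamma * y_{t-1} + e_t.

    Simplified DF test (no lagged differences, unlike ADF). A value below
    DF_CRITICAL_5PCT rejects the unit root, i.e. suggests stationarity.
    """
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])


def fit_ar(series, p):
    """Fit AR(p): y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} + e_t by OLS."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column i holds the lag-i values y_{t-i} for t = p .. n-1.
    X = np.column_stack([np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_ar(series, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as a lag."""
    history = [float(v) for v in series]
    p = len(coef) - 1
    preds = []
    for _ in range(steps):
        lags = history[::-1][:p]  # y_{t-1}, ..., y_{t-p}
        nxt = coef[0] + float(np.dot(coef[1:], lags))
        preds.append(nxt)
        history.append(nxt)
    return preds
```

A random walk typically fails this test, while its first difference passes it, which is why non-stationary series are usually differenced before fitting AR/MA/ARMA models.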

The kernel also introduces three approaches to hierarchical time series: bottom-up, top-down, and middle-out.
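A minimal plain-Python sketch of the first two approaches, assuming a two-level hierarchy (items under stores) and top-down proportions taken from each child's share of total historical sales; the function names and the `item_a`/`shop_1` keys are hypothetical.

```python
from collections import defaultdict


def bottom_up(bottom_forecasts, parent_of):
    """Bottom-up: forecast every bottom-level series, then sum into parents."""
    totals = defaultdict(float)
    for node, value in bottom_forecasts.items():
        totals[parent_of[node]] += value
    return dict(totals)


def top_down(parent_forecast, child_history):
    """Top-down: split a parent forecast by each child's historical share."""
    grand_total = sum(sum(values) for values in child_history.values())
    return {
        child: parent_forecast * sum(values) / grand_total
        for child, values in child_history.items()
    }


# Example: item forecasts summed up to stores, and a total split down to stores.
print(bottom_up({"item_a": 10, "item_b": 5, "item_c": 3},
                {"item_a": "shop_1", "item_b": "shop_1", "item_c": "shop_2"}))
print(top_down(100.0, {"shop_1": [30, 30], "shop_2": [20, 20]}))
```

Middle-out combines the two: forecast at a middle level (for example, stores), sum those forecasts upward with `bottom_up`, and split them downward with `top_down`.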

In any case, the key with time-series data is to express how the values rise and fall over time as a mathematical model.
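To make "mathematical model" concrete, this is the standard ARMA(p, q) model behind the AR, MA, and ARMA fits mentioned above: the current value is a constant plus a weighted sum of the previous p values and of the current and previous q random shocks.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j},
\qquad \varepsilon_t \sim \text{white noise}
```

Setting q = 0 gives AR(p), setting p = 0 gives MA(q), and ARIMA(p, d, q) fits the same model to the series after differencing it d times.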

Related articles

Forecasting Hierarchical Time Series using R

Econometric time series analysis with R: AR, MA, ARMA, ARIMA models, prediction

1.2 Feature engineering, xgboost

Analysis performed | Result
Visualization of features | Found noisy data and replaced it with values interpolated from other data
Feature engineering (monthly averages computed from daily sales, average sales over a given period added as a feature, etc.) | Improved accuracy
Prediction with XGBoost | -
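A minimal pandas sketch of this kind of feature engineering, not the kernel's exact code. It assumes the daily sales table has the columns `date_block_num` (month index), `shop_id`, `item_id`, `item_price` and `item_cnt_day`; treating non-positive prices as the "noise" and filling them with the item's median price is my own hypothetical choice.

```python
import pandas as pd

KEYS = ["date_block_num", "shop_id", "item_id"]


def clean_prices(sales: pd.DataFrame) -> pd.DataFrame:
    """Replace non-positive (noisy) prices with the item's median valid price."""
    sales = sales.copy()
    bad = sales["item_price"] <= 0
    median_price = sales.loc[~bad].groupby("item_id")["item_price"].median()
    sales.loc[bad, "item_price"] = sales.loc[bad, "item_id"].map(median_price)
    return sales


def monthly_features(sales: pd.DataFrame, lags=(1, 2, 3)) -> pd.DataFrame:
    """Aggregate daily rows to months, then add lagged and averaged sales."""
    monthly = sales.groupby(KEYS, as_index=False).agg(
        item_cnt_month=("item_cnt_day", "sum"),
        avg_price=("item_price", "mean"),
    )
    for lag in lags:
        # Move each month's total forward by `lag` months so it lines up with
        # the month it helps predict (a calendar lag, not a row lag).
        shifted = monthly[KEYS + ["item_cnt_month"]].copy()
        shifted["date_block_num"] += lag
        shifted = shifted.rename(columns={"item_cnt_month": f"cnt_lag_{lag}"})
        monthly = monthly.merge(shifted, on=KEYS, how="left")
        # Assumption: no row for a month means nothing was sold, so use 0.
        monthly[f"cnt_lag_{lag}"] = monthly[f"cnt_lag_{lag}"].fillna(0)
    lag_cols = [f"cnt_lag_{lag}" for lag in lags]
    # Average sales over the previous len(lags) months.
    monthly["cnt_rolling_mean"] = monthly[lag_cols].mean(axis=1)
    return monthly
```

The resulting columns can then be fed to a gradient-boosting model such as `xgboost.XGBRegressor`. Strong kernels usually also build a full store x item grid for every month first, so that months with zero sales get their own rows; this sketch skips that step.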

In this kernel, the time-series information is steadily built into features through feature engineering, and the predictions are made with XGBoost. This kind of patient feature engineering is impressive.

Related articles

I have written an article about feature engineering before, so please check it out: Features Engineering Memorandum

1.3 A beginner guide for sale data prediction

Analysis performed | Result
Visualization of features | Found seasonal patterns and decided to encode them as features
Prediction with LSTM | -
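A minimal NumPy sketch of turning a monthly series into LSTM training data, assuming the seasonal pattern is encoded as a sin/cos transform of the calendar month (my choice, not necessarily the kernel's). Keras LSTM layers expect inputs shaped (samples, timesteps, features), which is what `make_lstm_windows` (a hypothetical helper) returns; the model itself is omitted.

```python
import numpy as np


def make_lstm_windows(values, months, window):
    """Build (samples, timesteps, features) inputs and next-step targets.

    Features per timestep: the sales value plus a sin/cos encoding of the
    month (1-12), so that December and January end up close together.
    """
    values = np.asarray(values, dtype=float)
    angle = 2 * np.pi * (np.asarray(months) - 1) / 12
    feats = np.column_stack([values, np.sin(angle), np.cos(angle)])
    # Window i covers timesteps i .. i+window-1; its target is values[i+window].
    X = np.stack([feats[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X, y
```

In practice the sales values are usually scaled (for example, to [0, 1]) before training, since LSTMs are sensitive to input magnitude.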

1.4 1st place solution - Part 1 - "Hands on Data"

Analysis performed | Result
Feature engineering (grouping stores that had no sales for months, finding duplicate store names, extracting categories from names, etc.) | Improved accuracy
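A pandas sketch of the three ideas, with hypothetical helper names. It assumes category names use a "Type - Subtype" pattern and that the table has `shop_id` and `date_block_num` columns; the duplicate-ID mapping is a placeholder you would build yourself by comparing store names, not the real mapping.

```python
import pandas as pd


def split_category(names: pd.Series) -> pd.DataFrame:
    """Split names like "Games - PS3" into a main type and a subtype."""
    parts = names.str.split(" - ", n=1)
    main = parts.str[0].str.strip()
    # Names without " - " reuse the main type as their subtype.
    sub = parts.str[1].fillna(main).str.strip()
    return pd.DataFrame({"type": main, "subtype": sub})


def merge_duplicate_shops(sales: pd.DataFrame, duplicates: dict) -> pd.DataFrame:
    """Map duplicate store IDs (found by comparing names) onto a single ID."""
    sales = sales.copy()
    sales["shop_id"] = sales["shop_id"].replace(duplicates)
    return sales


def inactive_shops(sales: pd.DataFrame, current_block: int, months: int) -> list:
    """Stores with no sales in the `months` months before `current_block`."""
    last_sale = sales.groupby("shop_id")["date_block_num"].max()
    return sorted(last_sale[last_sale < current_block - months].index)
```

The stores returned by `inactive_shops` can then be grouped together or dropped from the training data.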

Summary

As you can see from these kernels, they all follow the same routine:
Data visualization -> Identify data trends -> Feature engineering -> Dive into models

2. Avocado Prices

Next, I would like to introduce another time-series dataset. From here on, I focus on the methods used.

Type: Both regression and classification
Summary: Historical avocado sales data

2.1 Price of Avocados || Pattern Recognition Analysis

This kernel's visualizations are remarkably beautiful and detailed. I was impressed.

Methods used
Smoothing (moving average)
Seasonal Naive Method
Drift Method
ARIMA
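The first three methods are simple enough to write by hand. This NumPy sketch uses the textbook definitions (trailing moving average, seasonal naive = repeat the last observed season, drift = extend the line through the first and last points); the function names are my own. For ARIMA, statsmodels provides `statsmodels.tsa.arima.model.ARIMA`.

```python
import numpy as np


def moving_average(y, window):
    """Trailing moving average; the first window-1 points are dropped."""
    y = np.asarray(y, dtype=float)
    return np.convolve(y, np.ones(window) / window, mode="valid")


def seasonal_naive(y, season_length, horizon):
    """Forecast each step with the value from the same season one cycle earlier."""
    y = np.asarray(y, dtype=float)
    last_season = y[-season_length:]
    return np.array([last_season[h % season_length] for h in range(horizon)])


def drift(y, horizon):
    """Extend the straight line joining the first and last observations."""
    y = np.asarray(y, dtype=float)
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, horizon + 1)


print(moving_average([1, 2, 3, 4], 2))
print(seasonal_naive([1, 2, 3, 4, 5, 6], season_length=3, horizon=4))
print(drift([1, 2, 3, 4], horizon=2))
```

For weekly price data, `season_length=52` would be the natural choice (an assumption about the data's frequency). These naive methods are mainly useful as baselines that ARIMA should beat.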

2.2 Explore avocados from all sides!

This kernel also has very beautiful visualizations. It treats the task as a classification problem: predicting whether an avocado is organic or not.

Methods used
Logistic regression
RandomForest
KNeighborsClassifier
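A scikit-learn sketch comparing the three classifiers with 5-fold cross-validation. It assumes you have already built a numeric feature matrix `X` and a binary label `y` (for example, 1 for organic); the demo uses synthetic data from `make_classification` instead of the real avocado columns, and the hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, cv=5):
    """Mean cross-validated accuracy for the three models listed above."""
    models = {
        # Scaling matters for distance- and gradient-based models.
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "k_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for the avocado features and organic/conventional label.
    X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
    print(compare_classifiers(X_demo, y_demo))
```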

2.3 Predicting prices of avocados

Tool used
Prophet
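Prophet expects a DataFrame with exactly two columns, `ds` (dates) and `y` (values). A minimal sketch, assuming the raw table has `Date` and `AveragePrice` columns and that averaging over all rows sharing a date is acceptable; `freq="W"` assumes weekly data. Prophet is imported inside the function so the reshaping helper works without it installed.

```python
import pandas as pd


def to_prophet_frame(df, date_col="Date", value_col="AveragePrice"):
    """Reshape to Prophet's two required columns: `ds` (date) and `y` (value).

    Averages all rows that share a date (e.g. different regions or types).
    """
    out = df[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    return out.groupby("ds", as_index=False)["y"].mean()


def forecast_with_prophet(frame, periods=52, freq="W"):
    """Fit Prophet and forecast `periods` steps ahead (needs the prophet package)."""
    from prophet import Prophet  # lazy import: only needed when forecasting

    model = Prophet()
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods, freq=freq)
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

Older releases ship the same class under the package name `fbprophet`.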

2.4 EDA + Lasso

This kernel also tackles the classification problem, and additionally tries regression models such as Lasso.

Methods used
DecisionTree
RandomForest
KNeighbors
SVM
AdaBoostClassifier
GradientBoostingClassifier
XGBoost
Lasso
Ridge
Bayesian Ridge
ElasticNet
HuberRegressor
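A scikit-learn sketch comparing the linear regressors at the end of the list by cross-validated R^2, for example when predicting `AveragePrice`. The alpha values are illustrative defaults rather than tuned settings, and the demo uses synthetic data from `make_regression`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, ElasticNet, HuberRegressor, Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_regressors(X, y, cv=5):
    """Mean cross-validated R^2 for the linear models listed above."""
    models = {
        "lasso": Lasso(alpha=0.1),                           # L1 penalty
        "ridge": Ridge(alpha=1.0),                           # L2 penalty
        "bayesian_ridge": BayesianRidge(),                   # penalty learned from data
        "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # L1 + L2 mix
        "huber": HuberRegressor(max_iter=1000),              # robust to outliers
    }
    return {
        # Standardize first so the penalties treat every feature alike.
        name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    X_demo, y_demo = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
    print(compare_regressors(X_demo, y_demo))
```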

Summary

Feature engineering and visualization are important!

Link

List of typical machine learning methods
