[PYTHON] Time series analysis 3 Preprocessing of time series data

Aidemy 2020/10/29

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI and enrolled in the AI-focused school "Aidemy" to study. I want to share what I learn there, so I am summarizing it on Qiita. I am very happy that so many people read the previous summary articles. Thank you! This is the third post on time series analysis. I hope you find it useful.

What to learn this time
- Handling time series data with pandas
- How to make time series data stationary

Handle time series data with pandas

Loading and displaying data

- The ultimate goal is to analyze time series data with SARIMA, but the data has to be preprocessed before it is passed to the model.
- If the time series data is given as a CSV file, read it with __pd.read_csv("file path")__.
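As a minimal sketch of the loading step: the CSV content, the column names `Month` and `Sales`, and the values below are made up for illustration, and an in-memory buffer stands in for a real file path so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Month,Sales
2020-01,266
2020-02,146
2020-03,183
"""

# pd.read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Check the first rows to see which column holds the time information.
print(df.head())
```

With a real file, the call would simply be `pd.read_csv("file path")`.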

Convert time information to index

- When analyzing time series data, convert the time information (hour, month, etc.) into the index to make the data easier to handle.
- The conversion procedure is as follows.
(1) Define the index with __pd.date_range("start", "end", freq="interval")__.
(2) Assign the defined index to the index of the original data.
(3) Delete the time column from the original data.

- The start and end values to pass in (1), as well as the interval, can be checked with __df.head()__ and __df.tail()__.
- For the interval, pass "S" for seconds, "min" for minutes, "H" for hours, "D" for days, and "M" for months (month end). Recent pandas versions prefer lowercase "s" and "h", and "ME" for month end.

- Code ![Screenshot 2020-10-29 14.00.33.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/8846bc71-71c3-998c-537b-cd65c3c57872.png)

- Result ![Screenshot 2020-10-29 14.00.19.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/efb179c4-e69c-3c49-6315-bde299bfbfc4.png)
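The three steps for moving the time information into the index can be sketched roughly as follows. The DataFrame, its column names `Date` and `Visitors`, and the values are hypothetical. Daily data with `freq="D"` is used here because that alias behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical daily data with the time information stored in a "Date" column.
df = pd.DataFrame({
    "Date": ["2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04"],
    "Visitors": [120, 135, 128, 150],
})

# (1) Define the index from the first and last dates
#     (these can be checked with df.head() and df.tail()).
index = pd.date_range("2020-10-01", "2020-10-04", freq="D")

# (2) Assign it as the index of the original data.
df.index = index

# (3) Delete the now-redundant time column.
df = df.drop(columns="Date")

print(df)
```

This assumes the rows are already sorted by time and have no gaps, since the new index is assigned by position rather than matched against the `Date` values.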

Causes when data is not stationary

- The causes of non-stationarity in time series data include __"trend"__ and __"seasonal variation"__.
- If a series is stationary, its expected value is constant over time. With a __positive trend, the expected value keeps rising__, so the series cannot be called stationary.
- Similarly, if a series is stationary, its autocovariance (and therefore its autocorrelation coefficient) depends only on the lag, not on the point in time. This does not hold for data with seasonal variation, that is, data whose values rise or fall sharply only during a particular part of the year.

- In such cases, stationary data can be obtained by applying a __transformation that removes the trend and seasonal variation__.
- After a model is built on this stationary data, the trend and seasonal variation are added back to obtain a model of the original series.
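To make the effect of a trend concrete, here is a small sketch on synthetic data. The series, its slope, the seasonal amplitude, and the noise level are all assumptions chosen for illustration. It compares the mean of the first and second half of a series that has an upward trend.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: upward trend + 12-month seasonal cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
trend = 0.5 * t
seasonal = 10 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(0, 1, size=t.size)

# "MS" is the pandas alias for month-start frequency.
series = pd.Series(
    trend + seasonal + noise,
    index=pd.date_range("2010-01-01", periods=120, freq="MS"),
)

# With a positive trend the mean is not constant over time.
first_half_mean = series.iloc[:60].mean()
second_half_mean = series.iloc[60:].mean()
print(first_half_mean, second_half_mean)
```

The second half has a clearly larger mean than the first, which is exactly what a constant expected value rules out.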

Make time series data stationary

Elimination of trends and seasonal fluctuations

- The following four methods can be used to remove trends and seasonal fluctuations and obtain a stationary series. Each is described in its own section.
- Make the variance of the fluctuations uniform with a __logarithmic transformation__
- Take a __moving average__ to estimate the trend, then remove it
- Convert to a __difference series__ (the most common approach)
- Perform __seasonal adjustment__

Logarithmic transformation

- As seen in "Time Series Analysis 1", a __logarithmic transformation__ makes changes in the data values more gradual.
- For data whose fluctuations grow along with the level, such as seasonal swings that get larger over time, this makes the variance (and autocovariance) more uniform. In that sense, __the effect of seasonal fluctuations can be reduced__.
- However, __the trend cannot be removed this way__, so a separate step to remove the trend is needed in addition to the logarithmic transformation.
- The logarithmic transformation is done with __np.log(data)__.

- Code ![Screenshot 2020-10-29 14.03.41.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/f3df871c-656e-8524-0451-329add719538.png)

- Result ![Screenshot 2020-10-29 14.03.26.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/447705c8-b9fd-4df5-94c1-c04840f6dc45.png)
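A rough sketch of what the logarithmic transformation does, using a synthetic series whose seasonal swing grows with its level. The growth rate, the amplitude, and the helper `swing` are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: exponential growth multiplied by a 12-month cycle,
# so the seasonal swing gets larger as the level rises.
t = np.arange(48)
level = 100 * 1.03 ** t
seasonal = 1 + 0.2 * np.sin(2 * np.pi * t / 12)
series = pd.Series(level * seasonal)

# Logarithmic transformation.
log_series = np.log(series)


def swing(s):
    """Hypothetical helper: range (max - min) of a slice of the series."""
    return s.max() - s.min()


# Compare the swing within the first year and within the last year.
raw_first, raw_last = swing(series.iloc[:12]), swing(series.iloc[-12:])
log_first, log_last = swing(log_series.iloc[:12]), swing(log_series.iloc[-12:])
print(raw_first, raw_last)  # the raw swing grows over time
print(log_first, log_last)  # the swing after the log transform stays the same
```

The log series still rises over time, so the trend has to be removed separately.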

Moving average

- A __moving average__ __takes the average over a fixed-length window while sliding that window along the series__.
- A moving average smooths the data while preserving the overall shape of the original series. This makes it possible to __remove seasonal fluctuations and extract the trend__.
- For example, if monthly data has seasonal fluctuations, they can be removed by taking a 12-point moving average (one full year). The extracted trend can then be removed with "(original series) - (moving average)".

- The moving average can be calculated as follows. __data.rolling(window=window size).mean()__

- Code (CO2 concentration data, moving average over a 51-week window, roughly one year) Screenshot 2020-10-29 14.08.16.png -northeast-1.amazonaws.com/0/698700/3acdfc2c-6d10-a226-0f5c-740246fdfd67.png)

- Result ![Screenshot 2020-10-29 14.08.35.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/b1662775-9436-06d6-173b-78733ceaf6c2.png)
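The moving-average idea can be sketched on synthetic monthly data as follows. This is not the CO2 data from the screenshot; the slope and the seasonal amplitude are assumptions, and the window of 12 matches the 12-month seasonal cycle.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + 12-month seasonal cycle (no noise).
t = np.arange(60)
trend = 0.5 * t
seasonal = 5 * np.sin(2 * np.pi * t / 12)
series = pd.Series(
    trend + seasonal,
    index=pd.date_range("2015-01-01", periods=60, freq="MS"),  # month start
)

# 12-point moving average: averaging over one full cycle cancels the seasonal
# term, leaving an estimate of the trend. The first 11 values are NaN because
# the window is not yet full.
moving_avg = series.rolling(window=12).mean()

# (original series) - (moving average) removes the estimated trend.
detrended = series - moving_avg

print(moving_avg.dropna().head())
print(detrended.dropna().head())
```

Because `rolling` uses a trailing window by default, the trend estimate lags the series by about half a window. The detrended series therefore keeps a constant offset in addition to the seasonal cycle.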

Difference series

- As seen in "Time Series Analysis 1", a series formed by taking the difference from the previous value is called a difference series.
- A difference series can remove both trends and seasonal fluctuations. Because it is easy to compute, it is the most common way to obtain stationarity.
- The difference series can be computed with __data.diff()__.
- Taking the difference once gives the first-order difference series; taking the difference of that series again gives the second-order difference series.

- Code ![Screenshot 2020-10-29 14.09.39.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/bc547a8a-ee6f-aaf6-01e1-53a367ecadf3.png)

- Result ![Screenshot 2020-10-29 14.09.30.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/9f8ed0bc-d3a3-543f-b80f-8b0a7a17d56f.png)
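A small sketch of first-order and second-order differencing. The values are a made-up series with a quadratic trend, chosen so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical series with a quadratic trend: 1, 4, 9, 16, 25.
series = pd.Series([1, 4, 9, 16, 25])

# First-order difference: each value minus the previous one.
# The first element is NaN because it has no predecessor.
first_diff = series.diff()

# Second-order difference: the difference of the first-order difference.
second_diff = series.diff().diff()

print(first_diff.tolist())   # [nan, 3.0, 5.0, 7.0, 9.0]
print(second_diff.tolist())  # [nan, nan, 2.0, 2.0, 2.0]
```

The first difference still rises, while the second difference is constant, so two rounds of differencing are enough to remove a quadratic trend.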

Seasonal adjustment

- As seen in "Time Series Analysis 1", seasonal adjustment removes trends and seasonal fluctuations from the original series.
- The mechanism is as follows. The original series can be decomposed as "original series = trend + seasonal variation + residual". Rearranged, this becomes "residual = original series - (trend + seasonal variation)", so the residual is the original series with the trend and seasonal variation removed.
- To decompose the series, run __sm.tsa.seasonal_decompose(data, freq=interval)__. In recent statsmodels versions this argument is named __period__ instead of __freq__.

- Code ![Screenshot 2020-10-29 14.19.05.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/d35cd55e-e896-dd35-fe13-c71155db3fd6.png)

- Result ![Screenshot 2020-10-29 14.19.13.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/dc2f14cd-6f70-f0d2-4092-60ba30444634.png)

Summary

- Before performing time series analysis, move the time information of the time series data into the index.
- Data passed to a time series model must be stationary, so non-stationary data needs preprocessing such as conversion to a difference series.

That's all for this time. Thank you for reading to the end.
