# [PYTHON] Time series analysis 3: Preprocessing of time series data

Aidemy　2020/10/29

# Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I am studying at the AI-specialized school "Aidemy". I want to share what I learn there, so I am summarizing it on Qiita. I am very happy that so many people read the previous summary articles. Thank you! This is the third post on time series analysis. Thanks for reading.

What to learn this time:

- Handling time series data with pandas
- How to make time series data stationary

# Handle time series data with pandas

- The ultimate goal is to analyze time series data with SARIMA, but the data passed to the model needs some preprocessing first.
- If the time series data is given as a CSV file, start by reading it with __pd.read_csv("file path")__.
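As a minimal sketch of the reading step, the snippet below loads a small CSV with pandas. The column names and values are hypothetical, and `io.StringIO` stands in for a real file path so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Month,Sales
1964-01,2815
1964-02,2672
1964-03,2755
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# At this point the time information is still an ordinary column.
print(df.head())
```

With a real file, the call would simply be `pd.read_csv("file path")` as described above.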

## Convert time information to index

- When analyzing time series data, convert the time information (Hour, Month, etc.) into the index to make the data easier to handle.
- The conversion procedure is as follows.
    1. Define the index with __pd.date_range("start", "end", freq="interval")__.
    2. Assign the defined index to the index of the original data.
    3. Delete the time column from the original data.

- The start and end values to pass in step 1 can be checked with __df.head()__ and __df.tail()__.
- For the interval, pass "S" if the data is in seconds, "min" for minutes, "H" for hours, "D" for days, and "M" for months.

- Code ![Screenshot 2020-10-29 14.00.33.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/8846bc71-71c3-998c-537b-cd65c3c57872.png)

- Result ![Screenshot 2020-10-29 14.00.19.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/efb179c4-e69c-3c49-6315-bde299bfbfc4.png)
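The three conversion steps can be sketched in text form roughly as follows. This is not the code from the screenshot: the DataFrame and its column names are hypothetical, and the sketch uses the "MS" (month start) alias because recent pandas releases deprecate "M" in favor of "ME".

```python
import pandas as pd

# Hypothetical monthly data with the time information held in a "Month" column.
df = pd.DataFrame({
    "Month": ["2020-01", "2020-02", "2020-03", "2020-04"],
    "Sales": [100, 120, 90, 130],
})

# Check where the data starts and ends before defining the index.
print(df.head(2))
print(df.tail(2))

# Step 1: define the index ("MS" = one timestamp at the start of each month).
index = pd.date_range("2020-01-01", "2020-04-01", freq="MS")

# Step 2: assign it to the index of the original data.
df.index = index

# Step 3: delete the original time column, which is now redundant.
df = df.drop(columns="Month")

print(df)
```

The length of the generated index has to match the number of rows, which is why checking the first and last rows beforehand is useful.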

## Causes of non-stationary data

- Time series data can fail to be stationary because of a __"trend"__ or __"seasonal fluctuations"__.
- For a stationary series the expected value is constant over time. If there is a __positive trend, the expected value keeps rising__, so the series cannot be called stationary.
- Likewise, for a stationary series the autocovariance (and therefore the autocorrelation coefficient) depends only on the lag, not on the point in time. Data with seasonal fluctuations, that is, data whose values rise or fall sharply only at certain times of the year, has statistical properties that change with the time of year, so this condition cannot be said to hold.

- In such cases, stationary data can be obtained by applying a __transformation that removes the trend and seasonal fluctuations__.
- After a model has been fitted to this stationary data, the trend and seasonal fluctuations are combined with it again to build a model of the original series.
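To make the two causes concrete, here is a small sketch on synthetic data. The series, its coefficients, and all variable names are assumptions made up for illustration. It shows that the mean shifts over time when there is a trend, and that the mean depends on the calendar month when there are seasonal fluctuations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 120  # 10 years of hypothetical monthly data
t = np.arange(n)

trend = 0.5 * t                             # steadily rising level
seasonal = 10 * np.sin(2 * np.pi * t / 12)  # pattern repeating every 12 months
noise = rng.normal(0, 1, n)

index = pd.date_range("2010-01-01", periods=n, freq="MS")
series = pd.Series(trend + seasonal + noise, index=index)

# Trend: the mean of the second half is clearly higher than that of the first half.
first_half_mean = series.iloc[: n // 2].mean()
second_half_mean = series.iloc[n // 2 :].mean()
print(first_half_mean, second_half_mean)

# Seasonality: even with the trend subtracted, the mean differs by calendar month.
detrended = series - trend
monthly_means = detrended.groupby(detrended.index.month).mean()
print(monthly_means)
```

In both cases the expected value is not constant over time, which is what the preprocessing in the following sections is meant to fix.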

# Make time series data stationary

## Elimination of trends and seasonal fluctuations

- The following four methods can be used to remove trends and seasonal fluctuations and obtain stationarity. Details will be described later.
    - Make the variance of the fluctuations uniform with a __logarithmic transformation__
    - Take a __moving average__ to estimate the trend and remove it
    - Convert to a __difference series__ (the most common method)
    - Perform __seasonal adjustment__

## Logarithmic transformation

- As seen in "Time Series Analysis 1", a __logarithmic transformation__ makes changes in the data values more gradual.
- For data whose swings grow as the values grow, as often happens with seasonal fluctuations, this makes the variance of the fluctuations more uniform. In that sense, __the influence of seasonal fluctuations can be reduced__.
- However, __the trend cannot be removed this way__, so the trend has to be removed separately in addition to the logarithmic transformation.
- The logarithmic transformation can be done with __np.log(data)__.

- Result ![Screenshot 2020-10-29 14.03.26.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/447705c8-b9fd-4df5-94c1-c04840f6dc45.png)
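The effect of the logarithmic transformation can be sketched on synthetic data as below. The series is an assumption: a hypothetical monthly series whose seasonal swing grows in proportion to its level. The helper `yearly_range` is also made up for this example.

```python
import numpy as np
import pandas as pd

n = 48  # 4 years of hypothetical monthly data
t = np.arange(n)
index = pd.date_range("2016-01-01", periods=n, freq="MS")

# The level grows exponentially and the seasonal swing is proportional to it.
level = 100 * np.exp(0.03 * t)
series = pd.Series(level * (1 + 0.2 * np.sin(2 * np.pi * t / 12)), index=index)

# Logarithmic transformation.
log_series = np.log(series)


def yearly_range(s):
    """Return max minus min within each calendar year."""
    by_year = s.groupby(s.index.year)
    return by_year.max() - by_year.min()


raw_range = yearly_range(series)
log_range = yearly_range(log_series)

print(raw_range)  # the within-year range keeps growing
print(log_range)  # the within-year range is the same every year
```

The size of the fluctuations becomes uniform after the transformation, but `log_series` still rises over time, so the trend has to be handled by one of the other methods.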

## Moving average

- A __moving average__ is __the average over a fixed-length window, taken repeatedly while sliding the window along the data__.
- Taking a moving average smooths the data while keeping the characteristics of the original series. This makes it possible to __remove seasonal fluctuations and extract the trend__.
- For example, if monthly data has seasonal fluctuations, taking a 12-point moving average (one full year) removes them. The extracted trend can in turn be removed with "(original series) - (moving average)".

- The moving average can be calculated as follows: __data.rolling(window=number of points to average).mean()__

- Code (CO2 concentration data, moving average with a 51-week window, roughly 1 year) -northeast-1.amazonaws.com/0/698700/3acdfc2c-6d10-a226-0f5c-740246fdfd67.png)

- Result ![Screenshot 2020-10-29 14.08.35.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/b1662775-9436-06d6-173b-78733ceaf6c2.png)
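As a text sketch of the same idea, the snippet below uses synthetic monthly data with a 12-point window instead of the weekly CO2 data and 51-week window from the screenshot. All values and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

n = 60  # 5 years of hypothetical monthly data
t = np.arange(n)
index = pd.date_range("2015-01-01", periods=n, freq="MS")

trend = 2.0 * t
seasonal = 10 * np.sin(2 * np.pi * t / 12)
series = pd.Series(trend + seasonal, index=index)

# 12-point moving average: each value averages the current month and the 11 before it.
# The first 11 entries are NaN because the window is not yet full.
moving_avg = series.rolling(window=12).mean()

# (original series) - (moving average) removes the extracted trend.
detrended = series - moving_avg

print(moving_avg.dropna().head())
print(detrended.dropna().head())
```

Because the window covers exactly one seasonal cycle, the seasonal term averages out and `moving_avg` follows the trend only, while `detrended` keeps the seasonal pattern without the upward drift.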

## Difference series

- As seen in "Time Series Analysis 1", a series obtained by taking the difference from the previous value is called a difference series.
- Taking differences can remove trends and seasonal fluctuations. Because it is easy to do, it is the most common way to obtain stationarity.
- The difference series can be computed with __data.diff()__.
- Differencing once gives the first-order difference series, and differencing that series again gives the second-order difference series.

- Result ![Screenshot 2020-10-29 14.09.30.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/9f8ed0bc-d3a3-543f-b80f-8b0a7a17d56f.png)
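A minimal sketch of first-order and second-order differencing, using a small made-up series whose values grow quadratically so that the effect of each step is easy to follow:

```python
import pandas as pd

# Hypothetical daily series with an accelerating upward trend.
index = pd.date_range("2020-01-01", periods=6, freq="D")
series = pd.Series([1, 4, 9, 16, 25, 36], index=index)

# First-order difference: value minus the previous value (the first entry is NaN).
first_diff = series.diff()

# Second-order difference: the difference of the first-order difference series.
second_diff = first_diff.diff()

print(first_diff.dropna().tolist())   # [3.0, 5.0, 7.0, 9.0, 11.0]
print(second_diff.dropna().tolist())  # [2.0, 2.0, 2.0, 2.0]
```

Here the first-order difference still rises, while the second-order difference is constant, which illustrates why a series sometimes has to be differenced more than once.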