[PYTHON] Time series analysis 1 Basics

Aidemy 2020/10/

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I am summarizing what I learned there on Qiita to share it with you. I am very happy that so many people read the previous summary articles. Thank you! This is the first post on time series analysis. Nice to meet you.

What to learn this time
- About time series analysis
- Types of time series data
- Statistics of time series data

About time series analysis

What is time series data?

- Time series data is data __whose value changes over time__. Examples include hourly temperature, sales, and stock prices.
- Time series analysis is especially important in business, for tasks such as product sales forecasts and store visitor forecasts.
- In Python, time series analysis is implemented with __StatsModels__.

(Review) Display of time series data

- Graphing the data is an indispensable first step in time series analysis. The graphs are drawn with Matplotlib. The plt functions that appear this time are reviewed below.

- Create a graph: __plt.plot(x, y)__
- Display the graph: __plt.show()__
- Set the x-axis display range: __plt.xlim([lower, upper])__ (ylim for the y-axis)
- Graph title: __plt.title("")__
- x-axis label: __plt.xlabel("")__ (ylabel for the y-axis)
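The plt functions above can be combined as in the following minimal sketch. The data (a made-up series of 24 hourly temperatures), the axis ranges, and the labels are all assumptions for illustration and are not taken from the course material.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 24 hourly temperature readings (not from the article)
hours = np.arange(24)
temperature = 15 + 5 * np.sin((hours - 9) * 2 * np.pi / 24)

plt.plot(hours, temperature)       # create the graph
plt.xlim([0, 23])                  # x-axis display range
plt.ylim([5, 25])                  # y-axis display range
plt.title("Hourly temperature")    # graph title
plt.xlabel("Hour")                 # x-axis label
plt.ylabel("Temperature (deg C)")  # y-axis label
ax = plt.gca()                     # keep a handle to the axes for later inspection
plt.show()                         # display the graph
```

The settings are applied before plt.show() because the figure is drawn at the moment it is displayed.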

Time series data pattern

- Time series data shows the following three patterns, and any series is made up of a combination of them.
- __Trend__: The long-term direction of the data. A rising value is called a "positive trend", and a falling value is called a "negative trend".
- __Periodic fluctuation__: The value rises and falls repeatedly as time passes. A periodic fluctuation with a one-year cycle is called a seasonal fluctuation.
- __Irregular fluctuation__: The value fluctuates regardless of the passage of time.
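To see how the three patterns combine, here is a small sketch that builds an artificial series with NumPy. The numbers are made up, and adding the components together (rather than, say, multiplying them) is an assumption chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the result is reproducible
t = np.arange(48)                  # 48 hypothetical monthly observations

trend = 0.5 * t                                  # positive trend: rises over the long term
seasonal = 10 * np.sin(2 * np.pi * t / 12)       # periodic fluctuation with a 12-month cycle
irregular = rng.normal(scale=2.0, size=t.size)   # irregular fluctuation: random noise

series = trend + seasonal + irregular            # additive combination of the three patterns
```

Plotting `series` against `t` shows a wave that climbs gradually while jittering slightly from point to point.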

Modeling

- __Modeling__ means expressing time series data as a formula (__building a model__).
- Time series analysis uses this model to make predictions and to analyze the relationships between data.

Types of time series data

- The unprocessed data itself is called the __"original series"__. The purpose of time series analysis is to understand the properties of this original series, but in practice most analysis is carried out on processed data.
- Processed data includes the __"logarithmic series", "difference series", and "seasonally adjusted series"__. Each is described below.

Logarithmic series

- For data whose values fluctuate widely, taking the logarithm of each value makes the changes gentler. This is called a logarithmic transformation, and the transformed data is called a logarithmic series.

- To apply the logarithmic transformation, run the following. __np.log(DataFrame data)__
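As a minimal sketch of the logarithmic transformation, the example below applies np.log to a small DataFrame. The column name and the sales figures are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures that grow by a factor of 10 at each step
df = pd.DataFrame({"sales": [10.0, 100.0, 1000.0, 10000.0]})

log_df = np.log(df)   # natural logarithm of every value

print(log_df)
```

In the original data each step is ten times larger than the last, while in the logarithmic series every step increases by the same amount, which is why the change looks gentler.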

Difference series

- Data obtained by taking the difference between each value and the previous value is called a difference series.
- Converting to a difference series __removes the trend (long-term tendency)__.
- Removing the trend may turn the data into a __stationary process__, in which statistical properties such as the mean and variance do not change over time. Stationary processes will be described later.

- To compute the difference series, run the following. __DataFrame data.diff()__
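The following sketch shows how taking differences removes a trend. The data is a hypothetical series that rises by exactly 3 at every step.

```python
import pandas as pd

# Hypothetical series with a steady upward trend of +3 per step
df = pd.DataFrame({"value": [10, 13, 16, 19, 22]})

diff_df = df.diff()          # each value minus the previous value
diff_df = diff_df.dropna()   # the first row has no previous value, so it is NaN

print(diff_df["value"].tolist())  # → [3.0, 3.0, 3.0, 3.0]
```

The original values keep rising, but the difference series stays constant, so the trend is gone.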

Seasonally adjusted series

- A seasonal fluctuation is a periodic fluctuation with a one-year cycle. When data contains seasonal fluctuations, it is hard to analyze the other patterns in it. To deal with this, the seasonal fluctuation can be removed, and the resulting data is called a seasonally adjusted series.

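- To obtain a seasonally adjusted series, run the following (sm refers to statsmodels, imported as statsmodels.api). __sm.tsa.seasonal_decompose(data)__ This splits the data into trend, seasonal, and residual components. The seasonally adjusted series is the original series with the seasonal component removed.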

- Code ![Screenshot 2020-10-29 13.23.57.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/0cc42af4-cdae-2551-b768-f33da02e8d90.png)

- Result ![Screenshot 2020-10-29 13.24.06.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/80c2e264-d78c-b4e9-769f-8f0e7d8322fd.png)

Time series data statistics

Expected value (average)

- The average of the values in a time series is called the __expected value__ (expectation).
- It can be calculated with __np.mean()__.

Variance / standard deviation

- The variance indicates how far the values of a time series are spread from the expected value.
- The variance is the average of __(each value - expected value)^2__, and its square root is called the __standard deviation__.
- In stocks and investment, the standard deviation is an important index for measuring risk.
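The definitions above can be checked with a few lines of NumPy. The return values below are hypothetical, and the sketch uses the population variance (dividing by the number of values), which is also NumPy's default.

```python
import numpy as np

# Hypothetical daily returns of a stock, in percent
returns = np.array([1.0, -2.0, 0.5, 3.0, -1.5, 2.0])

mean = np.mean(returns)                    # expected value
variance = np.mean((returns - mean) ** 2)  # average squared deviation from the mean
std = np.sqrt(variance)                    # standard deviation

# NumPy's built-in functions give the same results
assert np.isclose(variance, np.var(returns))
assert np.isclose(std, np.std(returns))
```

A larger standard deviation means the returns swing more widely around their average, which is why it is read as a measure of risk.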

Autocovariance / autocorrelation coefficient

- __Autocovariance__ is the __covariance between values of the same time series at different points in time__.
- When the two points are k steps apart, it is called the k-th order autocovariance, and it is the average of the following product. __(each value - expected value)(value k steps apart - expected value)__
- Viewed as a function of k, this is called the __autocovariance function__.

- The __autocorrelation coefficient__ is the autocovariance normalized so that it can be compared across data with different scales.
- The autocorrelation coefficient indicates __how similar the data is to its past values__.
- It is calculated as follows. __autocovariance / ((standard deviation of the data)(standard deviation of the data k steps apart))__
- Viewed as a function of k, this is called the __autocorrelation function__, and its graph is called a correlogram.
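The two formulas above can be written out by hand as a rough sketch. The helper names `autocovariance` and `autocorrelation` are hypothetical. The sketch assumes the series is stationary, so both standard deviations in the denominator are the same and their product is simply the variance. It also uses a common sample estimator that takes the mean of the whole series and divides by the full length n.

```python
import numpy as np

def autocovariance(x, k):
    """k-th order sample autocovariance of a 1-D array (divides by n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = x.mean()
    # Pair each value with the value k steps later and sum the products
    return np.sum((x[: n - k] - mean) * (x[k:] - mean)) / n

def autocorrelation(x, k):
    """k-th order autocorrelation: autocovariance scaled by the variance."""
    return autocovariance(x, k) / autocovariance(x, 0)

# Hypothetical series that repeats every 4 steps
x = np.array([1.0, 0.0, -1.0, 0.0] * 6)

print(autocorrelation(x, 0))  # lag 0 compares the series with itself
print(autocorrelation(x, 2))  # half a cycle apart: values move in opposite directions
print(autocorrelation(x, 4))  # one full cycle apart: values line up again
```

The coefficient is exactly 1 at lag 0, strongly negative at lag 2, and strongly positive at lag 4, which matches the 4-step cycle built into the data.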

Autocorrelation function output

- The autocorrelation function is computed with __sm.tsa.stattools.acf(data, nlags)__.
- Its graph (correlogram) is drawn with __sm.graphics.tsa.plot_acf(data, lags)__.

- The arguments "nlags" and "lags" give __the number of steps k by which the time series is shifted__. The values are computed for every lag up to that number.
- The closer the autocorrelation coefficient is to 1.0, the stronger the positive correlation, and the closer it is to -1.0, the stronger the negative correlation.

- Code ![Screenshot 2020-10-29 13.23.02.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/a61008fc-e9ec-3c69-dd58-a5f40dae509f.png)

- Result ![Screenshot 2020-10-29 13.23.25.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/32d7d822-abad-bae7-25f4-86cc625b2c80.png)

Summary

- Time series data is data whose value changes over time.
- Time series data shows three patterns: the "trend" is the long-term tendency of the value to rise or fall, the "periodic fluctuation" is the repeated rise and fall of the value as time passes, and the "irregular fluctuation" is a change in value that occurs regardless of the passage of time.
- Statistics used in time series analysis include the expected value (mean), variance, and standard deviation. The autocovariance and autocorrelation coefficient are calculated from these, and the autocorrelation coefficient shows how similar the data is to its past values.

That is all for this time. Thank you for reading to the end.
