Introduction to Time Series Analysis ~ Seasonal Adjustment Model ~ Implemented in R and Python

Introduction

This is the day-19 article of the Gunosy Advent Calendar 2015. The year is almost over.

I joined Gunosy in November, and it has been a lot of fun. Since my day-to-day work is data analysis and algorithm development, this time I will briefly introduce time series analysis as it is used in business analysis.

What is time series analysis?

Time series analysis is an attempt to capture the fluctuation of a phenomenon in relation to its past movements. [From "Introduction to Time Series Analysis" by Genshiro Kitagawa](http://www.amazon.co.jp/%E6%99%82%E7%B3%BB%E5%88%97%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%8C%97%E5%B7%9D-%E6%BA%90%E5%9B%9B%E9%83%8E/dp/4000054554)

Terms like "data-driven management" and "big data" have taken hold, and I think many companies now base product decisions on data. However, data that changes daily (especially metrics such as sales and KPIs) fluctuates widely, which makes real changes hard to spot. Time series analysis helps separate those changes from routine fluctuation and make predictions with some accuracy.

Seasonal adjustment

This time I will introduce the seasonal adjustment model. Roughly speaking, it is a model that explains time series data as

Observed value = trend component + seasonal component + noise component
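The additive model above can be illustrated with a small synthetic series. This is only a sketch: the trend slope, the weekly pattern, and the noise level are made-up numbers chosen for illustration, not values taken from the TEPCO data used later.

```python
import random

PERIOD = 7    # one week of daily observations
N_DAYS = 28   # four weeks of hypothetical data

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical components (none of these numbers come from real data)
trend = [100.0 + 0.5 * t for t in range(N_DAYS)]        # slow upward drift
weekly_pattern = [3.0, 2.0, 2.0, 1.0, 0.0, -4.0, -4.0]  # sums to zero over a week
seasonal = [weekly_pattern[t % PERIOD] for t in range(N_DAYS)]
noise = [random.gauss(0.0, 0.5) for _ in range(N_DAYS)]

# Observed value = trend component + seasonal component + noise component
observed = [tr + se + no for tr, se, no in zip(trend, seasonal, noise)]

for t in range(PERIOD):
    print(f"day {t}: trend={trend[t]:.2f} seasonal={seasonal[t]:+.2f} "
          f"noise={noise[t]:+.2f} observed={observed[t]:.2f}")
```

Seasonal adjustment runs this in reverse: given only the observed values, it estimates the three components.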

The apps we provide are closely tied to people's daily lives, so their usage follows the rhythm of those lives. The seasonal factors are roughly "month", "day of the week", and "time of day", but this time I will focus on the day of the week for the sample implementation.

Implementation in R

I will implement it using TEPCO's electricity data. First, in R. Start by plotting the raw data.

data <- read.csv("tokyo2015_day.csv", header=T) #Get data from csv
power <- data[,2] #Extract numbers
plot(power, type="l") #plot

[Figure: plot of the raw data]

It's jagged. R has a ts function that converts data into a periodic time series and an stl function that decomposes a time series into its components, so you can easily build a seasonal adjustment model with these two functions.

data <- read.csv("tokyo2015_day.csv", header=T) #Get data from csv
power <- data[,2] #Extract numbers
plot(power, type="l") #plot

ts <- ts(power, frequency=7) #Cycle is 7 days(1 week)
stl <- stl(ts, s.window="periodic") #Seasonally adjusted time series data creation

plot(stl) #plot (plot.stl sets the line type of each panel itself)

[Figure: decomposition plot with four panels]

The top of the four graphs is the raw data (observed values). The three panels below it show what it is decomposed into: in R's stl plot these are, from the top, the seasonal component, the trend component, and the noise (remainder) component.

In data analysis, you can capture long-term changes by looking at trend components.
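To make the idea of the components concrete, here is a minimal sketch of a classical additive decomposition: the trend is a centered moving average over one cycle, and the seasonal component is the average detrended value at each position in the cycle. Note that this is not the loess-based procedure that R's stl uses; it is a simpler technique swapped in for illustration, and the function name `decompose_additive` and the test series are hypothetical.

```python
def decompose_additive(observed, period):
    """Classical additive decomposition (sketch; odd periods only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(observed)
    if n < 2 * period:
        raise ValueError("need at least two full cycles of data")
    half = period // 2

    # Trend: centered moving average over one cycle; undefined at both edges
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(observed[t - half:t + half + 1]) / period

    # Seasonal: mean detrended value for each position in the cycle
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += observed[t] - trend[t]
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    offset = sum(means) / period  # shift so the pattern sums to zero
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(n)]

    # Noise: whatever trend and seasonal do not explain
    resid = [None if trend[t] is None else observed[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid


# Hypothetical noise-free series: linear trend plus a weekly pattern
weekly = [3.0, 2.0, 2.0, 1.0, 0.0, -4.0, -4.0]
observed = [100.0 + 0.5 * t + weekly[t % 7] for t in range(28)]

trend, seasonal, resid = decompose_additive(observed, period=7)
print([round(s, 2) for s in seasonal[:7]])  # estimated weekly pattern
print([round(v, 2) for v in trend[3:8]])    # trend where the window fits
```

Averaging over exactly one week cancels the weekly pattern, which is why the moving average isolates the long-term movement.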

A few observations

As expected, consumption is high in summer and winter, and the influence of the day of the week seems to be large (you can see that many people live on a similar cycle). The values of the day-of-week (seasonal) component are shown below. The data starts on January 1st, 2015, which was a Thursday, so the third and fourth values correspond to Saturday and Sunday, and you can see that the seasonal components of these weekend days are negative.

> print(stl$time.series[,1]) #Output seasonal components
2321.9288  1927.3324 -2517.1524 -6122.9112   293.1919  1872.1087  2225.5017...
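As a quick check of the weekday reading, the seven values printed above can be labeled with their day of the week. This sketch assumes, as stated in the text, that the first value belongs to January 1st, 2015; the variable names are hypothetical.

```python
import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# First seven seasonal values, copied from the R output above
seasonal_first_week = [2321.9288, 1927.3324, -2517.1524, -6122.9112,
                       293.1919, 1872.1087, 2225.5017]

start = datetime.date(2015, 1, 1)  # assumed date of the first observation

by_weekday = {}
for offset, value in enumerate(seasonal_first_week):
    day = start + datetime.timedelta(days=offset)
    by_weekday[WEEKDAYS[day.weekday()]] = value  # weekday(): Monday is 0

negative_days = [name for name, value in by_weekday.items() if value < 0]

for name, value in by_weekday.items():
    print(f"{name:<9} {value:>10.4f}")
print("negative:", negative_days)
```

Only the two weekend days come out negative, which matches the reading that consumption falls on holidays.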

When I see the electricity usage drop sharply at the end of summer, I can't help thinking, "It's true that it suddenly cooled down in September this year." (* I can't really say anything without comparing it with other years.)

Implementation in Python

Let's do the same in Python, using Jupyter. The processing is the same as in R.

import csv
import datetime as datetime  
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
%matplotlib inline

filename = "tokyo2015_day.csv"
with open(filename, 'rt') as f:
    data = list(csv.reader(f))

headers = data.pop(0)
df = pd.DataFrame(data, columns=headers)

#Daily series indexed from 2015-01-01
index = pd.date_range(start='2015-01-01', periods=len(df['power']), freq='D')
series = pd.Series(df['power'].values.astype(int), index=index)
#Cycle is 7 days (1 week); older statsmodels releases name this argument freq
ts = seasonal_decompose(series, period=7)
plt.plot(ts.trend) #Trend component
plt.plot(ts.seasonal) #Seasonal component
plt.plot(ts.resid) #Noise component

Acknowledgments

Part of the Python code was rewritten from R to Python by @moyomot.

In conclusion

The end

Reference material

[Genshiro Kitagawa "Introduction to Time Series Analysis"](http://www.amazon.co.jp/%E6%99%82%E7%B3%BB%E5%88%97%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%8C%97%E5%B7%9D-%E6%BA%90%E5%9B%9B%E9%83%8E/dp/4000054554)
