Prerequisite articles

Comparison of moving average calculation time written in Python

The conclusion of the article above was that calculating a moving average with a Python for statement is out of the question, and that pandas and scipy functions are the better choice. However, that article covered only FIR-filter-type moving averages such as SMA and LWMA, so this time I investigated IIR-filter-type moving averages such as EMA and SMMA.

EMA

EMA is an abbreviation for Exponential Moving Average and is expressed by the following formula.

y(n)=\alpha x(n)+(1-\alpha)y(n-1)

Here $\alpha$ is a real parameter between 0 and 1. The formula itself has no parameter for the period, but because SMA and LWMA are specified by a period, EMA is usually specified by a period as well. With period $p$, EMA uses $\alpha=2/(p+1)$ and SMMA uses $\alpha=1/p$. Personally, I don't think it is necessary to distinguish between EMA and SMMA, since they share the same formula, but I mention both because MetaTrader treats them separately.
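To make the recurrence concrete, here is a minimal plain-Python sketch of it, assuming the start-up rule $y(0)=x(0)$ (the one pandas uses, as discussed later). The helper names ema_alpha, smma_alpha and ema_loop are hypothetical, for illustration only. It also shows that the two period conventions differ only in how $\alpha$ is derived: SMMA with period $p$ is the same filter as EMA with period $2p-1$.

```python
def ema_alpha(period: int) -> float:
    """alpha for EMA with period p: 2 / (p + 1)."""
    return 2.0 / (period + 1)


def smma_alpha(period: int) -> float:
    """alpha for SMMA with period p: 1 / p."""
    return 1.0 / period


def ema_loop(x, alpha):
    """Reference implementation of y(n) = alpha*x(n) + (1-alpha)*y(n-1).

    Assumes y(0) = x(0), so the nonexistent y(-1) is never used.
    Deliberately a plain for loop: easy to read, but slow on long series.
    """
    y = []
    for n, xn in enumerate(x):
        if n == 0:
            y.append(xn)
        else:
            y.append(alpha * xn + (1 - alpha) * y[-1])
    return y


# SMMA(p) and EMA(2p - 1) share the same alpha, e.g. p = 10 gives 1/10 both ways
print(smma_alpha(10), ema_alpha(19))   # 0.1 0.1
print(ema_loop([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
```

Because only $\alpha$ differs, any EMA routine can compute SMMA by converting the period first.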

Implemented with pandas

First, let's implement it with pandas. The data to be processed is the same as in the previous article: an OHLC (open, high, low, close) time series of about 370,000 one-minute bars. Because pandas is designed for time series data, EMA can also be written easily with the ewm() and mean() functions.

import numpy as np
import pandas as pd
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)

def EMA(s, ma_period):
    return s.ewm(span=ma_period).mean()

%timeit MA = EMA(dataM1['Close'], 10)

Since execution times are compared again this time, here is the measurement result.

10 loops, best of 3: 32.2 ms per loop

In the previous article, SMA took about 16 milliseconds, so EMA with pandas is about twice as slow.
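The EMA function above passes the period through ewm()'s span argument, which pandas converts to $\alpha=2/(p+1)$. ewm() also accepts alpha directly, or com with $\alpha=1/(1+\mathrm{com})$, so SMMA can presumably be written the same way with com = p - 1. The sketch below checks that these parameterizations agree, assuming a short synthetic random walk instead of the EURUSD data; SMMA is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def EMA(s, ma_period):
    # span=p  ->  alpha = 2 / (p + 1)
    return s.ewm(span=ma_period).mean()


def SMMA(s, ma_period):
    # com=p-1  ->  alpha = 1 / (1 + com) = 1 / p
    return s.ewm(com=ma_period - 1).mean()


# Short synthetic random walk standing in for the Close prices (assumption)
rng = np.random.default_rng(0)
s = pd.Series(1.2 + np.cumsum(rng.normal(0, 1e-4, 500)))

p = 10
print(np.allclose(EMA(s, p), s.ewm(alpha=2 / (p + 1)).mean()))  # True
print(np.allclose(SMMA(s, p), s.ewm(alpha=1 / p).mean()))       # True
print(np.allclose(SMMA(s, p), EMA(s, 2 * p - 1)))               # True
```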

A little conversion before implementing with scipy

scipy's lfilter() does not take $\alpha$ directly; the EMA has to be rewritten in IIR filter form and its coefficients passed instead. So I will convert it a little. (The detailed theory is omitted; it is basic digital signal processing.)

Take the $z$-transform of both sides of the EMA expression.

Y(z)=\alpha X(z)+(1-\alpha)z^{-1}Y(z)

Moving the $Y(z)$ terms to the left side gives

\{1-(1-\alpha)z^{-1}\}Y(z)=\alpha X(z)

Defining $H(z)=Y(z)/X(z)$, we can write

H(z)=\frac{\alpha}{1-(1-\alpha)z^{-1}}

This is the system function (transfer function) of the IIR filter. The coefficients of its numerator and denominator polynomials are what get passed as arguments to lfilter().
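A short worked expansion (standard DSP, not part of the original derivation) shows why this is an IIR filter and where the "exponential" in the name comes from. Expanding $H(z)$ as a geometric series, valid for $|z|>|1-\alpha|$, gives an impulse response that never becomes exactly zero and decays by a factor of $(1-\alpha)$ per sample; evaluating at $z=1$ shows the weights sum to one.

```latex
H(z)=\frac{\alpha}{1-(1-\alpha)z^{-1}}
    =\alpha\sum_{k=0}^{\infty}(1-\alpha)^k z^{-k}
\quad\Longrightarrow\quad
h(n)=\alpha(1-\alpha)^n,\qquad
y(n)=\alpha\sum_{k=0}^{\infty}(1-\alpha)^k x(n-k),\qquad
\sum_{n=0}^{\infty}h(n)=H(1)=1
```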

For EMA, the numerator of the system function is a constant and the denominator is a first-order polynomial in $z^{-1}$, so the general form of the system function can be written as follows.

H(z)=\frac{b_0}{a_0+a_1z^{-1}}

Comparing coefficients, $b$ and $a$ are as follows.

b_0=\alpha, \ a_0=1, \ a_1=\alpha-1
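A quick numerical check of these coefficients, as a sketch: feeding a unit impulse through lfilter([b0], [a0, a1]) should reproduce the exponentially decaying weights $h(n)=\alpha(1-\alpha)^n$, and those weights should sum to $H(1)=1$. The period $p=10$ and the length of 200 samples are arbitrary choices.

```python
import numpy as np
from scipy.signal import lfilter

p = 10
alpha = 2 / (p + 1)
b = [alpha]            # b0 = alpha
a = [1, alpha - 1]     # a0 = 1, a1 = alpha - 1

impulse = np.zeros(200)
impulse[0] = 1.0
h = lfilter(b, a, impulse)  # impulse response of H(z)

n = np.arange(200)
print(np.allclose(h, alpha * (1 - alpha) ** n))  # True
print(h.sum())  # close to 1; only the tail beyond 200 samples is cut off
```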

Implemented with scipy

So, EMA implemented with scipy's lfilter() can be written as follows, passing the $b$ and $a$ coefficients above as lists.

from scipy.signal import lfilter
def EMAnew(s, ma_period):
    alpha = 2/(ma_period+1)
    # b = [b0], a = [a0, a1] taken from the system function above
    y = lfilter([alpha], [1,alpha-1], s)
    return pd.Series(y, index=s.index)

%timeit MA = EMAnew(dataM1['Close'], 10)

The result:

100 loops, best of 3: 3.08 ms per loop

This is about ten times faster than the pandas version (32.2 ms) and almost the same speed as in the SMA case. lfilter() is fast after all.
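Since the EURUSD CSV is not bundled here, the following sketch assumes a synthetic random walk of the same length (about 370,000 points) and compares lfilter() against a plain for loop over the recurrence, both starting from $y(-1)=0$, which is what lfilter() does by default. Absolute timings depend on the machine; the point is the ratio, and that both give the same numbers. ema_for_loop and ema_lfilter are hypothetical names.

```python
import timeit

import numpy as np
from scipy.signal import lfilter


def ema_for_loop(x, alpha):
    # Direct translation of y(n) = alpha*x(n) + (1-alpha)*y(n-1), with y(-1) = 0
    y = np.empty(len(x))
    prev = 0.0
    for n in range(len(x)):
        prev = alpha * x[n] + (1 - alpha) * prev
        y[n] = prev
    return y


def ema_lfilter(x, alpha):
    # Same recurrence as an IIR filter: b = [alpha], a = [1, alpha - 1]
    return lfilter([alpha], [1, alpha - 1], x)


# Synthetic stand-in for ~370,000 one-minute closes (assumption)
rng = np.random.default_rng(42)
x = 1.2 + np.cumsum(rng.normal(0, 1e-4, 370_000))
alpha = 2 / (10 + 1)

print(np.allclose(ema_for_loop(x, alpha), ema_lfilter(x, alpha)))  # True
t_loop = timeit.timeit(lambda: ema_for_loop(x, alpha), number=1)
t_lfilter = timeit.timeit(lambda: ema_lfilter(x, alpha), number=10) / 10
print(f"for loop: {t_loop * 1e3:.1f} ms, lfilter: {t_lfilter * 1e3:.2f} ms")
```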

Adjusting the initial conditions of lfilter ()

lfilter() turned out to be fast this time as well, but there is a slight problem with its output.

In EMA, computing the first output $y(0)$ requires $y(-1)$, for which there is no data. pandas, being designed for time series data, avoids using $y(-1)$ by setting $y(0)=x(0)$.

MA = EMA(dataM1['Close'], 10)  # the assignment inside %timeit is not kept, so compute MA here
pd.DataFrame({'Close':dataM1['Close'],'EMA':MA}).head(10)

	Close	EMA
Time
2015-01-01 13:00:00	1.20962	1.209620
2015-01-01 13:01:00	1.20962	1.209620
2015-01-01 13:02:00	1.20961	1.209616
2015-01-01 13:04:00	1.20983	1.209686
2015-01-01 13:05:00	1.20988	1.209742
2015-01-01 13:06:00	1.20982	1.209762
2015-01-01 13:07:00	1.20987	1.209788
2015-01-01 13:08:00	1.21008	1.209855
2015-01-01 13:09:00	1.20996	1.209878
2015-01-01 13:10:00	1.20977	1.209855

Here the EMA stays close to the input series from the very first value. lfilter(), however, computes with $y(-1)=0$, so the first EMA values deviate considerably from the input.

	Close	EMA
Time
2015-01-01 13:00:00	1.20962	0.219931
2015-01-01 13:01:00	1.20962	0.399874
2015-01-01 13:02:00	1.20961	0.547099
2015-01-01 13:04:00	1.20983	0.667596
2015-01-01 13:05:00	1.20988	0.766193
2015-01-01 13:06:00	1.20982	0.846852
2015-01-01 13:07:00	1.20987	0.912855
2015-01-01 13:08:00	1.21008	0.966896
2015-01-01 13:09:00	1.20996	1.011090
2015-01-01 13:10:00	1.20977	1.047213
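Why the deviation fades can be worked out directly (this arithmetic is not from the original article). Both versions obey the same recurrence and differ only in the assumed $y(-1)$: 0 for lfilter() by default, and effectively $x(0)$ for the pandas-style start, since $y(0)=\alpha x(0)+(1-\alpha)y(-1)=x(0)$ requires $y(-1)=x(0)$. Their difference $e(n)$ therefore satisfies $e(n)=(1-\alpha)e(n-1)$ with $e(-1)=-x(0)$, so $e(n)=-(1-\alpha)^{n+1}x(0)$. The sketch below uses the first ten Close values from the tables, with a reference that starts from $y(-1)=x(0)$.

```python
import numpy as np
from scipy.signal import lfilter

# First ten Close values from the tables above
x = np.array([1.20962, 1.20962, 1.20961, 1.20983, 1.20988,
              1.20982, 1.20987, 1.21008, 1.20996, 1.20977])
alpha = 2 / (10 + 1)
b, a = [alpha], [1, alpha - 1]

y_zero = lfilter(b, a, x)                             # default: y(-1) = 0
y_ref, _ = lfilter(b, a, x, zi=[(1 - alpha) * x[0]])  # start from y(-1) = x(0)

# Predicted start-up error: e(n) = -(1 - alpha)**(n + 1) * x(0)
n = np.arange(len(x))
e_pred = -(1 - alpha) ** (n + 1) * x[0]

print(np.round(y_zero, 6))                  # the lfilter() EMA column above
print(np.allclose(y_zero - y_ref, e_pred))  # True
```

With $\alpha=2/11$, the error still amounts to $(1-\alpha)^{10}\approx 13\%$ of $x(0)$ at the tenth sample, which is why the lfilter() column above has not yet caught up with the input.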

This problem can be solved with lfilter()'s optional argument zi. Writing the following gave almost the same result as pandas.

def EMAnew(s, ma_period):
    alpha = 2/(ma_period+1)
    # zi is the initial filter state; (1-alpha)*x(0) makes y(0) = x(0)
    y, zf = lfilter([alpha], [1,alpha-1], s, zi=[s.iloc[0]*(1-alpha)])
    return pd.Series(y, index=s.index)

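Here, zi is the initial value of the filter's internal state variable, not simply an initial input or output value. For this first-order filter, however, the first output is $y(0)=\alpha x(0)+\mathrm{zi}$, so choosing $\mathrm{zi}=(1-\alpha)x(0)$ gives $y(0)=x(0)$, just as in pandas. The match is only approximate because ewm() defaults to adjust=True, which divides by the sum of the weights over the samples seen so far; with ewm(span=ma_period, adjust=False), pandas evaluates exactly the recursion above with $y(0)=x(0)$, and the two results agree to floating-point precision. The difference from adjust=True is confined to the start of the series and shrinks geometrically as $n$ grows.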
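A sketch, assuming a synthetic series in place of the EURUSD data, that checks the zi approach against pandas. It also uses scipy.signal.lfilter_zi, which computes the steady-state filter state for a unit step input; for this filter it returns $[1-\alpha]$, so scaling it by $x(0)$ reproduces the zi chosen above while generalizing to other filter coefficients. Comparing with both ewm(adjust=False) and the default adjust=True shows where the remaining difference comes from.

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter, lfilter_zi


def EMAnew(s, ma_period):
    alpha = 2 / (ma_period + 1)
    b, a = [alpha], [1, alpha - 1]
    # lfilter_zi gives the steady-state state for a unit step, here [1 - alpha];
    # scaling by the first sample makes y(0) = x(0), same as zi=[x(0)*(1-alpha)]
    zi = lfilter_zi(b, a) * s.iloc[0]
    y, _ = lfilter(b, a, s.to_numpy(), zi=zi)
    return pd.Series(y, index=s.index)


# Synthetic stand-in for the Close prices (assumption)
rng = np.random.default_rng(1)
s = pd.Series(1.2 + np.cumsum(rng.normal(0, 1e-4, 1000)))

y = EMAnew(s, 10)
exact = s.ewm(span=10, adjust=False).mean()  # plain recursion with y(0) = x(0)
default = s.ewm(span=10).mean()              # adjust=True: normalized weights

print(np.allclose(y, exact))  # True
gap = (y - default).abs()
print(gap.iloc[:10].max(), gap.iloc[100:].max())  # start-up gap, then negligible
```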

[PYTHON] I compared the moving average of IIR filter type with pandas and scipy
