Comparison of moving average calculation time written in Python
In a previous article, I concluded that using a for statement in Python to calculate a moving average is out of the question, and that it is better to use the pandas and scipy functions. However, that article only covered FIR-filter-type moving averages such as SMA and LWMA, so this time I investigated IIR-filter-type moving averages such as EMA and SMMA.
EMA
EMA is an abbreviation for Exponential Moving Average and is expressed by the following formula.
$$ y(n) = \alpha x(n) + (1 - \alpha) y(n-1) $$
Here $\alpha$ is a real parameter between 0 and 1. This formula does not include a parameter that represents the period, but since SMA and LWMA are specified by a period, EMA is often given a period parameter as well. Assuming that the period is $p$, EMA uses $\alpha = 2 / (p + 1)$ and SMMA uses $\alpha = 1 / p$. Personally, I don't think it is necessary to distinguish between EMA and SMMA because they share the same formula, but I mention both because MetaTrader treats them separately.
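As a reference point for the faster versions that follow, the EMA recursion can be written directly in plain Python. This is only a sketch: the helper names `alpha_from_period` and `ema_loop` are made up for illustration, and the value assumed for $y(-1)$ is left as an explicit argument because the formula itself does not fix it.

```python
def alpha_from_period(period, kind="EMA"):
    """Smoothing factor for a given period (EMA: 2/(p+1), SMMA: 1/p)."""
    if kind == "EMA":
        return 2.0 / (period + 1)
    if kind == "SMMA":
        return 1.0 / period
    raise ValueError(f"unknown kind: {kind}")


def ema_loop(x, alpha, y_prev=0.0):
    """Apply y(n) = alpha*x(n) + (1-alpha)*y(n-1) with an explicit loop.

    y_prev is the assumed value of y(-1); it is a free choice, not part
    of the formula.
    """
    out = []
    for value in x:
        y_prev = alpha * value + (1.0 - alpha) * y_prev
        out.append(y_prev)
    return out


alpha = alpha_from_period(3)                  # EMA with period 3
smoothed = ema_loop([1.0, 2.0, 3.0], alpha)   # starts from y(-1) = 0
```

A loop like this is the slow baseline; it is useful mainly for checking the vectorized versions on a short input.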
First, let's implement it with pandas. The data to be processed is the same as in the previous article: a four-price (open, high, low, close) time series of about 370,000 rows. Since pandas is designed for time series data, EMA can be written easily with the `ewm()` and `mean()` functions.

    import numpy as np
    import pandas as pd

    dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                         names=('Time','Open','High','Low','Close', ''),
                         index_col='Time', parse_dates=True)

    def EMA(s, ma_period):
        return s.ewm(span=ma_period).mean()

    %timeit MA = EMA(dataM1['Close'], 10)

Since I am comparing execution times this time as well, here is the measurement result.
10 loops, best of 3: 32.2 ms per loop
The SMA took about 16 milliseconds, so this is about twice as slow.
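The relationship between the period and the smoothing factor can be checked against pandas itself. The sketch below uses a synthetic random-walk series as a stand-in for the price data (an assumption, not the article's CSV) and relies on the `alpha` argument of `ewm()`, which sets the smoothing factor directly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a price series (not the article's data).
rng = np.random.default_rng(0)
close = pd.Series(1.2 + 0.001 * rng.standard_normal(1000).cumsum())

period = 10
ema_span = close.ewm(span=period).mean()               # alpha = 2/(period+1) implied by span
ema_alpha = close.ewm(alpha=2 / (period + 1)).mean()   # same smoothing factor, given directly
smma = close.ewm(alpha=1 / period).mean()              # SMMA-style smoothing factor

# The span form and the explicit-alpha form should agree.
same = np.allclose(ema_span, ema_alpha)
```

Written this way, EMA and SMMA differ only in the value passed as `alpha`.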
When using scipy's `lfilter()`, it is not enough to pass $\alpha$; you need to rewrite the EMA in IIR filter form and pass the filter coefficients. So I will do a little conversion. (I omit the detailed theory; it is basic digital signal processing.)
Take the $z$-transform of both sides of the EMA formula.
$$ Y(z) = \alpha X(z) + (1 - \alpha) z^{-1} Y(z) $$
Collecting the $Y(z)$ terms on the left side gives
$$ \left(1 - (1 - \alpha) z^{-1}\right) Y(z) = \alpha X(z) $$
and defining $H(z) = Y(z) / X(z)$, this can be written as
$$ H(z) = \frac{\alpha}{1 - (1 - \alpha) z^{-1}} $$
This is the system function of the IIR filter. The numerator and denominator polynomial coefficients of this system function are what you pass as arguments to `lfilter()`.
In the case of EMA, the numerator of the system function is a constant and the denominator is a first-order polynomial in $z^{-1}$, so the general form of the system function can be written as follows.
$$ H(z) = \frac{b_0}{a_0 + a_1 z^{-1}} $$
Comparing coefficients gives the following values of $b$ and $a$.
$$ b_0 = \alpha, \quad a_0 = 1, \quad a_1 = \alpha - 1 $$
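One way to sanity-check the coefficient lists `[alpha]` and `[1, alpha-1]` is to look at the filter's impulse response. If the coefficients really describe the EMA recursion, a unit impulse should produce $\alpha (1 - \alpha)^n$. The value of `alpha` below is an arbitrary example, not one from the article.

```python
import numpy as np
from scipy.signal import lfilter

alpha = 0.2                       # arbitrary example value
b = [alpha]                       # numerator:   b0
a = [1.0, alpha - 1.0]            # denominator: a0, a1

# Feed a unit impulse through the filter.
impulse = np.zeros(8)
impulse[0] = 1.0
h = lfilter(b, a, impulse)

# The EMA recursion predicts h(n) = alpha * (1 - alpha)**n.
expected = alpha * (1.0 - alpha) ** np.arange(8)
matches = np.allclose(h, expected)
```

The geometric decay of `h` is also why EMA counts as an IIR filter: every past input keeps a non-zero weight.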
So, an EMA implementation using scipy's `lfilter()` can be written as follows. Pass the above $b$ and $a$ as list arguments.

    from scipy.signal import lfilter

    def EMAnew(s, ma_period):
        alpha = 2/(ma_period+1)
        y = lfilter([alpha], [1,alpha-1], s)
        return pd.Series(y, index=s.index)

    %timeit MA = EMAnew(dataM1['Close'], 10)

The result is:
100 loops, best of 3: 3.08 ms per loop
This is almost the same speed as in the SMA case. As expected, `lfilter()` is fast.
So `lfilter()` comes out ahead on speed this time as well, but there is a slight problem with its output.
In EMA, calculating the first output $y(0)$ requires $y(-1)$, for which there is no data. pandas, being designed for time series data, handles this by setting $y(0) = x(0)$ so that $y(-1)$ is not used.

    MA = EMA(dataM1['Close'], 10)  # pandas version; %timeit does not keep its assignment
    pd.DataFrame({'Close':dataM1['Close'],'EMA':MA}).head(10)

Time | Close | EMA |
---|---|---|
2015-01-01 13:00:00 | 1.20962 | 1.209620 |
2015-01-01 13:01:00 | 1.20962 | 1.209620 |
2015-01-01 13:02:00 | 1.20961 | 1.209616 |
2015-01-01 13:04:00 | 1.20983 | 1.209686 |
2015-01-01 13:05:00 | 1.20988 | 1.209742 |
2015-01-01 13:06:00 | 1.20982 | 1.209762 |
2015-01-01 13:07:00 | 1.20987 | 1.209788 |
2015-01-01 13:08:00 | 1.21008 | 1.209855 |
2015-01-01 13:09:00 | 1.20996 | 1.209878 |
2015-01-01 13:10:00 | 1.20977 | 1.209855 |
In this case, the EMA result is not very different from the input time series. With `lfilter()`, however, the calculation assumes $y(-1) = 0$, so the first EMA values deviate considerably from the input.
Time | Close | EMA |
---|---|---|
2015-01-01 13:00:00 | 1.20962 | 0.219931 |
2015-01-01 13:01:00 | 1.20962 | 0.399874 |
2015-01-01 13:02:00 | 1.20961 | 0.547099 |
2015-01-01 13:04:00 | 1.20983 | 0.667596 |
2015-01-01 13:05:00 | 1.20988 | 0.766193 |
2015-01-01 13:06:00 | 1.20982 | 0.846852 |
2015-01-01 13:07:00 | 1.20987 | 0.912855 |
2015-01-01 13:08:00 | 1.21008 | 0.966896 |
2015-01-01 13:09:00 | 1.20996 | 1.011090 |
2015-01-01 13:10:00 | 1.20977 | 1.047213 |
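The size of this startup error can be estimated with a constant input. The sketch below assumes a flat series of 1.2 as a stand-in for the nearly flat prices in the table (an illustrative value, not the article's data). With $y(-1) = 0$, a constant input $c$ should give $y(n) = c\,(1 - (1 - \alpha)^{n+1})$.

```python
import numpy as np
from scipy.signal import lfilter

alpha = 2 / (10 + 1)              # period 10, as in the article
x = np.full(50, 1.2)              # constant input, a stand-in for a nearly flat price

y = lfilter([alpha], [1, alpha - 1], x)

# With y(-1) = 0, a constant input c gives y(n) = c * (1 - (1 - alpha)**(n + 1)).
n = np.arange(len(x))
predicted = 1.2 * (1 - (1 - alpha) ** (n + 1))

# Relative shortfall from the input at each step; it shrinks by (1 - alpha) per sample.
shortfall = 1 - y / x
```

With period 10, the shortfall drops below 1% only after roughly two dozen samples, which is consistent with the slow climb in the table above.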
It seems that this problem can be solved with the optional `zi` argument of `lfilter()`. Written as follows, it gives almost the same result as pandas.

    def EMAnew(s, ma_period):
        alpha = 2/(ma_period+1)
        # s.iloc[0] is the first sample; zi is chosen so that y(0) = x(0)
        y,zf = lfilter([alpha], [1,alpha-1], s, zi=[s.iloc[0]*(1-alpha)])
        return pd.Series(y, index=s.index)

Here, `zi` is the initial value of the filter's state variable, so it is not simply an initial input or output value. In this case, choosing `zi` so that $y(0) = \alpha x(0) + zi = x(0)$, that is, $zi = (1 - \alpha) x(0)$, seems to give the desired result.
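As a hedged cross-check of this choice of `zi`, the sketch below compares the `lfilter()` output with two pandas variants on a synthetic series (an assumption standing in for the article's data). It uses `scipy.signal.lfiltic` to build the initial state from an assumed $y(-1) = x(0)$, and `ewm(adjust=False)`, which applies the plain recursion seeded with the first sample, whereas the pandas default (`adjust=True`) uses a normalized weighted average.

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter, lfiltic

# Synthetic stand-in for a price series (not the article's data).
rng = np.random.default_rng(0)
s = pd.Series(1.2 + 0.001 * rng.standard_normal(500).cumsum())

period = 10
alpha = 2 / (period + 1)
b, a = [alpha], [1, alpha - 1]

# Initial state equivalent to assuming y(-1) = x(0); equals (1 - alpha) * x(0) here.
zi = lfiltic(b, a, y=[s.iloc[0]])
y, _ = lfilter(b, a, s.to_numpy(), zi=zi)

recursive = s.ewm(span=period, adjust=False).mean()   # plain recursion seeded with x(0)
weighted = s.ewm(span=period).mean()                  # pandas default (adjust=True)

max_diff_recursive = np.max(np.abs(y - recursive.to_numpy()))
max_diff_weighted = np.max(np.abs(y - weighted.to_numpy()))
```

On this reading, the "almost the same" result comes from the `adjust=True` default: the two methods weight the first few samples differently, and the gap fades as more data arrives.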