[PYTHON] How to compare time series data-Derivative DTW, DTW-

wrap up

Purpose: Measure the similarity of different time series data
Keywords: Dynamic Time Warping (DTW), Derivative Dynamic Time Warping (DDTW)
Reference 1: Derivative DTW ~ Method of calculating similarity of time series pattern ~ by Tech Blog
Reference 2: [DTW (Dynamic Time Warping) Dynamic Time Warping by S-Analysis](https://data-analysis-stats.jp/%E6%A9%9F%E6%A2%B0%E5% AD% A6% E7% BF% 92 / dtwdynamic-time-warping% E5% 8B% 95% E7% 9A% 84% E6% 99% 82% E9% 96% 93% E4% BC% B8% E7% B8% AE% E6% B3% 95 /)
Reference 3: Implementation of DTW (Dynamic Time Stretching Method) by ksknw
Reference 4: How to use fastdtw

DTW

How to find DTW

Create a (cost) matrix that calculates the distance between each point in two time series (absolute value this time, depending on the dimension of the data)

\begin{align}
time\ series\ T&:t_0,t_1,...,t_N,\\
time\ series\ S&:s_0,s_1,...,s_M,\\
\boldsymbol{W}&:=(\delta(i,j)) ,\\
where\ \delta(t_i,s_j)&:=|t_i-s_j|,i \in \{0,1,...,N\},j\in\{0,1,...,M\} 
\end{align}

Consider a path that passes from (0,0) to (N, M) on this matrix and satisfies the monotonic increase.

\begin{align}
path\ \tau :\tau(0)=(0,0),\tau(1),...,\tau(p)=(p_i,p_j),...,\tau(K),\\
where\ p_i \in \{0,1,...,N\},p_j\in\{0,1,...,M\}.
\end{align}

In particular, consider something that satisfies the monotonic increase in this $ path \ \ tau $.

```math
\tau(p) \leq \tau(q) \Longleftrightarrow^{def} p_i\leq q_i\ or\ p_j\leq q_j.\\
if\ p\leq q,then\ \tau(p) \leq \tau(q).
```

Let the DTW cost be the smallest value in the sum of the elements of the matrix $ \ boldsymbol {W} $ that this $ path $ passes through.
```
\begin{align}
cost(\tau)&:=\sum_{p}\delta(p_i,p_j)\\
Dist(T,S)&:=min_{\tau}cost(\tau)
\end{align}
```

The figure below is from Derivative Dynamic Time Warping by Eamonn

It's different from the official formula, but it looks like this. Test code: Reference 3

#Preparation
pip install fastdtw # <=pip really just needs this one
pip install numpy
pip install scipy

##code##
import numpy as np
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw
A = np.sin(np.array(range(T)) / 10)
B = np.sin((np.array(range(T)) / 10 + t * np.pi))
C = np.zeros((T))
distance, path = fastdtw(A, B, dist=euclidean)
print(distance)
plt.plot(A)
plt.plot(B)
for i, j in path:
   plt.plot([i, j], [A[i], B[j]],color='gray', linestyle='dotted')
plt.legend(["sin(θ)", "sin(θ+150*pi)"], fontsize=10, loc=2)
plt.show()

##result##
sin(θ)-sin(θ+150*pi): 0.6639470476737607
sin(θ)-constant: 0.5150026855435942
DTW(sin(θ), sin(θ+150*pi)): 16.720461687388624
DTW(sin(θ), constant): 97.26964355198544

Benefits of DTW

Similarity can be obtained even if the lengths and cycles of time series are different.

Disadvantages of DTW

Intuitive alignment is performed for time series data that has locally accelerated and decelerated parts on the time axis: Reference: Derivative Dynamic Time Warping by Eamonn, etc. For example, when one of them becomes s high, DTW ignores the data shape and aligns, so the data point of s high is associated with the data point before s high.

DDTW

In response to DTW, the degree of similarity is measured by adding information that captures the shape. Actually, after processing the time series data $ time \ series \ T: t_0, t_1, ..., t_N $ as follows, DTW:

T^*:t^*_0,t^*_1,...,t^*_N,\\
where\ t^*_i :=  \frac{t_i-t_{i-1}+\frac{t_{i+1}-t_{i-1}}{2}}{2}=  \frac{t_i-t_{i-1}+\frac{(t_{i+1}-t_i)+(t_i-t_{i-1})}{2}}{2}

This process is the average of the left derivative $ t_i-t_ {i-1} $ and the right derivative $ t_ {i + 1} -t_i $ and the average of the left derivative with respect to the point of interest $ t_i $. .. DDTW is the DTW for this processing.

Benefits of DDTW

Similarity can be calculated considering shapes such as uptrends and downtrends.

Disadvantages of DDTW

Is the average slope good?

code

from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import ThreadPoolExecutor
import time
import numpy as np
import pylab as plt
import seaborn as sns
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean


def get_test_curves(view=False):
    T = 150
    t = .4

    A = np.sin(np.array(range(T)) / 10)
    B = np.sin((np.array(range(T)) / 10 + t * np.pi))
    C = np.zeros((T))
    if view:
        plt.plot(A)
        plt.plot(B)
        plt.plot(C)
        plt.legend(['sin(θ)', 'sin(θ+150*pi)', 'constant'], fontsize=10, loc=2)
        plt.show()

    return {'name': 'sin(θ)', 'data': A}, {'name': 'sin(θ+150*pi)', 'data': B}, {'name': 'constant', 'data': C}


def mse(a, b):
    return ((a-b)**2).mean()


def get_DWT_results(T, S, skip=1, view=False):
    T_data, S_data = T['data'], S['data']
    T_name, S_name = T['name'], S['name']
    distance, path = fastdtw(T_data, S_data, dist=euclidean)
    print("DTW({}, {}):".format(T_name, S_name), distance)
    if view:
        plt.plot(T_data)
        plt.plot(S_data)
        k = -1
        for i, j in path:
            k += 1
            if k % skip == 0:
                plt.plot([i, j], [T_data[i], S_data[j]],
                         color='gray', linestyle='dotted')
        plt.legend([T_name, S_name], fontsize=10, loc=2)
        plt.title('DWT plot result')
        plt.show()

    return distance, path


def get_derivative(T):
    diff = np.diff(T)
    next_diff = np.append(np.delete(diff, 0), 0)
    avg = (next_diff + diff) / 2
    avg += diff
    avg /= 2
    return np.delete(avg, -1)


def get_DDWT_results(T, S, skip=1, view=False):
    T_data, S_data = T['data'], S['data']
    dT_data = get_derivative(T_data)
    dS_data = get_derivative(S_data)
    T_name, S_name = T['name'], S['name']
    distance, path = fastdtw(dT_data, dS_data, dist=euclidean)
    print("DDTW({}, {}):".format(T_name, S_name), distance)
    if view:
        plt.plot(T_data)
        plt.plot(S_data)
        k = -1
        for i, j in path:
            k += 1
            if k % skip == 0:
                plt.plot([i+1, j+1], [T_data[i+1], S_data[j+1]],
                         color='gray', linestyle='dotted')
        plt.legend([T_name, S_name], fontsize=10, loc=2)
        plt.title('DDWT plot result')
        plt.show()

    return distance, path


def get_test_curves_DDTWvsDWT(view=False):
    T = 150
    t = .4
    A = np.zeros((T))
    B = np.zeros((T))
    # A = np.sin(np.array(range(T)) / 10)
    # B = np.sin(np.array(range(T)) / 10+2)+50
    s_i = 50
    e_i = 60
    for i in range(s_i, e_i, 1):
        A[i] = np.sin(np.pi*(i-s_i)/(e_i-s_i))
    #     B[i] = -2.2
    if view:
        plt.plot(A)
        plt.plot(B)
        plt.legend(['sin(θ)', 'sin(θ+150*pi)'], fontsize=10, loc=2)
        plt.show()

    return {'name': 'down', 'data': A}, {'name': 'up', 'data': B}


def main():
    print("=== main ===")
    # A, B, C = get_test_curves()
    A, B = get_test_curves_DDTWvsDWT()

    # A["data"] = np.array([2, 0, 1, 1, 2, 4, 2, 1, 2, 0])
    # B["data"] = np.array([1, 1, 2, 4, 2, 1, 2, 0])

    print("{}-{}:".format(A['name'], B['name']), mse(A['data'], B['data']))
    # print("{}-{}:".format(A['name'], C['name']), mse(A['data'], C['data']))
    #Calculate DTW
    get_DWT_results(A, B, skip=5)
    get_DDWT_results(A, B, skip=5)


if __name__ == "__main__":
    main()

Applied article

Clustering of stock price fluctuations of 2,940 Japanese companies in DTW (Dynamic Time Warping) Dynamic Time Warping by Kimiaki Shirahama Compare the difference in results using DTW and DDTW. The data used is "Thomson Reuters Data Stream". We conclude by qualitative evaluation (visually) that DTW + DDTW is the best similarity measure for classifying and analyzing stock price data.

Reference 3 is quite helpful with both code and experiments. Applies to temperature data.

Also, there is something called k-shape. I haven't read this, but I use distance as a cross-correlation and cluster using k-means.

Impression

I actually clustered the time series data using DTW / DDTW, but it didn't work. The cause is that I don't understand much. If you have any suggestions, please do not hesitate to contact us.

[PYTHON] How to compare time series data-Derivative DTW, DTW-

wrap up

** How to find DTW **

** Benefits of DTW **

** Disadvantages of DTW **

** Benefits of DDTW **

** Disadvantages of DDTW **

** Applied article **

** Impression **