This is Nishimori from Scrum Sign. As a basic survey of the field, I read a survey paper on anomaly detection for time series data. This article summarizes it.

Title: Anomaly Detection of Time Series
Author: Deepthi Cheboli
Date: May 2010
Link: https://conservancy.umn.edu/bitstream/handle/11299/92985/Cheboli_Deepthi_May2010.pdf%C2%A0?sequence=1

What is anomaly detection?
Anomaly detection of time series data
・ Utilization area
・ Problem settings
・ With or without label
・ Data type
Method
・ Conversion method
・ Detection method

What is anomaly detection?

Anomaly detection refers to the use of data mining to identify observations or unexpected patterns that do not match the rest of the data in a dataset. An anomaly, in this sense, is a data pattern that does not conform to a clearly defined notion of a normal pattern. Methods differ mainly in how they define that normal pattern. There are several types of anomaly detection: outlier detection, which finds singular values in the data; change point detection, which finds points where an abnormal change occurs in continuous data; and abnormal state detection, which judges whether the data as a whole is abnormal. In the real world, anomalies in data often reflect problems that cannot be overlooked (cardiopulmonary arrest, for example), so this is an actively researched area.

Anomaly detection of time series data

In the real world, we often want to record continuous measurements and detect faults from those values. In many cases a fault appears in the data as some kind of outlier, so anomaly detection of time series data has been studied especially actively. Compared with anomaly detection in other areas, there are many cases where the value itself is not an outlier but is judged anomalous once its context is taken into account (a contextual anomaly).

・ Utilization area

There are many application areas. A typical one is a system that detects abnormalities occurring during flight from airplane sensor data.

・ Problem setting

There are three problem settings for anomaly detection of time series data.
1 Contextual anomalies: detect data points that are judged to be abnormal when the data before and after them is considered.
2 Anomalous subsequence: detect subsequences (partial intervals) of a time series that are judged to be abnormal.
3 Anomalous dataset: determine whether an entire given dataset is abnormal.

・ With or without label

Methods can be classified into three types according to how far the training data is labeled as normal or abnormal.

Supervised → Both normal and abnormal data are labeled (in practice, it is rare that both can be labeled).

Semi-supervised → Only normal data is labeled.

Unsupervised → Neither is labeled (however, many unsupervised methods assume that abnormal data is far rarer than normal data. In other words, the training data set is treated as if it were normal data.)

・ Types of time series data

Time series data has two characteristic properties that should be considered when performing an analysis.
1 Periodicity (whether the data has a period)
2 Synchronicity (when dealing with multiple time series, whether the series are synchronized)

[Figure: example of data with both periodicity and synchronicity]
[Figure: example of data with periodicity but without synchronicity]

Method

The paper introduces two types of methods.
1 Methods that transform the given time series data into data that is easier to analyze (Transformation)
2 Methods that detect abnormalities in time series data (Detection)

Preprocessing is very important for time series data, and Transformation seems to be the part that corresponds to it.

Transformation

1 Aggregation

Description
Compress the data into a more compact representation. PAA (Piecewise Aggregate Approximation) is a typical method. It has the advantage of improving computational efficiency because it reduces the number of dimensions, but it also risks hiding important features.

PAA
Consider converting a time series C = c_1, ..., c_n of length n into a w-dimensional vector. The i-th element of that vector is

$$\bar{c}_i = \frac{w}{n} \sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w}i} c_j, \qquad i = 1, \dots, w$$

In other words, the data is divided into w frames of equal length and the values in each frame are averaged, so that n data points are reduced to w elements.
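The frame averaging of PAA can be sketched in a few lines of NumPy. This is only a minimal illustration, assuming that n is an exact multiple of w; the function name `paa` is hypothetical and does not come from the paper.

```python
import numpy as np

def paa(series, w):
    """Reduce a length-n series to w frame means (sketch: n must be a multiple of w)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n % w != 0:
        raise ValueError("this sketch assumes n is a multiple of w")
    # Reshape into w frames of n // w points each, then average every frame.
    return x.reshape(w, n // w).mean(axis=1)

c = [1, 2, 3, 4, 5, 6, 7, 8]
reduced = paa(c, 4)  # frames are (1, 2), (3, 4), (5, 6), (7, 8)
```

A short spike inside one frame is averaged with its neighbors, which is exactly the risk of hiding important features mentioned above.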

2 Signal Processing

Description
The data is converted to the frequency domain using signal processing techniques (Fourier transform, wavelet transform) and analyzed there. The most commonly used is the Haar transform.
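To give a feel for what the Haar transform does, here is a rough sketch of a single level of the orthonormal Haar transform written directly in NumPy. The function name `haar_step` and the sample signal are assumptions for illustration, and the sketch requires an even number of points.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform (sketch: even-length input)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("this sketch assumes an even number of points")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # smoothed series of half the length
    detail = (even - odd) / np.sqrt(2.0)  # local differences between neighbors
    return approx, detail

signal = [4.0, 4.0, 4.0, 4.0, 4.0, 9.0, 4.0, 4.0]
approx, detail = haar_step(signal)
# detail is zero where neighboring points agree and non-zero at the pair holding the spike
```

Applying the same step again to `approx` would give the next, coarser level.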

3 Discretization

Description
The range of values from minimum to maximum, which fluctuates as a function of time, is divided into several regions, and each data point is assigned the alphabet symbol of the region it falls in. The most commonly used method is SAX (Symbolic Aggregate approXimation). An explanation of SAX is here: https://ipsj.ixsq.nii.ac.jp/ej/index.php?action=pages_view_main&active_action=repository_action_common_download&item_id=109658&item_no=1&attribute_id=1&file_no=1&page_id=13&block_id=8
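The idea of assigning letters to value ranges can be sketched as follows. Note that this is a simplification, not SAX itself: it uses equal-width regions between the minimum and maximum, as in the description above, whereas SAX normalizes the series, applies PAA, and chooses breakpoints from a Gaussian distribution. The function name `discretize` and the parameter `n_symbols` are hypothetical.

```python
import numpy as np

def discretize(series, n_symbols=4):
    """Map each value to a letter using equal-width regions between min and max."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return "a" * x.size  # a constant series falls into a single region
    # Interior region boundaries; np.digitize then yields codes 0 .. n_symbols - 1.
    edges = np.linspace(lo, hi, n_symbols + 1)[1:-1]
    codes = np.digitize(x, edges)
    return "".join(chr(ord("a") + int(k)) for k in codes)

word = discretize([0.0, 1.0, 2.0, 3.0, 4.0, 8.0], n_symbols=4)
```

With four regions over the range 0 to 8, the boundaries fall at 2, 4 and 6, so the sample series becomes the word "aabbcd".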

Detection

1 Window based

Description
The training data and the test data are each cut into fixed-length windows by sliding a window along the series a little at a time. An anomaly score is calculated for each test window from its distance to the normal (training) windows, and the scores are then aggregated. (This presupposes that a notion of distance, which measures abnormality by similarity to the training data, is applicable.)

Merits
・ Can handle any of the problem settings above.

Demerits
・ It is difficult to determine the width of the window.
・ It is very difficult to determine the step by which the window is shifted. (If the step is small, the calculation cost is high; if it is large, anomalies that fall within the step may be missed.)
・ The calculation cost is very high.
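A window based detector along these lines might look like the sketch below. It assumes Euclidean distance and scores each test window by its distance to the nearest training window; the paper discusses other distances and aggregation schemes, and the names `sliding_windows`, `window_scores`, `width` and `step` are hypothetical.

```python
import numpy as np

def sliding_windows(series, width, step=1):
    """Return all windows of the given width, taken every `step` points."""
    x = np.asarray(series, dtype=float)
    starts = range(0, x.size - width + 1, step)
    return np.array([x[s:s + width] for s in starts])

def window_scores(train, test, width, step=1):
    """Score each test window by its distance to the nearest training window."""
    train_w = sliding_windows(train, width, step)
    test_w = sliding_windows(test, width, step)
    # Pairwise Euclidean distances, shape (n_test_windows, n_train_windows).
    dists = np.linalg.norm(test_w[:, None, :] - train_w[None, :, :], axis=2)
    return dists.min(axis=1)

train = np.tile([0.0, 1.0, 0.0, -1.0], 10)  # normal, periodic data
test = np.tile([0.0, 1.0, 0.0, -1.0], 5)
test[9] = 5.0                               # injected anomaly
scores = window_scores(train, test, width=4)
flagged = np.nonzero(scores > 1.0)[0]       # start indices of suspicious windows
```

The pairwise distance matrix grows with the product of the number of training and test windows, which is where the high calculation cost comes from.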

2 Prediction based

Description
This is the most actively studied approach in the field of time series data. It assumes that normal data is generated by a stochastic process and that abnormal data does not fit that process. The usual procedure is:
(1) Train a probabilistic model to predict the state at time t + 1 from the state at time t.
(2) Predict the test data with the model, use the error from the prediction as the degree of abnormality, and judge abnormality with a threshold value.
Typical methods include moving average, autoregression, ARMA, ARIMA, and the Kalman filter.

Merits
・ Performance is good because the method readily reflects the characteristics of time series data.
・ Any problem setting can be handled.
・ High accuracy is achieved if the assumed distribution is correct.

Demerits
・ In the case of autoregressive analysis, it is difficult to determine the interval used for prediction, as with the window based method. (If the interval is not set so that it includes the abnormality, the abnormality cannot be detected properly.)
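As a rough sketch of the two steps, the example below fits an autoregressive model by ordinary least squares and flags points whose one-step prediction error exceeds a threshold. The function names, the order p = 2, the sine-wave data and the threshold of 1.0 are all assumptions made for illustration.

```python
import numpy as np

def lagged_row(x, t, p):
    """Constant term followed by the p values before index t, most recent first."""
    return np.r_[1.0, x[t - p:t][::-1]]

def fit_ar(train, p):
    """Least-squares fit of x[t] = c + a1*x[t-1] + ... + ap*x[t-p]."""
    x = np.asarray(train, dtype=float)
    rows = np.array([lagged_row(x, t, p) for t in range(p, x.size)])
    coef, *_ = np.linalg.lstsq(rows, x[p:], rcond=None)
    return coef

def prediction_errors(series, coef):
    """Absolute one-step-ahead prediction error for every index from p onward."""
    x = np.asarray(series, dtype=float)
    p = coef.size - 1
    preds = np.array([coef @ lagged_row(x, t, p) for t in range(p, x.size)])
    return np.abs(x[p:] - preds)

p = 2
t = np.arange(200)
train = np.sin(2 * np.pi * t / 20)
test = train.copy()
test[120] += 3.0                                 # injected anomaly
coef = fit_ar(train, p)
errors = prediction_errors(test, coef)
threshold = 1.0                                  # hypothetical; would need tuning
anomalies = np.nonzero(errors > threshold)[0] + p  # shift back to series indices
```

The error also spills over to the p points after the anomaly, because their predictions use the anomalous value as input.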

3 Hidden Markov model based

Description
This method assumes that the observed time series is generated by a hidden time series, and that the hidden series follows a Markov process.
(1) Build a hidden Markov model (HMM, λ) from the given data. For the training data (train = O1, O2, ...), use the Baum-Welch re-estimation procedure to find the HMM parameters that maximize P(train | λ).
(2) Compute P(Otest | λ) for the test data, and regard test data with low probability as abnormal.

Merits
・ Can handle all problem settings.

Demerits
・ The method assumes that a hidden time series exists. If the data has no such underlying series, abnormality detection does not work well.
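Step (2) can be sketched with the forward algorithm for a discrete HMM. In this illustration the parameters are set by hand rather than learned with Baum-Welch, all probabilities are assumed to be non-zero, and the names `log_likelihood`, `start`, `trans` and `emit` are hypothetical.

```python
import numpy as np

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | lambda) for a discrete HMM."""
    alpha = start * emit[:, obs[0]]
    log_p = 0.0
    for o in obs[1:]:
        scale = alpha.sum()
        log_p += np.log(scale)           # accumulate the scale to avoid underflow
        alpha = (alpha / scale) @ trans * emit[:, o]
    return log_p + np.log(alpha.sum())

# Hand-set parameters (assumption): two sticky hidden states, two symbols.
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit = np.array([[0.9, 0.1],             # state 0 mostly emits symbol 0
                 [0.1, 0.9]])            # state 1 mostly emits symbol 1

normal_seq = [0, 0, 0, 0, 1, 1, 1, 1]
odd_seq = [0, 1, 0, 1, 0, 1, 0, 1]
ll_normal = log_likelihood(normal_seq, start, trans, emit)
ll_odd = log_likelihood(odd_seq, start, trans, emit)
```

Because the hidden states rarely switch under this model, the rapidly alternating sequence gets a lower log-likelihood and would be the one regarded as abnormal.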

4 Segment based

Description
The time series is decomposed into homogeneous segments, and a finite state automaton (FSA) over the segments is learned. (A finite state automaton is a model of a set of states and the transitions between them. In this model, the connections between states are expressed as transition probabilities.) Below is a link that explains finite automata. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwj-lMqxhvHoAhXCPXAKHRJ9Dm0QFjAAegQIAhAB&url=https%3A%2F%2Fja.wikipedia.org%2Fwiki%2F%25E6%259C%2589%25E9%2599%2590%25E3%2582%25AA%25E3%2583%25BC%25E3%2583%2588%25E3%2583%259E%25E3%2583%2588%25E3%2583%25B3&usg=AOvVaw0fOQ4xY7fgxP189BFnv09U

Anomaly detection flow
(1) Build the FSA from the training data.
(2) Process the test data (X = X1, X2, ..., Xn) as follows. Set the current state from X1, then take X2 to Xn in order:
(a) If the point matches the current state, stay in that state.
(b) If a transition to the next state is possible, transition to it.
(c) If neither applies, the point is judged to be abnormal.

Merits
・ Covers all the problem settings above.

Demerits
・ Fails if the data cannot be divided into homogeneous segments.
・ Because every point is checked in both training and testing, the calculation cost is high.
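The detection flow can be illustrated with a heavily simplified sketch. It assumes the segmentation has already been done, so that every point carries a segment label, and it replaces transition probabilities with a plain set of allowed transitions. The function names and the label sequences are hypothetical.

```python
def learn_transitions(states):
    """Collect every (current, next) state pair seen in the training sequence."""
    allowed = set(zip(states, states[1:]))
    # Staying in a known state is always allowed (case (a) of the flow).
    allowed |= {(s, s) for s in states}
    return allowed

def find_anomalies(states, allowed):
    """Return indices whose state cannot be reached from the previous state."""
    return [i for i in range(1, len(states))
            if (states[i - 1], states[i]) not in allowed]

# Hypothetical segment labels: normal data cycles a -> b -> c -> a ...
train = list("aaabbbcccaaabbbccc")
allowed = learn_transitions(train)

test = list("aaabbbaaacccbbb")
bad = find_anomalies(test, allowed)  # positions where the cycle order is broken
```

In the sample test sequence, the jumps b to a, a to c, and c to b never occur in training, so the points where they happen are reported.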

Summary

I have summarized a survey paper on time series anomaly detection. It reminded me that a survey paper is the first thing to read when investigating a new area. As with all anomaly detection, I felt that understanding the nature of the data before choosing a detection method is very important, and that this is the most important task left to humans. This is probably true of machine learning in general. I am still at an early stage, but I would like to keep learning and make what I learned from this paper my own.

Challenges and future prospects

Because of my lack of mathematical knowledge, I could not dig deeply into each method this time. One plan for the future is to actually try the methods I learned about. In particular, the prediction based method seems easy to implement, so I would like to try it. I am still at the beginning, but I want to focus on understanding what is in front of me without rushing.
