[PYTHON] Anomaly detection introduction 3 Change point detection

Aidemy 2020/11/10

Introduction

Hello, it is Yope! I am a liberal arts student, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I would like to share the knowledge gained here with you, and I am summarizing it on Qiita. I am very happy that many people have read the previous summary article. Thank you! This is the third post of anomaly detection. Nice to meet you.

What to learn this time ・ Change detection using cumulative sum method ・ Abnormal part detection by neighbor method ・ Singular spectrum conversion method

Change detection by cumulative sum method

What is cumulative sum method?

-__ Cumulative sum method __ is a method of __ "change detection" __ that detects __change point __ (see Chapter 1). Unlike the detection of outliers, change detection causes __ "continuous anomaly" __, so the cumulative sum method that can detect this is used. -In the cumulative sum method, the "abnormal state (change degree)" is counted __ along the __ time, and when the count (cumulative sum) exceeds the threshold value, it is determined to be abnormal. In other words, it can be said that this method is suitable when detecting an abnormality of some value in time series data.

-The flow of the cumulative sum method (when the change is upside, that is, when it is abnormal to rise) is as follows. (1) Define the degree of change by giving a guideline for normal times and a guideline for upside. (2) Find the __upper cumulative sum of the degree of change __ ③ If the threshold is exceeded, it is judged to be abnormal.

(1) Define the degree of change by giving a guideline for normal times and a guideline for upside.

-__Change degree __ is a value indicating how much the data has changed to an abnormal state. ・ Here, the degree of change is defined, but for that, __ "value to be taken when normal (μ)" and "upside value to be judged as abnormal (ν +)" __ are set in advance (past cases). Need to get (analyze). -Also, find __ "standard deviation (σ)" __ from the data. __ Obtained by "np.std ()" __. ・ Once calculated so far, the __change degree (a + (t)) __ at time t can be calculated as follows. $ a_{+}(t) = \left( \frac{\nu _{+}}{\sigma} \right)\frac{{ x(t)} - \mu - \nu _{+} / 2}{\sigma} $ (The formula __ "x (t)" __ is the observed value at time t)

-As a result, when the data is normal, the degree of change becomes negative, and when the data is abnormal, it becomes positive, so that it is possible to process with the cumulative sum. -Also, this time we considered the time when the change was upside, but the same calculation can be done when the change is downside.

・ Code (calculate the degree of change as 10 for normal and 14 for upside) スクリーンショット 2020-11-05 22.33.06.png

·result スクリーンショット 2020-11-05 22.33.56.png

(2) Find the upper cumulative sum of the degree of change

・ Since the degree of change was calculated in the previous section, it will be accumulated next. In the degree of change defined in the previous section, it takes a negative value when it is normal, but when accumulating, it should be __ when it is normal __. In other words, the degree of change increases only when __ is abnormal __. This is the __upper cumulative sum __. -Calculation of the upper cumulative sum is performed by taking out the degree of change one by one using for loop and adding them together. Specifically, it is as follows.

スクリーンショット 2020-11-06 11.11.57.png

・ Explanation of the above code -In the __ "range (x.size --1)" __ part of the for statement, __ "-1" __ is the upper cumulative sum up to time t is __ "t-1" Since it is calculated by "the total degree of change of + the degree of change of t" , it is extracted in this way to express this "t-1". - "Max (0, x_cumsum [i])" __ indicates that the array that stores the cumulative sum is __ "stores 0 and the one with the larger degree of change" __. ・ __ "x_cumsum [i + 1] = x_cumsum [i] + x [i + 1]" __ "i + 1" __ is actually __ because __i represents t-1 Indicates that it is "t".

・ The graph of the upper cumulative sum is as follows. スクリーンショット 2020-11-06 11.21.22.png

③ If the threshold is exceeded, it is judged to be abnormal.

-The rest is the same as the conventional method, and it is sufficient to compare the threshold value and the cumulative sum to determine whether or not it is abnormal. -Also, in order to find the point that exceeds the threshold value for the first time, that is, the "change point", it is sufficient to find __ "the index that pred == 1 for the first time between 0 and the magnitude of the cumulative sum" __. , Can be described as follows.

np.arange(score_cumsum.size)[pred==1][0]

・ "Np.arange limits the range to the size of 0 ~ cumulative sum and refers to each one, and __ [pred == 1] __ extracts the part judged to be abnormal for the first time, and __ [0 ] __ to retrieve the index "is done.

・ The following is what I did about the abnormality. スクリーンショット 2020-11-06 11.40.03.png

·result スクリーンショット 2020-11-06 11.40.13.png

Abnormal site detection by neighbor method

Sliding window, partial time series

-This time, the time series data is detected by __ "outlier detection" __. In the cumulative sum method up to the previous section, you had to prepare the threshold value and parameters for calculation of change points by yourself, which had the disadvantage of __ not very practical __, but using the __neighborhood method __ So, this problem can be solved. -Outlier detection is not suitable for detecting changes in time-series data, but it is because __continuous data such as time-series data cannot be observed. In this neighbor method, this is made possible by __developing __ to solve such a problem. -Specifically, it is possible to consider the time effect by grouping each data as M adjacent data, that is, converting it into a collection of M-dimensional vectors. -At this time, the length divided into "M" pieces is called __ "sliding window" __, and the set of vectors created by this is called __ "partial time series data" __. For example, in the following, data is converted using a sliding window of length 3.

スクリーンショット 2020-11-06 11.59.13.png

-The code is as follows. After converting the data from 1D to 2D, divide the data for the number of slide windows M and create it by __extracting __.

・ Code (converts 550 data x to partial time series data) スクリーンショット 2020-11-06 14.09.51.png

-When the above is executed, the length of the data in the partial time series will be __ "((number of data --M + 1), M)" __.

Abnormality calculation

・ After creating the partial time series data, calculate the degree of abnormality. This time, we will use __ "nearest neighbor method" __ with the distance to the nearest neighbor as the degree of anomaly. See Chapter 2 for details. Use KNeiborsClassifier to calculate the distance __ to the nearest neighbor __. -Set __ "n_neighbors" __ this time to _2 instead of 1. This is because the nearest neighbor of the data is treated as __ "the data itself" __. Therefore, the nearest neighbor distance is __ "distance of the second closest data" __. -Therefore, __ "clf.kneighbors ()" __ can be used to calculate the neighborhood distance of each data, and __ "dist [:, 1]" __ can be used to obtain the distance of the second closest data. ..

·code スクリーンショット 2020-11-06 14.38.01.png

Threshold setting, abnormality judgment

・ Next, set the threshold value. Since this method does not define the distribution of data, the threshold is set in the same way as ___1 class SVM. -Specifically, use __ "st.scoreatpercentile ()" __. __ Assuming that "the data has abnormal data at a certain rate" __, set the rate __ "a" __ and set the threshold value by passing the degree of abnormality "distances". -This time, we want to say that the larger the degree of abnormality, the more abnormal it is, so the quantile passed to the argument is __ "100-a" __.

・ Code (when the top 30% is the quantile)![Screenshot 2020-11-06 14.52.20.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws. com / 0/698700/467fc634-211b-a03c-5d62-f8989f0edf3c.png)

Singular spectrum conversion method

History matrix and test matrix

・ By separating the past and present from the partial time series data up to the previous section, it has become possible to consider more advanced distributions. The __ "singular spectrum transformation method" __ that we will learn this time finds two things, __ "vectors that represent current data and data that represents past data" __ by defining this advanced distribution. This is a method of detecting anomalies by using the difference between the two as the change point. -The flow of the singular spectrum conversion method is as follows. (1) Give window width M, number of rows in history matrix n, number of columns in test matrix k, and lag L for time series data. (2) Convert data to __partial time series data of window width M __ ③ Create __history matrix __ and test matrix ④ The history matrix and the test matrix are decomposed by __ singular value __, and each __ left singular vector matrix __ is obtained. ⑤ _ Calculate the degree of change from the difference between the two matrices __

-The __ "history matrix" __ that appears above is "a collection of n data from the previous __time __". In addition, __ "test matrix" __ is "a data __L (lag) advanced from the present time, and data __ up to k pieces before". See below visually.

スクリーンショット 2020-11-06 15.16.20.png

・ For ①, __ "M = 50, n = 25, k = 25, L = 13" __ will be used this time. -For (2), conversion to partial time series data is performed in the same way as the method used in __ "Detection of abnormal parts by the neighbor method" __. ・ For ③, refer to the following for how to create the history matrix and test matrix from the partial time series data (X_pts) created in ②.

スクリーンショット 2020-11-06 15.26.38.png

(4) Singular value decomposition of the history matrix and test matrix to obtain the matrix of each left singular vector.

-By performing __ "Low-layer approximation of singular value decomposition" __ for the history matrix and test matrix created in (3), it is possible to find the value that represents the __matrix. -As a code, when decomposing the matrix A of (m × n) into singular values, use __ "np.linalg.svd (A)" __. Prepare three variables to be stored: __ "U, S, V" __. This time, we will use __ "U" __. U contains a __left singular vector __. The left singular vector is a typical vector __ of the __matrix A, and U is a matrix arranged in descending order of priority, so the __layer "r" is as follows. By specifying with __, you can retrieve the representative vector with higher priority. -Also, the shape of U is __ (the number of rows in the matrix, the number of windows M, the number of left singular vectors) __, so when making a low-layer approximation, the __th axis is sliced. __.

スクリーンショット 2020-11-06 15.46.19.png

-Code that did these ([0] part extracts only "U") スクリーンショット 2020-11-06 15.49.42.png

⑤ Calculate the degree of change from the difference between the two matrices

・ Find the difference __ of the two __left singular vectors created in the previous section. The difference is calculated by the dot product __ of __vectors. This time, the set of vectors, that is, the inner product of the matrices is calculated by __ "np.matmul ()" __, and the distance is calculated by __ "np.linalg.norm ()" __. ・ If this distance is norm, the degree of change can be calculated by __ "1-norm" __.

-The code that does these is as follows. スクリーンショット 2020-11-07 9.55.34.png

・ Result![Screenshot 2020-11-07 9.55.56.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/2084bdb3-25c6-142a- 2189-2e54669d77e8.png)

・ Supplementary information on the above code -__ "Get_score" __ creates a function to find the degree of change by yourself. In __np.matmul () __, it is necessary to transpose the __history matrix like __ "x.T" . - "Score" __ is passing U_hist and U_test together to the above get_score. The __for notation __ is used because __ the degree of change __ at each time is calculated.

Summary

-There is a "cumulative sum method" as a method of detecting change points with a time series. This is a method of accumulating (adding) the degree of change and determining that it is abnormal when it exceeds the threshold value. However, this threshold value and calculation parameters must be calculated by oneself, which is not practical. -The method using the "neighborhood method" solves this problem. This method determines whether or not the time series data is abnormal by detecting outliers. Originally, outlier detection and time series data are inconsistent, but this is resolved by dividing adjacent data into M pieces. -The length divided into M pieces at this time is called a slide window, and the collection of created data is called partial time series data. Using these, the degree of anomaly is calculated by the nearest neighbor method. -There is also a "singular spectrum conversion method" that applies the above method. This is a method of detecting a change point from "the current representative data or the difference between the representative data".

This time is over. Thank you for reading until the end.

Recommended Posts

Anomaly detection introduction 3 Change point detection
Anomaly detection introduction 2 Outlier detection
Introduction to Anomaly Detection 1 Basics
Change point detection with Kalman filter
I implemented ChangeFinder (change point detection)
Anomaly detection introduction and method summary
Basic flow of anomaly detection
A light introduction to object detection