[PYTHON] I tried to implement anomaly detection using a hidden Markov model

Introduction

This time, I implemented an anomaly detection method using a hidden Markov model, which is a highly versatile time series analysis model.

What is anomaly detection using a hidden Markov model?

I will omit the theoretical details this time. In short, the method assumes a sequence of hidden "states" behind the observed data, fits a model that gives the probability of occurrence of the data, and uses that probability to detect anomalies. The features are briefly described below.

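The data set used this time is qtdbsel102.txt. The third column is used, with the first 3000 points as training data and the next 3000 points as evaluation data.

The Python code is below.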

# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

# Import the HMM module from hmmlearn
from hmmlearn import hmm

# Load the tab-separated data file
df = np.loadtxt("qtdbsel102.txt", delimiter="\t")
# Use the data in the third column
# Create the training data and the test data
train_df = df[0:3000, 2]
test_df = df[3000:6000, 2]

#Visualization of training data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.grid(False)

ax.plot(train_df)

ax.set_title('train_df')
ax.set_ylabel('value')
ax.set_xlabel('time')

#Visualization of evaluation data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.grid(False)

ax.plot(test_df)

ax.set_title('test_df')
ax.set_ylabel('value')
ax.set_xlabel('time')

[Figures: train_df.png (training data) and test_df.png (evaluation data)]

Now that we have seen what the training data and the evaluation data look like, we fit the model (estimate the distribution) on the training data.

num_states = 15

X = train_df.reshape(-1, 1)
lengths = [len(train_df)]

np.random.seed(seed=7)
model = hmm.GaussianHMM(n_components=num_states, covariance_type='full')
model.fit(X, lengths)

This time, the number of hidden states (num_states) is set to 15. With the hmmlearn library, the model is fitted by calling hmmlearn.hmm.GaussianHMM.fit().
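
The value 15 is a choice rather than something derived in this article. One common heuristic for comparing candidate state counts is the Bayesian information criterion (BIC); this is my own suggestion and not part of the original code. The sketch below only does the arithmetic: the function names are hypothetical, and the parameter count assumes a Gaussian HMM with one full covariance matrix per state.

```python
import math


def gaussian_hmm_free_params(n_states, n_features):
    """Count free parameters of a Gaussian HMM with full covariances (assumed form)."""
    start = n_states - 1                       # initial state probabilities sum to 1
    trans = n_states * (n_states - 1)          # each transition row sums to 1
    means = n_states * n_features              # one mean vector per state
    covars = n_states * n_features * (n_features + 1) // 2  # symmetric matrices
    return start + trans + means + covars


def bic(log_likelihood, n_states, n_features, n_samples):
    """BIC = -2 * log-likelihood + k * log(N); lower is better."""
    k = gaussian_hmm_free_params(n_states, n_features)
    return -2.0 * log_likelihood + k * math.log(n_samples)


# The model in this article: 15 states, 1 feature
print(gaussian_hmm_free_params(15, 1))  # 254
```

In practice the log-likelihood would come from something like model.score(X) on the training data for each candidate number of states, and the candidate with the lowest BIC would be a reasonable starting point, not a definitive answer.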

Next, the degree of anomaly is computed using the parameters estimated in the previous step.

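# model.score() returns the log-likelihood log p(x') of a series x'.
# Scoring every prefix x_1..x_t gives the cumulative log-likelihood;
# the negated increments between prefixes are the per-step anomaly scores.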
logprob = np.array([model.score(train_df[0:i+1].reshape(-1, 1)) for i in range(len(train_df))])
train_abnormality = -np.append(logprob[0], np.diff(logprob))

# Threshold setting
ratio = 0.005 # Fraction of training points judged abnormal (0.5%)
threshold = np.sort(train_abnormality)[int((1-ratio)*len(train_abnormality))]
print(threshold)
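
As I read the code above, the degree of anomaly at time t is the negative log of the probability of the current point given all earlier points. Written out (the symbol a is my own notation, not from the original):

```latex
a(x_1) = -\log p(x_1), \qquad
a(x_t) = -\log p(x_t \mid x_1, \dots, x_{t-1})
       = -\bigl[\log p(x_1, \dots, x_t) - \log p(x_1, \dots, x_{t-1})\bigr]
       \quad (t \ge 2)
```

The two cumulative log-likelihoods on the right are what model.score() returns for consecutive prefixes, which is why np.diff(logprob) appears in the code. The threshold is then the empirical (1 - ratio) quantile of these scores on the training data.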

Finally, anomaly detection is performed on the evaluation data using the model built above.

#Evaluation data anomaly detection
logprob = np.array([model.score(test_df[0:i+1].reshape(-1, 1)) for i in range(len(test_df))])
test_abnormality = -np.append(logprob[0], np.diff(logprob))

# Visualize the degree of abnormality of the evaluation data
# The threshold is drawn as a dashed red line
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.grid(False)

ax.axhline(threshold, ls="--", color="red")
ax.plot(test_abnormality, color="gray")

ax.set_title('test_df_abnormality')
ax.set_xlabel('time')
ax.set_ylabel('test_df_abnormality')

[Figure: test_df_abnormality.png (degree of abnormality of the evaluation data, with the threshold)]
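
The plot shows where the score crosses the threshold, but it can also help to list the time indices directly. The following is a minimal sketch with toy scores; flag_anomalies is a hypothetical helper name, and in the article's setting the inputs would be test_abnormality and threshold.

```python
import numpy as np


def flag_anomalies(scores, threshold):
    """Return the indices whose anomaly score is strictly above the threshold."""
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores > threshold)


# Toy scores for illustration only (not taken from the data set)
toy_scores = np.array([0.2, 0.1, 3.5, 0.3, 4.1])
flagged = flag_anomalies(toy_scores, threshold=1.0)
print(flagged)  # [2 4]
```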

Finally, I would like to visualize the evaluation data and the degree of abnormality together.

# Plot the evaluation data and its degree of abnormality together
# The threshold is drawn as a dashed line
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.grid(False)

# Create a second y-axis (ax2) that shares the x-axis with ax1
ax2 = ax1.twinx()

ax2.axhline(threshold, ls="--", color="red")
ax1.plot(test_df)
ax2.plot(test_abnormality, color="gray")

ax1.set_title('test_df_and_abnormality')
ax1.set_ylabel('value')
ax1.set_xlabel('time')
ax2.set_ylabel('abnormality')

[Figure: finalplot.png (evaluation data and degree of abnormality on a shared time axis)]

The implementation was relatively easy. However, the computation is somewhat expensive (the scoring step calls model.score() on every prefix of the series), so computation time needs to be considered for practical use. In addition, the more states there are, the more complex the model becomes, and there is no clear criterion for choosing the number of states, so an appropriate number has to be determined for the data at hand.
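
On the calculation cost: scoring every prefix separately repeats most of the work. The same per-step scores can, in principle, be obtained in a single pass with the forward algorithm. Below is a minimal NumPy sketch for one-dimensional Gaussian emissions. It is not the code used in this article, and the function name is hypothetical. I assume the parameters could be read from the fitted model as model.startprob_, model.transmat_, model.means_[:, 0] and model.covars_[:, 0, 0].

```python
import numpy as np


def stepwise_anomaly_scores(x, startprob, transmat, means, variances):
    """Return -log p(x_t | x_1..x_{t-1}) for each t in one forward pass."""
    x = np.asarray(x, dtype=float)
    startprob = np.asarray(startprob, dtype=float)
    transmat = np.asarray(transmat, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # log_b[t, k] = log N(x_t | means[k], variances[k])
    diff = x[:, None] - means[None, :]
    log_b = -0.5 * (np.log(2.0 * np.pi * variances)[None, :]
                    + diff ** 2 / variances[None, :])

    scores = np.empty(len(x))
    pred = startprob  # p(z_t | x_1..x_{t-1}); for t = 1 this is the prior
    for t in range(len(x)):
        # log p(z_t = k, x_t | x_1..x_{t-1}); tiny constant avoids log(0)
        log_joint = np.log(pred + 1e-300) + log_b[t]
        m = log_joint.max()
        log_norm = m + np.log(np.exp(log_joint - m).sum())  # log p(x_t | past)
        scores[t] = -log_norm
        filtered = np.exp(log_joint - log_norm)  # p(z_t | x_1..x_t)
        pred = filtered @ transmat               # predict the next state
    return scores


# Toy two-state model for illustration only (values are made up)
toy = stepwise_anomaly_scores(
    [0.1, 2.9, 3.2],
    startprob=[0.6, 0.4],
    transmat=[[0.7, 0.3], [0.2, 0.8]],
    means=[0.0, 3.0],
    variances=[1.0, 0.5],
)
print(toy)
```

The scores sum to the negative log-likelihood of the whole series, so they should agree with the prefix-based scores up to floating-point error, while the cost grows linearly rather than quadratically with the series length.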

Conclusion

Thank you for reading to the end. This time, I implemented anomaly detection using a hidden Markov model.

If you find anything that needs correcting, I would appreciate it if you could let me know.
