[PYTHON] I implemented Human In The Loop ― Part ① Dashboard ―

Introduction

Human in the loop (HITL) means having humans intervene in the judgment and control of AI. It is regarded as one way to deploy AI in real-world settings where quality control is difficult. [Reference 1] In this series, I implement a machine learning model, a monitoring dashboard, and a verification tool as a simple web application to get a concrete picture of HITL. (The series is planned in three parts.) In HITL, a dashboard that lets you check the actual data and the AI's behavior at the same time is effective for monitoring, so Part 1 covers goal setting, the data used, and the implementation of a monitoring dashboard.

HITL Implementation Table of Contents
Part ① Dashboard ← this article
Part ② Verification tool
Part ③ HITL (① and ② + model retraining mechanism)

■ Implementation result

■ Environment

Python 3.7.7
dash 1.16.1
dash-bootstrap-components 0.10.7
dash-core-components 1.12.1
dash-html-components 1.1.1
dash-renderer 1.8.1
dash-table 4.10.1
plotly 4.10.0
Flask 1.1.2
lightgbm 3.0.0

Goal setting

HITL can be built in various ways depending on the task. In this series, we adopt a configuration based on anomaly detection.

■ To be implemented in the web application

- Monitoring
  - Dashboard (chart)
- Machine learning model
  - Detection of known anomalies with a supervised model
- Detection model
  - Unknown detection with an unsupervised model (unknown: not yet verified as normal or abnormal)
- Expert verification
  - Implemented with annotation tools in mind
  - The verifier checks whether the parts detected as unknown are normal or abnormal
- Scenario
  - The detection model flags data as unknown (and the machine learning model judges it to be abnormal)
  - The verifier checks the part detected as unknown and determines that it is normal
  - Based on that judgment, the model is retrained on the relevant part so that it judges it as normal
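The scenario above can be sketched as a small routing step between the two models and the verifier. This is only an illustration under assumptions, not the code from this series: the function names `triage` and `apply_verdicts`, the status strings, and the sample ID are hypothetical, and the label values (0 = normal, 1 = abnormal for the supervised model; 0 = known, -1 = unknown for the detection model) are taken from the model description later in the article.

```python
# Label conventions (assumed from the article's model description).
NORMAL, ABNORMAL = 0, 1   # supervised model output
KNOWN, UNKNOWN = 0, -1    # detection model output


def triage(clf_label, det_label):
    """Decide how to handle one sample given both model outputs."""
    if det_label == UNKNOWN:
        # The supervised model's judgment is not trusted on unknown data,
        # so the sample is sent to the human verifier.
        return "verify"
    return "abnormal" if clf_label == ABNORMAL else "normal"


def apply_verdicts(samples, verdicts):
    """Turn verifier verdicts into (features, label) records for retraining.

    samples:  dict mapping sample id -> feature tuple
    verdicts: dict mapping sample id -> NORMAL or ABNORMAL
    """
    return [(samples[i], label) for i, label in sorted(verdicts.items())]


# Hypothetical sample: judged abnormal by the supervised model, but unknown.
status = triage(ABNORMAL, UNKNOWN)
retrain_records = apply_verdicts({7: (1.0, -2.0)}, {7: NORMAL})
```

The records returned by `apply_verdicts` would then be added to the training data before the supervised model is retrained, which is the subject of Part ③.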

Usage data

The implementation uses the Credit Card Fraud Detection dataset from Kaggle. [Reference 2]

This is an imbalanced dataset whose target variable is credit card fraud, and it is often used to try out anomaly detection models.

The original data contains the PCA-transformed features V1 to V28. Since the purpose here is to get an intuitive picture through a demo, the input features are limited to two variables (V4 and V14) so that they are easy to understand when visualized. *

* We confirmed in advance that some fraudulent use can be detected with V4 and V14 alone.

In the demo, we monitor, verify, and retrain on the above data plus artificially created unknown normal data. To fit the scenario, the artificial unknown normal data must satisfy the following conditions.

① The supervised model falsely detects it (erroneously judges it to be abnormal).
② The detection model can detect it (that is, the artificial data is clearly different from the original training data).
③ Even after retraining, the supervised model keeps its existing abnormality judgments.

Based on the above, we created the following demo data. (Seconds 1 to 8: known data, with an abnormality at second 3; seconds 9 to 11: unknown data; this sequence is repeated twice.)

Technology used

This section describes the frameworks and models used in the implementation.

■ Visualization part

- flask
  - Lightweight web application framework
  - Its routing function handles the exchange between the web application and the server
- Dash
  - Web application framework specializing in visualization [Reference 5]
  - Built on flask, with a similar coding feel
- plotly
  - Interactive visualization framework [Reference 6]
  - Along with Bokeh, one of the popular Python frameworks of this kind

■ Machine learning model, detection model

- Machine learning model: LightGBM
  - Displays the score of a binary classification (normal: 0, fraud: 1)
  - Initially, a model trained in advance is used
- Detection model: KNN
  - The distance to neighboring points is used as the anomaly score, which is cut at a threshold and binarized (normal: 0, unknown: -1)
  - Initially, a model trained in advance is used
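As a rough illustration of the detection model described above, the following plain-Python sketch uses the distance to the k-th nearest training point as the anomaly score and binarizes it with a threshold. It is not the implementation used in this series: the choice of the k-th neighbor distance, the threshold value, and the training points are all assumptions made for the example, and in practice a library such as scikit-learn would normally be used instead.

```python
import math


def knn_distance(point, train, k=1):
    """Distance from `point` to its k-th nearest training point (2-D features)."""
    dists = sorted(
        math.hypot(point[0] - t[0], point[1] - t[1]) for t in train
    )
    return dists[k - 1]


def detect(point, train, threshold, k=1):
    """Return 0 (normal) or -1 (unknown) by thresholding the neighbor distance."""
    return -1 if knn_distance(point, train, k) > threshold else 0


# Hypothetical (V4, V14) training points; not taken from the real dataset.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

near = detect((0.5, 0.5), train, threshold=1.5)   # lies among the training points
far = detect((8.0, -6.0), train, threshold=1.5)   # lies far from the training points
```

Data that the detection model marks as -1 is what gets passed to the verifier, regardless of the score the supervised model gives it.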

Implementation

The implementation is split into the following parts.

1. Dashboard ← this article
2. Verification tool
3. HITL (1 and 2 + model retraining mechanism)

1. Dashboard

■ Implementation details

- Goal: a simple dashboard (chart + α)
- Input: table data (CSV file)
- Output: verification data with a high (observed) anomaly score
- Abnormality judgment: display the model's anomaly score side by side with the actual data
- Other: real-time updates
- +α: in addition to the chart, a pie chart, distplots, bar graphs, and a table are implemented
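A minimal sketch of the chart part with real-time updates might look like the following. It is not the published code: the sample CSV values, the column names `t` and `score`, the component IDs, and the helper names are made up for illustration. The imports follow the Dash 1.x packages listed in the environment above (on Dash 2 or later, `from dash import dcc, html` would be used instead), and they are placed inside `build_app` only so that the data helpers can be tried without Dash installed.

```python
import csv
import io

# Hypothetical table data: time, two features, and the model's anomaly score.
SAMPLE_CSV = """t,V4,V14,score
1,0.2,0.1,0.02
2,0.4,-0.3,0.05
3,4.1,-7.5,0.97
4,0.1,0.3,0.01
5,-0.5,0.2,0.03
6,0.3,0.0,0.02
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: float(v) for k, v in row.items()} for row in reader]


def visible_rows(rows, n_intervals, window=5):
    """Return the most recent `window` rows available after n_intervals ticks."""
    end = min(n_intervals + 1, len(rows))
    start = max(0, end - window)
    return rows[start:end]


def build_app(rows):
    """Build a Dash app that redraws the chart once per second."""
    import dash
    import dash_core_components as dcc
    import dash_html_components as html
    from dash.dependencies import Input, Output

    app = dash.Dash(__name__)
    app.layout = html.Div([
        dcc.Graph(id="chart"),
        # dcc.Interval increments n_intervals every `interval` milliseconds.
        dcc.Interval(id="tick", interval=1000, n_intervals=0),
    ])

    @app.callback(Output("chart", "figure"), [Input("tick", "n_intervals")])
    def update_chart(n_intervals):
        shown = visible_rows(rows, n_intervals or 0)
        times = [r["t"] for r in shown]
        # Actual data (V4, V14) and the model score are drawn on the same chart.
        traces = [
            {"x": times, "y": [r[col] for r in shown],
             "name": col, "mode": "lines+markers"}
            for col in ("V4", "V14", "score")
        ]
        return {"data": traces,
                "layout": {"title": "Actual data and anomaly score"}}

    return app
```

To try it locally, `build_app(load_rows(SAMPLE_CSV)).run_server(debug=True)` starts the development server. The sliding window in `visible_rows` is one simple way to imitate streaming data from a static CSV file.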

■ Implementation image

■ Implementation result

Summary

In this article, I implemented a dashboard for monitoring in Human In The Loop. The dashboard visualizes the actual data and the AI score in real time, which makes it easier to see which parts of the actual data the AI is focusing on. (For this data, the AI appears to judge a transaction as fraudulent when V4 is positive and V14 is negative.) With Dash and plotly, the parts that would otherwise require HTML and CSS were relatively easy to code. Above all, I hope you get a feel for how easily a web application that runs in real time like this can be implemented.

If you have any suggestions for improvement or questions, I would appreciate your comments.

References

1. Human-in-the-loop AI that creates a better business, society, and future https://note.com/masayamori/n/n2764e3cecc05
2. Kaggle - Credit Card Fraud Detection https://www.kaggle.com/mlg-ulb/creditcardfraud
3. Create a web application that can do machine learning with Dash [Step1] https://wimper-1996.hatenablog.com/entry/2019/10/28/dash_machine_learning1 (I used this as a reference when processing table data in Dash.)
4. Published code http://github.com/utmoto
5. Dash https://dash.plotly.com/
6. plotly https://plotly.com/