[PYTHON] Create a function to visualize / evaluate the clustering result

Visualize and evaluate clustered results

I implemented a function that visualizes the result of clustering (with a VAE, for example) and prints evaluation metrics.

Each cluster number produced by the clustering is relabeled, by majority vote, to the correct label that is most common among its members. The function then draws a pseudo-confusion matrix and calculates the accuracy. It also displays the NMI and ARI scores. The goal is a function that can evaluate how well the data has been clustered.
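To make the majority-vote step concrete, here is a minimal sketch on hand-made toy data. It uses only the standard library rather than pandas, and the names `majority_vote_mapping`, `truth`, and `clusters` are illustrative, not taken from the code in this article.

```python
from collections import Counter, defaultdict

def majority_vote_mapping(truth, clusters):
    """Map each cluster id to the most frequent true label among its members."""
    members = defaultdict(Counter)
    for t, c in zip(truth, clusters):
        members[c][t] += 1
    # most_common(1) returns [(label, count)] for the most frequent label
    return {c: counts.most_common(1)[0][0] for c, counts in members.items()}

# Toy data, made up for illustration: cluster ids are arbitrary names for groups
truth    = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [5, 5, 7, 7, 7, 7, 3, 3, 3]

mapping = majority_vote_mapping(truth, clusters)
relabeled = [mapping[c] for c in clusters]
accuracy = sum(t == r for t, r in zip(truth, relabeled)) / len(truth)

print(mapping)    # {5: 0, 7: 1, 3: 2}
print(relabeled)  # [0, 0, 1, 1, 1, 1, 2, 2, 2]
print(accuracy)   # 8 of 9 samples match after relabeling
```

Only the third sample ends up mislabeled: it has true label 0 but sits in cluster 7, whose majority label is 1.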

# Import the required libraries
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
# When using Jupyter Notebook, display plots inline in the notebook
%matplotlib inline

# Load the clustering results (correct labels and k-means cluster numbers)
df_result_dense = pd.read_csv('result-dense.csv')
df_result_dense
      Unnamed: 0  labels  k-means
0              0       7        2
1              1       2        5
2              2       1        9
3              3       0        3
4              4       4        7
...          ...     ...      ...
9995        9995       2        5
9996        9996       3        0
9997        9997       4        7
9998        9998       5        4
9999        9999       6        6

10000 rows × 3 columns

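def relabel_from_arrays(ans, labels):
    # Convenience wrapper: build a DataFrame from two sequences, then relabel.
    # It has its own name so that the relabel(df, ans, label) definition below
    # does not overwrite it, and it returns the relabeled list.
    df = pd.DataFrame({'ans': list(ans), 'labels': list(labels)})
    return relabel(df, 'ans', 'labels')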

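def relabel(df, ans, label):
    # Replace each cluster number with the correct label closest to it (majority vote).
    # df[ans] is expected to hold the correct labels, df[label] the cluster numbers.
    labels = df[label].unique()
    label_dic = {}
    for i in labels:
        # value_counts() sorts by count in descending order,
        # so index[0] is the most frequent correct label in cluster i
        counts = df[df[label] == i][ans].value_counts()
        label_dic[i] = counts.index[0]
    # Show the cluster-number -> correct-label mapping
    print(label_dic)
    return list(df[label].map(label_dic))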

relabel_k_means = relabel(df_result_dense, 'labels', 'k-means')
df_result_dense['relabel_k_means'] = relabel_k_means

{2: 7, 5: 2, 9: 1, 3: 0, 7: 4, 1: 9, 4: 5, 8: 8, 6: 6, 0: 3}

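Majority voting does not guarantee a one-to-one mapping: two clusters can vote for the same correct label, in which case some label is never predicted and the accuracy should be read with care. A quick check can be sketched as follows, using the dictionary printed above. The helper name `find_collisions` and the second, colliding mapping are made up for this example.

```python
from collections import defaultdict

def find_collisions(label_dic):
    """Return {correct_label: [cluster ids]} for labels chosen by more than one cluster."""
    chosen_by = defaultdict(list)
    for cluster, correct_label in label_dic.items():
        chosen_by[correct_label].append(cluster)
    return {t: cs for t, cs in chosen_by.items() if len(cs) > 1}

# Mapping copied from the output shown in this article
label_dic = {2: 7, 5: 2, 9: 1, 3: 0, 7: 4, 1: 9, 4: 5, 8: 8, 6: 6, 0: 3}
collisions = find_collisions(label_dic)
print(collisions)  # {} : every cluster maps to a distinct label

# Hypothetical mapping in which clusters 0 and 1 both vote for label 3
bad = find_collisions({0: 3, 1: 3, 2: 5})
print(bad)  # {3: [0, 1]}
```

For the result in this article the mapping is one-to-one, so the relabeled confusion matrix is a plain row permutation of the original one.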
from sklearn.metrics import accuracy_score
ans = df_result_dense['labels']
labels = df_result_dense['k-means']
relabels = df_result_dense['relabel_k_means']
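def eval_cluster(ans, labels, relabels):
    import seaborn as sns
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import normalized_mutual_info_score
    from sklearn.metrics import adjusted_rand_score

    # Confusion matrix of correct labels vs. raw cluster numbers (before relabeling).
    # Each heatmap gets its own figure so the second does not draw over the first.
    plt.figure()
    sns.heatmap(confusion_matrix(ans, labels), annot=True, fmt='d')
    plt.show()

    # NMI and ARI are computed on the raw cluster numbers
    print("nmi: " + str(normalized_mutual_info_score(ans, labels)))
    print("ari: " + str(adjusted_rand_score(ans, labels)))

    # Confusion matrix and accuracy after majority-vote relabeling
    plt.figure()
    sns.heatmap(confusion_matrix(ans, relabels), annot=True, fmt='d')
    plt.show()
    print("acc: " + str(accuracy_score(ans, relabels)))

eval_cluster(ans, labels, relabels)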


nmi: 0.8804532777228216
ari: 0.8405114317316403


acc: 0.9309
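In `eval_cluster`, NMI and ARI are computed from the raw cluster numbers, while accuracy needs the relabeled ones. The sketch below, on made-up toy data, illustrates why: renaming clusters one-to-one leaves NMI and ARI unchanged, whereas accuracy only becomes meaningful after relabeling. The variable names and the toy `mapping` are assumptions for this example. Note that the invariance holds only when the renaming is one-to-one; a majority vote that merges two clusters into the same label changes the partition.

```python
from sklearn.metrics import (accuracy_score, adjusted_rand_score,
                             normalized_mutual_info_score)

# Toy data, made up for illustration
truth    = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [5, 5, 7, 7, 7, 7, 3, 3, 3]

# One-to-one renaming of cluster numbers to correct labels
mapping = {5: 0, 7: 1, 3: 2}
relabeled = [mapping[c] for c in clusters]

nmi_raw = normalized_mutual_info_score(truth, clusters)
nmi_rel = normalized_mutual_info_score(truth, relabeled)
ari_raw = adjusted_rand_score(truth, clusters)
ari_rel = adjusted_rand_score(truth, relabeled)
acc_raw = accuracy_score(truth, clusters)
acc_rel = accuracy_score(truth, relabeled)

print("nmi:", nmi_raw, nmi_rel)  # the two values agree
print("ari:", ari_raw, ari_rel)  # the two values agree
print("acc:", acc_raw, acc_rel)  # 0.0 before relabeling, 8/9 after
```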
