[Python] Drawing a scatter plot of multiple clusters

Purpose of this article

When you draw a scatter plot of multiple clusters, the dots can overlap and become difficult to see. So I used `Plotly` to create a plot in which you can check one cluster at a time.

plot.gif

Background

For example, suppose we have the following data: x-y coordinates divided into 5 clusters.

image02.png

With `seaborn`, you can draw the following plot in one line:

```python
sns.scatterplot(x="x", y="y", hue="class", data=df)
```

image01.png

However, as it is, the points are a little hard to see, so specify the transparency with `alpha`:

```python
sns.scatterplot(x="x", y="y", hue="class", data=df, alpha=0.5)
```

image02.png

This improves things a little, but with this data the clusters are still difficult to tell apart.

So I thought it would be nice to be able to plot the clusters one at a time, and tried doing that with `Plotly`.

Commentary

First, import the libraries:

```python
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly
```

Next, prepare some dummy data:

```python
x0 = np.random.normal(2, 0.8, 400)
y0 = np.random.normal(2, 0.8, 400)
x1 = np.random.normal(3, 1.2, 600)
y1 = np.random.normal(6, 0.8, 600)
x2 = np.random.normal(4, 0.4, 200)
y2 = np.random.normal(4, 0.8, 200)
x3 = np.random.normal(1, 0.8, 300)
y3 = np.random.normal(3, 1.2, 300)
x4 = np.random.normal(1, 0.8, 300)
y4 = np.random.normal(5, 0.8, 300)

df = pd.DataFrame()

df["x"] = np.concatenate([x0, x1, x2, x3, x4])
df["y"] = np.concatenate([y0, y1, y2, y3, y4])
df["class"] = ["Cluster 0"]*400 + ["Cluster 1"]*600 + ["Cluster 2"]*200 + ["Cluster 3"]*300 + ["Cluster 4"]*300
```
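The five cluster blocks above differ only in their parameters, so the same kind of data can also be built from a small parameter table in a loop. This is a sketch of an alternative, not the original code: the `make_clusters` name and the `(mean_x, sd_x, mean_y, sd_y, n)` tuple layout are my own assumptions, and it uses NumPy's seeded `default_rng` instead of `np.random.normal` so the data is reproducible.

```python
import numpy as np
import pandas as pd


def make_clusters(specs, seed=0):
    """Build a DataFrame of 2-D Gaussian clusters (hypothetical helper).

    specs: list of (mean_x, sd_x, mean_y, sd_y, n) tuples, one per cluster.
    Returns columns "x", "y", "class" ("Cluster 0", "Cluster 1", ...).
    """
    rng = np.random.default_rng(seed)  # seeded so the data is reproducible
    frames = []
    for i, (mx, sx, my, sy, n) in enumerate(specs):
        frames.append(pd.DataFrame({
            "x": rng.normal(mx, sx, n),
            "y": rng.normal(my, sy, n),
            "class": f"Cluster {i}",  # scalar label broadcast to all n rows
        }))
    return pd.concat(frames, ignore_index=True)


# Same means, standard deviations, and sizes as the dummy data above
specs = [
    (2, 0.8, 2, 0.8, 400),
    (3, 1.2, 6, 0.8, 600),
    (4, 0.4, 4, 0.8, 200),
    (1, 0.8, 3, 1.2, 300),
    (1, 0.8, 5, 0.8, 300),
]
df = make_clusters(specs)
print(df["class"].value_counts(sort=False))
```

Adding a sixth cluster then only means adding one more tuple to `specs`.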

Next comes the main subject, the plotting function. Here is the whole code first:

```python
def plotly_scatterplot(x, y, hue, data, title=""):
    # Cluster labels, in order of first appearance
    cluster = data[hue].unique()
    n_cluster = len(cluster)
    # Color strings of Matplotlib's default color cycle
    colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

    fig = go.Figure()

    # "all" button: every trace visible
    button = []
    tf = [True]*n_cluster
    tmp = dict(label="all",
               method="update",
               args=[{"visible": tf}]
               )
    button.append(tmp)

    for i, clu in enumerate(cluster):
        # One scatter trace per cluster
        fig.add_trace(
            go.Scatter(
                x=data[data[hue]==clu][x],
                y=data[data[hue]==clu][y],
                mode="markers",
                name=clu,
                marker=dict(color=colors[i])
                )
            )

        # Button that shows only the i-th trace
        tf = [False]*n_cluster
        tf[i] = True
        tmp = dict(label=clu,
                   method="update",
                   args=[{"visible": tf}]
                   )
        button.append(tmp)

    # Place the buttons to the right of the plot
    fig.update_layout(
        updatemenus=[
            dict(type="buttons",
                 x=1.15,
                 y=1,
                 buttons=button
                 )
            ])

    # Fix the axis ranges to the full data range plus 10% padding,
    # so every cluster is drawn on the same axes
    x_min = data[x].min()
    x_max = data[x].max()
    x_range = x_max - x_min
    y_min = data[y].min()
    y_max = data[y].max()
    y_range = y_max - y_min

    fig.update_xaxes(range=[x_min-x_range/10, x_max+x_range/10])
    fig.update_yaxes(range=[y_min-y_range/10, y_max+y_range/10])
    fig.update_layout(
        title_text=title,
        xaxis_title=x,
        yaxis_title=y,
        showlegend=False,
    )

    fig.show()
    # To save as an HTML file instead:
    # plotly.offline.plot(fig, filename='graph.html')
```
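The axis-range lines near the end of the function pad the data range by 10% on each side, and the padding for each axis has to come from that axis's own range. Pulled out into a small helper (the name `padded_range` is hypothetical, not part of the original code), the calculation can be sketched as:

```python
import pandas as pd


def padded_range(values, frac=0.1):
    """Return [min - frac*span, max + frac*span] for a numeric Series."""
    v_min = values.min()
    v_max = values.max()
    span = v_max - v_min
    # float() turns NumPy scalars into plain Python floats
    return [float(v_min - span * frac), float(v_max + span * frac)]


s = pd.Series([0.0, 2.0, 10.0])
print(padded_range(s))  # span is 10, so 1.0 of padding on each side
```

Inside the function, this would become `fig.update_xaxes(range=padded_range(data[x]))` and `fig.update_yaxes(range=padded_range(data[y]))`, which makes it impossible to mix up the x and y ranges.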

Sorry for the long code. There are two key points.

Point 1

```python
fig.add_trace(
    go.Scatter(
        x=data[data[hue]==clu][x],
        y=data[data[hue]==clu][y],
        mode="markers",
        name=clu,
        marker=dict(color=colors[i])
        )
    )
```

This part adds one scatter trace per cluster in the DataFrame. `colors` contains the color strings that `plt` selects automatically (Matplotlib's default color cycle), so each cluster is assigned them in order with `color=colors[i]`.
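`plt.rcParams['axes.prop_cycle'].by_key()['color']` is a plain list of color strings, and under Matplotlib's default style it has 10 entries, so `colors[i]` would raise an `IndexError` for an 11th cluster. As a hedged variation (not the original code), the sketch below splits the data with `groupby` instead of filtering once per cluster, and wraps the color index with a modulo. `iter_clusters` is a hypothetical helper name.

```python
import matplotlib as mpl
import pandas as pd


def iter_clusters(data, x, y, hue):
    """Yield (label, x values, y values, color) for each cluster."""
    colors = mpl.rcParams["axes.prop_cycle"].by_key()["color"]
    # sort=False keeps clusters in order of first appearance, like unique()
    for i, (label, group) in enumerate(data.groupby(hue, sort=False)):
        # i % len(colors) wraps around instead of raising IndexError
        yield label, group[x], group[y], colors[i % len(colors)]


# 12 clusters of 2 points each: more clusters than default colors
demo = pd.DataFrame({
    "x": range(24),
    "y": range(24),
    "class": [f"Cluster {i % 12}" for i in range(24)],
})
for label, xs, ys, color in iter_clusters(demo, "x", "y", "class"):
    print(label, len(xs), color)
```

Inside the plotting loop, each yielded tuple could feed `go.Scatter(x=xs, y=ys, mode="markers", name=label, marker=dict(color=color))`.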

Point 2

```python
tf = [False]*n_cluster
tf[i] = True
tmp = dict(label=clu,
           method="update",
           args=[{"visible": tf}]
           )
button.append(tmp)
```

`tf` is a list of booleans such as `[False, True, False, False, False]` that selects which traces to show and which to hide.

Here, `fig.add_trace` overlays five scatter plots, and `tf` specifies which of them to display. With all entries `True`, as in `tf = [True, True, True, True, True]`, the scatter plots of all the data are displayed.
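Because each mask is just a list of booleans in trace order, the button definitions can be built and checked without drawing anything. A minimal pure-Python sketch of the same pattern (the `make_buttons` name is hypothetical), producing an "all" button plus one button per cluster:

```python
def make_buttons(labels):
    """Build Plotly-style button dicts: an "all" button plus one per label.

    Each button uses method="update" and sets the "visible" list of the
    traces, in the same order the traces were added.
    """
    n = len(labels)
    buttons = [dict(label="all", method="update", args=[{"visible": [True] * n}])]
    for i, label in enumerate(labels):
        visible = [False] * n
        visible[i] = True  # show only the i-th trace
        buttons.append(dict(label=label, method="update", args=[{"visible": visible}]))
    return buttons


labels = [f"Cluster {i}" for i in range(5)]
for b in make_buttons(labels):
    print(b["label"], b["args"][0]["visible"])
```

The returned list has the same shape as `button` in the function, so it could be passed as `buttons=` inside `updatemenus`.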


Finally, running the following line

```python
plotly_scatterplot(x="x", y="y", hue="class", data=df, title="Scatter Plot")
```

draws the plot shown at the beginning.

That's all!

