[PYTHON] Introducing the potential of Plotly scatter plots with practical examples

Hello! This article is an entry in the Kaggle Advent Calendar. I think scatter plots made with the Python data visualization library "Plotly" go well with competitions such as Kaggle, so I would like to introduce them.

First of all, I would like you to look at two concrete examples.

** Specific example 1 ** Plot distributed representations of words compressed to 3D
(Animation: tsne2-compressor.gif)

** Specific example 2 ** Plot rental data for the 23 wards of Tokyo on a map
(Animation: mynabi-compressor (1).gif)

What do you think? The strength of Plotly is that plots carrying this much information can be drawn in just a few lines. There are many Python visualization libraries, but I think Plotly has the following distinctive features.

- Can be written in a few lines
- Can be used interactively
  - Since you can zoom, you can check the details
- You can check five variables at once through size, X, Y, Z, and color
- You can check the information of an element by pointing the mouse at it
- Can be shared

Let's take a closer look at the plot, including the notation.

Environment

Python 3.7.4
plotly 4.1.0

Specific example 1

Data set used

I prepared distributed representations of words (word embeddings) reduced to three dimensions. The corpus is text8, the embeddings were trained with gensim's word2vec class, and the dimensionality reduction uses t-SNE. The prepared embeddings and the words are stored in a pandas DataFrame.
(Screenshot: Screen Shot 2019-12-03 at 16.29.19.png)

(For text8, I referred to https://hironsan.hatenablog.com/entry/japanese-text8-corpus.)
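
The article does not show how the DataFrame was built, so the following is only a minimal sketch of the last two steps (t-SNE compression and storing the result). It assumes scikit-learn's `TSNE` for the compression, which the article does not confirm, and it uses random vectors as a stand-in for the trained word2vec embeddings so that it runs without downloading a corpus. The vocabulary, the vector size, and the perplexity value are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# Stand-in for trained word vectors. In the article these come from
# gensim's word2vec trained on text8; random vectors are used here.
rng = np.random.default_rng(0)
words = [f"word{i}" for i in range(40)]      # hypothetical vocabulary
vectors = rng.normal(size=(len(words), 50))  # hypothetical 50-dim embeddings

# Compress to three dimensions with t-SNE.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=3, perplexity=5, init="random", random_state=0)
coords = tsne.fit_transform(vectors)

# One row per word, with the column names used by the plotting code.
df = pd.DataFrame(coords, columns=["x", "y", "z"])
df["word"] = words
```

With real embeddings, `vectors` would be replaced by the matrix of trained word vectors and `words` by the matching vocabulary.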

Code

import plotly.express as px
fig = px.scatter_3d(df, x='x', y='y', z='z',text='word')
fig.show()

Description

It takes just three lines. Pass the DataFrame, then the column names for x, y, z, and text, and you get a plot like the one shown at the top of this article.

Specific example 2

Data set used

I used the data from the Mynavi x SIGNATE Student Cup 2019. You can read more about the competition on my blog: http://zerebom.hatenablog.com/entry/2019/11/09/121233?_ga=2.241090371.157833494.1575468424-1743001014.1569899454

The task in this competition was to predict the rent of each property from rental listing information for the 23 wards of Tokyo. https://signate.jp/competitions/182

I formatted this data and prepared the following DataFrame.
(Screenshot: Screen Shot 2019-12-03 at 16.46.49.png)

Each column has the following meaning:

- id: serial number
- y_train: correct rent
- oof: predicted rent
- diff: predicted value minus correct value
- abs: absolute value of (predicted value minus correct value)
- loc_lat / loc_lon: latitude and longitude
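
The two error columns follow directly from the definitions in the list. A minimal sketch with made-up rent values (only the column names come from the article):

```python
import pandas as pd

# Hypothetical values; only the column names follow the article's DataFrame.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "y_train": [100000, 85000, 120000],  # correct rent
    "oof": [110000, 80000, 120000],      # predicted rent
})

# Signed error: predicted value minus correct value.
df["diff"] = df["oof"] - df["y_train"]

# Magnitude of the error, later used as the marker size.
df["abs"] = df["diff"].abs()
```

Note that the `abs` column has to be accessed as `df["abs"]`, because `df.abs` refers to the DataFrame method of the same name.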

Code

import plotly.express as px

# Replace 'YOUR_API_KEY' with the access token from your Mapbox account
px.set_mapbox_access_token('YOUR_API_KEY')
fig = px.scatter_mapbox(df, lat="loc_lat", lon="loc_lon", color="diff", size="abs", text='id',
                        color_continuous_scale=px.colors.sequential.Viridis, size_max=30, zoom=10)
fig.show()

Description

I used this code to find out which areas contain many rental properties with large prediction errors, after actually training a model on the competition data.

To overlay latitude / longitude data on a map with Plotly, you need to register in advance with a service called Mapbox and obtain an API key (access token). You can easily get one from this site: https://account.mapbox.com/

To control how the data is displayed on the map, specify the arguments as follows. This time:

- color: error of the predicted value
- size: absolute value of the error of the predicted value
- text (character string displayed over the element): property id

color="diff", size="abs", text='id'

Specify the color scale, the maximum element size, and the map zoom as follows:

color_continuous_scale=px.colors.sequential.Viridis, size_max=30, zoom=10

Detailed settings

You can change settings by passing them to fig.update_layout as a dictionary. Plotly's official website has many examples, each pairing the code with the resulting plot, so if there is a setting you are interested in, it is worth looking there.

(https://plot.ly/python/text-and-annotations/#text-font-as-an-array--styling-each-text-element)

[Example]: Change the font and make the characters larger.

fig.update_layout(
  font={"family":"Open Sans",
       "size":16})

In conclusion

I introduced Plotly because I feel that few people use it, considering its potential. It works especially well with 3D data and map data, so please give it a try!

Reference

- Load and use a pre-trained Japanese Word2Vec model: https://qiita.com/omuram/items/6570973c090c6f0cb060
- Make a Japanese version of the text8 corpus and learn distributed representations: https://hironsan.hatenablog.com/entry/japanese-text8-corpus
- How to paste a GIF animation captured on a Mac into a Qiita article: https://qiita.com/ryosukes/items/b5dd0fac1a059caffbf0
