[PYTHON] I want to fully understand the basics of Bokeh

To understand Bokeh's interactive plots as thoroughly as possible, I work through the [Official sample] 0 as an example. I inferred the behavior from the source code and then checked it against the documentation, so it should be accurate. The full source code is [here] 1.

The finished plot looks like this: [image: bokeh.png]

Data generation

import numpy as np

# create three normal population samples with different parameters
x1 = np.random.normal(loc=5.0, size=400) * 100
y1 = np.random.normal(loc=10.0, size=400) * 10

x2 = np.random.normal(loc=5.0, size=800) * 50
y2 = np.random.normal(loc=5.0, size=800) * 10

x3 = np.random.normal(loc=55.0, size=200) * 10
y3 = np.random.normal(loc=4.0, size=200) * 10

x = np.concatenate((x1, x2, x3))
y = np.concatenate((y1, y2, y3))

This part is plain NumPy and has nothing to do with Bokeh, so I'll skip the explanation.

Point cloud plot

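from bokeh.layouts import gridplot
from bokeh.models import BoxSelectTool, LassoSelectTool
from bokeh.plotting import curdoc, figure

TOOLS="pan,wheel_zoom,box_select,lasso_select,reset"

# create the scatter plot
# (plot_width/plot_height are Bokeh 2.x names; Bokeh 3.x renamed them to width/height)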
p = figure(tools=TOOLS, plot_width=600, plot_height=600, min_border=10, min_border_left=50,
           toolbar_location="above", x_axis_location=None, y_axis_location=None,
           title="Linked Histograms")
p.background_fill_color = "#fafafa"

↑ Pass figure() the names of the tools you want in the toolbar, then set the background color. Up to this point, the flow is the same as for an ordinary plot. figure() comes from bokeh.plotting and returns a Figure object (in Bokeh 3.x, figure is itself the class). All the settings below are applied to this object.

p.select(BoxSelectTool).select_every_mousemove = False
p.select(LassoSelectTool).select_every_mousemove = False

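↑ This part needs a closer look. p.select() is a method that Figure inherits from Plot (a more basic version is defined on bokeh.model.Model). Given a selector, here a class, it searches the figure and every object it references and returns the matches. So the first line picks out the figure's BoxSelectTool, and setting select_every_mousemove to False on it means the selection is not recomputed on every mouse move while dragging, only once the selection region is completed. Plot.select() wraps its results in a list that forwards attribute assignment to every element, which is why the attribute can be set directly on the return value.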

Reference →

select(selector) Query this object and all of its references for objects that match the given selector.
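A minimal sketch of this behaviour, assuming a Bokeh 2.x or 3.x install. To stay version-independent it sets BoxSelectTool's dimensions property rather than select_every_mousemove; the mechanism (query by class, then assign to the returned list) is the same.

```python
from bokeh.models import BoxSelectTool, LassoSelectTool
from bokeh.plotting import figure

# Same tool string as the example above
p = figure(tools="pan,wheel_zoom,box_select,lasso_select,reset")

# Query p and every model it references for BoxSelectTool instances
box_tools = p.select(BoxSelectTool)
print(len(box_tools))  # 1

# Assigning to the returned list sets the attribute on every match
box_tools.dimensions = "width"
print([tool.dimensions for tool in p.select(BoxSelectTool)])  # ['width']
```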

r = p.scatter(x, y, size=3, color="#3A5785", alpha=0.6)

This is where the scatter plot itself is drawn. The return value r is used at the very end of the script. Glyph methods such as scatter() generally return a GlyphRenderer instance, which you can keep and use later to tweak the plot.
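To see what scatter() hands back, here is a small sketch with toy data (not the article's samples), assuming Bokeh is installed. The renderer bundles the glyph and its ColumnDataSource, which is why the example can later reach r.data_source.selected.

```python
from bokeh.models import ColumnDataSource, GlyphRenderer
from bokeh.plotting import figure

p = figure()
r = p.scatter([1, 2, 3], [4, 5, 6], size=3, color="#3A5785", alpha=0.6)

# scatter() returns a GlyphRenderer wrapping a glyph and a data source
print(isinstance(r, GlyphRenderer))                 # True
print(isinstance(r.data_source, ColumnDataSource))  # True
print(list(r.data_source.selected.indices))         # [] (nothing selected yet)

# The renderer can be used afterwards to tweak the plot
r.glyph.fill_alpha = 0.3
```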

Histogram

# create the horizontal histogram
hhist, hedges = np.histogram(x, bins=20)
hzeros = np.zeros(len(hedges)-1)
hmax = max(hhist)*1.1

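↑ np.histogram() returns a tuple (count in each bin, bin edges); the edges array has one more element than the counts. hzeros is an all-zero array with one entry per bin, and hmax (the tallest bar plus 10%) is used for the y range below. None of this involves Bokeh.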
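A quick check of those return values on toy data (eight evenly spaced integers, not the article's samples). It also shows how edges[:-1] and edges[1:], used below as the left and right sides of each bar, pair up consecutive edges.

```python
import numpy as np

values = np.arange(8)  # 0, 1, ..., 7
counts, edges = np.histogram(values, bins=4)
print(counts)      # [2 2 2 2]
print(len(edges))  # 5: one more edge than there are bins

left, right = edges[:-1], edges[1:]  # left/right side of each bar
zeros = np.zeros(len(edges) - 1)     # same role as hzeros: one zero per bin
ymax = max(counts) * 1.1             # same role as hmax: 10% headroom
```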

LINE_ARGS = dict(color="#3A5785", line_color=None)

ph = figure(toolbar_location=None, plot_width=p.plot_width, plot_height=200, x_range=p.x_range,
            y_range=(-hmax, hmax), min_border=10, min_border_left=50, y_axis_location="right")
ph.xgrid.grid_line_color = None
ph.yaxis.major_label_orientation = np.pi/4
ph.background_fill_color = "#fafafa"

ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hhist, color="white", line_color="#3A5785")

↑ Horizontal histogram. There are many settings, which makes it look complicated, but the basic flow is the same: create a figure with figure() and draw rectangles with its quad() method. quad() is not specific to histograms; it draws axis-aligned rectangles whose four sides are given by left, right, bottom and top. A single call draws all the bars: it returns one GlyphRenderer whose glyph is a single bokeh.models.glyphs.Quad, and each rectangle is one row of that renderer's data source.
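A small sketch confirming that one quad() call yields one renderer with a single Quad glyph and one row of data per bar (toy counts; assumes Bokeh is installed).

```python
import numpy as np
from bokeh.models import GlyphRenderer, Quad
from bokeh.plotting import figure

counts, edges = np.histogram(np.arange(8), bins=4)

ph = figure()
r = ph.quad(bottom=0, left=edges[:-1], right=edges[1:], top=counts,
            color="white", line_color="#3A5785")

# One renderer, one Quad glyph, four rows (one per rectangle)
print(isinstance(r, GlyphRenderer), isinstance(r.glyph, Quad))  # True True
print(len(r.data_source.data["left"]))                          # 4
```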

Other arguments worth knowing:

toolbar_location=None  # hide the toolbar
plot_width=p.plot_width  # same width as the scatter plot (the number is copied once)
x_range=p.x_range  # share the x range object, so panning/zooming stays linked
min_border_left=50  # minimum margin on the left side of the plot
ph.yaxis.major_label_orientation = np.pi/4  # rotate the axis tick labels by 45 degrees
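The difference between copying a value and sharing an object can be checked directly. In this sketch (assumes Bokeh is installed), x_range=p.x_range makes both figures use the very same range object, which is what keeps panning and zooming linked, while a (start, end) tuple such as y_range=(-hmax, hmax) becomes a fixed Range1d.

```python
from bokeh.models import Range1d
from bokeh.plotting import figure

p = figure()
ph = figure(x_range=p.x_range)   # shares p's x range object
other = figure()                 # gets its own, independent range
fixed = figure(y_range=(-5, 5))  # a tuple is turned into a Range1d

print(ph.x_range is p.x_range)     # True
print(other.x_range is p.x_range)  # False
print(isinstance(fixed.y_range, Range1d), fixed.y_range.start, fixed.y_range.end)
```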

Histograms for the selection

The histograms shown when you select points in the scatter plot are drawn up front:

hh1 = ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hzeros, alpha=0.5, **LINE_ARGS)
hh2 = ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hzeros, alpha=0.1, **LINE_ARGS)

At this point top=hzeros, so the bars have zero height and are invisible. The heights of the returned renderers hh1 and hh2 are what get updated when a selection is made (*): hh1 for the selected points and hh2 for the unselected ones.

Vertical histogram

# create the vertical histogram
vhist, vedges = np.histogram(y, bins=20)
vzeros = np.zeros(len(vedges)-1)
vmax = max(vhist)*1.1

pv = figure(toolbar_location=None, plot_width=200, plot_height=p.plot_height, x_range=(-vmax, vmax),
            y_range=p.y_range, min_border=10, y_axis_location="right")
pv.ygrid.grid_line_color = None
pv.xaxis.major_label_orientation = np.pi/4
pv.background_fill_color = "#fafafa"

pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vhist, color="white", line_color="#3A5785")
vh1 = pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vzeros, alpha=0.5, **LINE_ARGS)
vh2 = pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vzeros, alpha=0.1, **LINE_ARGS)

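This mirrors the horizontal histogram with the axes swapped: each bar starts at left=0, bottom and top come from the bin edges, and right holds the counts.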
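The only difference between the two orientations is which sides of the rectangle take the bin edges and which take the counts. The helper below is hypothetical (it is not part of the original example); it just spells out the keyword arguments used for ph.quad() and pv.quad() so the symmetry is explicit.

```python
import numpy as np

def quad_kwargs(counts, edges, vertical=False):
    """Hypothetical helper: quad() sides for a histogram.

    Horizontal: bars rise from bottom=0 and span the bin along x.
    Vertical:   bars extend from left=0 and span the bin along y.
    """
    if vertical:
        return dict(left=0, bottom=edges[:-1], top=edges[1:], right=counts)
    return dict(bottom=0, left=edges[:-1], right=edges[1:], top=counts)

counts, edges = np.histogram(np.arange(8), bins=4)
h = quad_kwargs(counts, edges)                 # would go to ph.quad(**h, ...)
v = quad_kwargs(counts, edges, vertical=True)  # would go to pv.quad(**v, ...)
```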

Finish

Finally, assemble the figures created so far.

layout = gridplot([[p, pv], [ph, None]], merge_tools=False)

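↑ First, bokeh.layouts.gridplot() arranges the figures in a 2D grid: p and pv in the top row, ph below p, and None leaving the bottom-right cell empty. merge_tools=False keeps each figure's toolbar separate instead of merging them into a single toolbar. If you only need to arrange plots vertically or horizontally rather than in 2D, use bokeh.layouts.column or row instead.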
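For the 1D case, a sketch with row and column (assumes Bokeh is installed). Nesting them gives only a rough stand-in for the grid above; the result is not guaranteed to look identical to gridplot's, so treat it as an approximation rather than an equivalent.

```python
from bokeh.layouts import column, row
from bokeh.models import Column, Row
from bokeh.plotting import figure

p, pv, ph = figure(), figure(), figure()

# row() places figures side by side, column() stacks them vertically
top = row(p, pv)
layout = column(top, ph)

print(type(top).__name__, type(layout).__name__)  # Row Column
```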

curdoc().add_root(layout)
curdoc().title = "Selection Histogram"

↑ curdoc stands for "current document": it returns the default Document (the object that collects everything Bokeh will output), and add_root() adds the grid layout to it as a root. This is the key point. The documentation says that any change to a root model, or to any model it references, triggers the on_change callbacks registered on the document. From the reference →

add_root(model, setter=None)

Add a model as a root of this Document. Any changes to this model (including to other models referred to by it) will trigger on_change callbacks registered on this document.
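To see the document-level behaviour from that quote, here is a small sketch run in plain Python without a server (assumes Bokeh is installed; the events list is only for illustration). Document.on_change callbacks receive a single event object, unlike the model-level on_change(attr, callback) the example actually uses.

```python
from bokeh.document import Document
from bokeh.plotting import figure

doc = Document()
p = figure()
doc.add_root(p)  # p and every model it references now belong to doc

events = []
doc.on_change(lambda event: events.append(event))  # document-level callback

p.background_fill_color = "#fafafa"  # change a property of a root model
print([getattr(e, "attr", None) for e in events])
```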

Now look at the last line of the script:

r.data_source.selected.on_change('indices', update)

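Strictly speaking, this registers the callback on the Selection model itself (a model-level on_change), not on the Document. What add_root() contributes is that r, its data source and its selection all become part of the document, which the Bokeh server keeps in sync with the browser. A selection made in the browser therefore updates selected.indices on the Python side and fires update.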

Here r is the return value of scatter() (a GlyphRenderer instance). r.data_source is the ColumnDataSource holding the plotted data, and r.data_source.selected is a Selection object describing which points are currently selected.

on_change() is a method of bokeh.model.Model that registers callbacks on that object. Its definition begins like this:

def on_change(self, attr, *callbacks):
        ''' Add a callback on this object to trigger when ``attr`` changes.

        Args:
            attr (str) : an attribute name on this object
            *callbacks (callable) : callback functions to register

The first argument attr names the property to watch: the callbacks fire whenever that property's value changes. Here we want to react to changes in the selection, which the Selection object stores as a list of data indices in its indices property, so "indices" is passed.
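The attr/old/new mechanism can be tried without a browser or server. In this sketch (assumes Bokeh is installed; toy data), setting selected.indices from Python stands in for a selection made in the browser, and the callback receives the attribute name plus the old and new index lists.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(x=[1, 2, 3], y=[4, 5, 6]))
calls = []

def update(attr, old, new):
    # Record what Bokeh passes: attribute name, previous value, new value
    calls.append((attr, list(old), list(new)))

source.selected.on_change("indices", update)
source.selected.indices = [0, 2]  # simulate selecting the 1st and 3rd points
print(calls)  # [('indices', [], [0, 2])]
```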

To attach a callback to a specific UI event (a button click, a tap on the plot, etc.) rather than to a property change, use

on_event(event, callback)

instead.

(Aside) Here a Python function is registered. To register a function written in JavaScript (an instance of bokeh.models.CustomJS) instead, use

m.js_on_change(attr, callback)

This is covered in [Chapter 6 of the official tutorial] 1. JavaScript callbacks are required for standalone HTML output, because there is no Python process to run Python callbacks. If you try anyway, Bokeh prints this runtime warning:

WARNING:bokeh.embed.util:
You are generating standalone HTML/JS output, but trying to use real Python
callbacks (i.e. with on_change or on_event). This combination cannot work.

Only JavaScript callbacks may be used with standalone output. For more
information on JavaScript callbacks with Bokeh, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/interaction/callbacks.html

Alternatively, to use real Python callbacks, a Bokeh server application may
be used. For more information on building and running Bokeh applications, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/server.html
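For the JavaScript route, a minimal sketch (assumes Bokeh is installed; the JavaScript body is illustrative and only runs in the browser). From Python you can only confirm that the CustomJS callback was attached; Bokeh records it under a "change:indices" key.

```python
from bokeh.models import ColumnDataSource, CustomJS

source = ColumnDataSource(data=dict(x=[1, 2, 3], y=[4, 5, 6]))

# The code string is JavaScript executed by BokehJS in the browser
callback = CustomJS(args=dict(source=source), code="""
    console.log("selected:", source.selected.indices)
""")
source.selected.js_on_change("indices", callback)

print(list(source.selected.js_property_callbacks))  # ['change:indices']
```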

(End of aside)

Finally, here is the body of the callback function update():

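def update(attr, old, new):
    inds = new  # indices of the currently selected points
    if len(inds) == 0 or len(inds) == len(x):
        # nothing (or everything) selected: hide the overlay bars
        hhist1, hhist2 = hzeros, hzeros
        vhist1, vhist2 = vzeros, vzeros
    else:
        # boolean mask of the points that are NOT selected
        # (np.bool was removed in NumPy 1.24; the builtin bool works everywhere)
        neg_inds = np.ones_like(x, dtype=bool)
        neg_inds[inds] = False
        # reuse the original bin edges so the bars line up with the background histogram
        hhist1, _ = np.histogram(x[inds], bins=hedges)
        vhist1, _ = np.histogram(y[inds], bins=vedges)
        hhist2, _ = np.histogram(x[neg_inds], bins=hedges)
        vhist2, _ = np.histogram(y[neg_inds], bins=vedges)

    # selected counts point up/right, unselected counts down/left
    hh1.data_source.data["top"]   =  hhist1
    hh2.data_source.data["top"]   = -hhist2
    vh1.data_source.data["right"] =  vhist1
    vh2.data_source.data["right"] = -vhist2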

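The callback receives the attribute name given to on_change (attr) and that attribute's values before and after the change (old and new). Here new is the list of indices of the newly selected points. As anticipated at (*), the top values of the horizontal histograms (and the right values of the vertical ones) are updated from the selected indices: hh1/vh1 get the counts of the selected points, and hh2/vh2 get the negated counts of the unselected points, which is why the histogram axes run from -hmax to hmax (and -vmax to vmax). The chain of references is a bit long: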

original arrays x, y → scatter renderer r (its data source holds x and y in their original order) → r.data_source.selected → on_change callback

Because of this chain, the indices passed to the callback refer directly to positions in the original x and y arrays.
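The histogram part of update() can be checked in isolation. This NumPy-only sketch (the function name split_histograms is hypothetical; toy data) reuses the same bin edges for the selected and unselected points, so the two sets of counts always add up to the full histogram drawn in the background.

```python
import numpy as np

def split_histograms(values, edges, inds):
    """Hypothetical helper: counts for selected points and for the rest."""
    inds = np.asarray(inds, dtype=int)
    if len(inds) == 0 or len(inds) == len(values):
        zeros = np.zeros(len(edges) - 1)
        return zeros, zeros
    neg_inds = np.ones_like(values, dtype=bool)  # mask of unselected points
    neg_inds[inds] = False
    sel, _ = np.histogram(values[inds], bins=edges)
    rest, _ = np.histogram(values[neg_inds], bins=edges)
    return sel, rest

values = np.arange(8.0)
full, edges = np.histogram(values, bins=4)
sel, rest = split_histograms(values, edges, [0, 1, 6])
print(sel, rest)  # [2 0 0 1] [0 2 2 1]
```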

References

[Official sample] 0 [Source code] 1
