[PYTHON] I want to fully understand the basics of Bokeh

To understand bokeh's interactive plots as thoroughly as possible, let's work through [a handy sample] 0 line by line. I first inferred the behavior from the source code and then checked it against the documentation, so the description should be accurate. The entire source code is [here] 1.

The result looks like this ↓

Data generation

# create three normal population samples with different parameters
x1 = np.random.normal(loc=5.0, size=400) * 100
y1 = np.random.normal(loc=10.0, size=400) * 10

x2 = np.random.normal(loc=5.0, size=800) * 50
y2 = np.random.normal(loc=5.0, size=800) * 10

x3 = np.random.normal(loc=55.0, size=200) * 10
y3 = np.random.normal(loc=4.0, size=200) * 10

x = np.concatenate((x1, x2, x3))
y = np.concatenate((y1, y2, y3))

The explanation is omitted because it is not related to bokeh.

Point cloud plot

TOOLS="pan,wheel_zoom,box_select,lasso_select,reset"

# create the scatter plot
p = figure(tools=TOOLS, plot_width=600, plot_height=600, min_border=10, min_border_left=50,
           toolbar_location="above", x_axis_location=None, y_axis_location=None,
           title="Linked Histograms")
p.background_fill_color = "#fafafa"

↑ Pass the figure the names of the tools you want to display and set the background color. Up to this point the flow is the same as for an ordinary plot. figure comes from bokeh.plotting (not bokeh.models), and the object it returns is a figure object, a subclass of bokeh.models.Plot. The various settings below are applied to this object.

p.select(BoxSelectTool).select_every_mousemove = False
p.select(LassoSelectTool).select_every_mousemove = False

↑ Be careful from here. p.select() is a method of the figure; its base implementation lives in the parent class bokeh.model.Model. Given a class as the selector, it returns the objects of that type that the figure references. In the first line, for example, the BoxSelectTool attached to the figure is retrieved, and setting select_every_mousemove to False on it means the selection is not updated until the mouse selection is finished.

Reference →

select(selector) Query this object and all of its references for objects that match the given selector.
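To make the behavior of select() concrete, here is a minimal sketch. It assumes a small throwaway figure (the name demo is made up, it is not the figure from the sample) and only inspects what comes back, without changing any tool settings.

```python
from bokeh.models import BoxSelectTool, LassoSelectTool
from bokeh.plotting import figure

# A throwaway figure with two selection tools (hypothetical, for illustration only)
demo = figure(tools="pan,box_select,lasso_select,reset")

# Passing a class as the selector returns the matching objects the figure references
box_tools = demo.select(BoxSelectTool)
lasso_tools = demo.select(LassoSelectTool)

print(len(box_tools), type(box_tools[0]).__name__)
print(len(lasso_tools), type(lasso_tools[0]).__name__)
```

The result behaves like a list of matches, which is why the sample can set an attribute on "whatever BoxSelectTool the figure has" without keeping its own reference to the tool.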

r = p.scatter(x, y, size=3, color="#3A5785", alpha=0.6)

This is where the point cloud is plotted. The return value r is used at the very end. Glyph methods such as scatter() generally return an instance of the GlyphRenderer class, which can be used later to tweak the plot.
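As a rough illustration of what can be done with that return value, the sketch below plots three made-up points on a throwaway figure and looks at the renderer. The names demo and renderer are assumptions for this example only.

```python
from bokeh.models import GlyphRenderer
from bokeh.plotting import figure

demo = figure()
renderer = demo.scatter([1, 2, 3], [4, 5, 6], size=3, color="#3A5785", alpha=0.6)

# The glyph method hands back a renderer, not the raw data
print(isinstance(renderer, GlyphRenderer))

# The plotted values live in the renderer's data source, one column per coordinate
print(sorted(renderer.data_source.data))

# The renderer can be adjusted after the fact, for example hidden
renderer.visible = False
print(renderer.visible)
```

The data_source attribute seen here is the same one the sample later uses to reach the selection.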

Histogram

# create the horizontal histogram
hhist, hedges = np.histogram(x, bins=20)
hzeros = np.zeros(len(hedges)-1)
hmax = max(hhist)*1.1

↑ The return value of numpy.histogram() is (array of counts per bin, array of bin edges). This part has nothing to do with bokeh.
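A tiny example with made-up data shows the shape of the two return values, and why the sample uses len(hedges)-1 for the array of zeros: there is always one more edge than there are bins.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 5])  # made-up values for illustration
counts, edges = np.histogram(data, bins=4)

print(counts)                          # [1 2 1 1]
print(edges)                           # [1. 2. 3. 4. 5.]
print(len(edges) == len(counts) + 1)   # True
```

edges[:-1] and edges[1:] are then the left and right sides of each bin, which is exactly how they are passed to quad() in the next step.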

LINE_ARGS = dict(color="#3A5785", line_color=None)

ph = figure(toolbar_location=None, plot_width=p.plot_width, plot_height=200, x_range=p.x_range,
            y_range=(-hmax, hmax), min_border=10, min_border_left=50, y_axis_location="right")
ph.xgrid.grid_line_color = None
ph.yaxis.major_label_orientation = np.pi/4
ph.background_fill_color = "#fafafa"

ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hhist, color="white", line_color="#3A5785")

↑ The horizontal histogram. There are many settings and it looks complicated, but the basic flow is unchanged: get a figure with figure() and draw rectangles with its quad() method. (quad() draws axis-aligned rectangles in general; it is not dedicated to histograms.) The coordinates of the four sides are given by bottom, left, right and top. The rectangles are drawn by a glyph of the bokeh.models.glyphs.Quad class.
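The following sketch draws a histogram from made-up data with quad() and checks what comes back. The figure demo and the variable bars are hypothetical names; the styling arguments are copied from the sample.

```python
import numpy as np
from bokeh.models import GlyphRenderer, Quad
from bokeh.plotting import figure

counts, edges = np.histogram(np.array([1, 2, 2, 3, 5]), bins=4)

demo = figure()
# One rectangle per bin: sides come from the bin edges, height from the counts
bars = demo.quad(bottom=0, left=edges[:-1], right=edges[1:], top=counts,
                 color="white", line_color="#3A5785")

print(isinstance(bars, GlyphRenderer), isinstance(bars.glyph, Quad))
print([int(v) for v in bars.data_source.data["top"]])  # [1, 2, 1, 1]
```

Note that the array arguments end up as columns of the renderer's data source, which is what makes it possible to change the bar heights later.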

The other arguments and settings that look useful are as follows.

toolbar_location=None  # hide the toolbar
plot_width=p.plot_width  # use the same width as the scatter plot
x_range=p.x_range  # share the x-coordinate range with the scatter plot
min_border_left=50  # minimum margin on the left side of the plot
ph.yaxis.major_label_orientation = np.pi/4  # rotate the tick labels
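Sharing a range means that both figures hold the very same range object, so panning or zooming one moves the other. A minimal sketch, with two throwaway figures named main and below (names assumed for this example):

```python
from bokeh.plotting import figure

main = figure(tools="pan,wheel_zoom,reset")
# Reuse main's x range; the y range is left independent
below = figure(toolbar_location=None, x_range=main.x_range, min_border_left=50)

print(below.x_range is main.x_range)  # True
print(below.y_range is main.y_range)  # False
```

In the sample, this is what keeps the horizontal histogram aligned with the scatter plot above it.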

Histogram at selection

The histograms shown when points are selected are drawn in advance:

hh1 = ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hzeros, alpha=0.5, **LINE_ARGS)
hh2 = ph.quad(bottom=0, left=hedges[:-1], right=hedges[1:], top=hzeros, alpha=0.1, **LINE_ARGS)

At this point top = hzeros, so their height is 0 and they are not visible. The heights of the returned hh1 and hh2 are presumably updated when a selection is made (*).
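The idea of drawing an invisible placeholder and filling it in later can be tried in isolation. The sketch below uses made-up bin edges and a hypothetical renderer named placeholder; it only shows the data update itself, not the browser redraw.

```python
import numpy as np
from bokeh.plotting import figure

edges = np.array([0.0, 1.0, 2.0, 3.0])   # made-up bin edges
zeros = np.zeros(len(edges) - 1)

demo = figure()
# Zero-height rectangles: present in the plot, but nothing to see yet
placeholder = demo.quad(bottom=0, left=edges[:-1], right=edges[1:], top=zeros, alpha=0.5)
print([float(v) for v in placeholder.data_source.data["top"]])  # [0.0, 0.0, 0.0]

# Later, new heights are written into the same column of the data source
placeholder.data_source.data["top"] = np.array([2, 0, 1])
print([int(v) for v in placeholder.data_source.data["top"]])    # [2, 0, 1]
```

When the figure is part of a running Bokeh server document, an assignment like this is what causes the bars to change on screen.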

Vertical histogram

# create the vertical histogram
vhist, vedges = np.histogram(y, bins=20)
vzeros = np.zeros(len(vedges)-1)
vmax = max(vhist)*1.1

pv = figure(toolbar_location=None, plot_width=200, plot_height=p.plot_height, x_range=(-vmax, vmax),
            y_range=p.y_range, min_border=10, y_axis_location="right")
pv.ygrid.grid_line_color = None
pv.xaxis.major_label_orientation = np.pi/4
pv.background_fill_color = "#fafafa"

pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vhist, color="white", line_color="#3A5785")
vh1 = pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vzeros, alpha=0.5, **LINE_ARGS)
vh2 = pv.quad(left=0, bottom=vedges[:-1], top=vedges[1:], right=vzeros, alpha=0.1, **LINE_ARGS)

This is set up in the same way as the horizontal histogram, with the roles of the two axes swapped.

Finish

Assemble the pieces created so far.

layout = gridplot([[p, pv], [ph, None]], merge_tools=False)

↑ First, the figures are arranged in a two-dimensional grid with the bokeh.layouts.gridplot() function. By the way, if you just want to arrange them vertically or horizontally rather than in a grid, use bokeh.layouts.column or row.
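For comparison, here is a small sketch of the three layout functions side by side. All figures are empty throwaways with made-up names; separate figures are used for each layout to keep the example simple.

```python
from bokeh.layouts import column, gridplot, row
from bokeh.plotting import figure

# 2x2 grid with an empty bottom-right cell, as in the sample
a, b, c = figure(), figure(), figure()
grid = gridplot([[a, b], [c, None]], merge_tools=False)

# Horizontal arrangement
d, e = figure(), figure()
side_by_side = row(d, e)

# Vertical arrangement
f, g = figure(), figure()
stacked = column(f, g)

print(type(grid).__name__)
print(len(side_by_side.children), len(stacked.children))  # 2 2
```

None in the grid marks a cell that should stay empty, which is how the sample leaves the corner next to the two histograms blank.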

curdoc().add_root(layout)
curdoc().title = "Selection Histogram"

↑ curdoc is short for "current document": it fetches the default Document (the class that collects bokeh's output), and the add_root() method links the grid to it. This is the key point: the specification appears to be that when any change is made to the grid passed to add_root, the callbacks registered on the Document with on_change are called. From the reference →

add_root(model, setter=None)

Add a model as a root of this Document. Any changes to this model (including to other models referred to by it) will trigger on_change callbacks registered on this document.
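The quoted behavior can be tried without a server by creating a Document by hand. This is only a sketch: the Document here is a stand-in for what curdoc() returns, and fig and record are hypothetical names.

```python
from bokeh.document import Document
from bokeh.plotting import figure

doc = Document()                 # stand-in for curdoc() in a server app
fig = figure(title="before")
doc.add_root(fig)
doc.title = "Selection Histogram"

print(fig in doc.roots)
print(doc.title)

# Document-level callback: receives an event object for each change
events = []

def record(event):
    events.append(type(event).__name__)

doc.on_change(record)
fig.title.text = "after"         # change a model referred to by the root
print(events)
```

The title object is not the root itself, only something the root refers to, which matches the "including to other models referred to by it" part of the reference.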

So if you look at the last line first,

r.data_source.selected.on_change('indices', update)

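Sure enough, a callback is registered with on_change. Note that here it is registered on the selected object itself (through the model-level on_change described below), not directly on the Document.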

Here r is the return value of scatter() (an instance of the GlyphRenderer class). r.data_source corresponds to the plotted dataset, and its selected attribute corresponds to the selected part of the data.

on_change() is a method of bokeh.model.Model that registers a callback on that object.

def on_change(self, attr, *callbacks):
        ''' Add a callback on this object to trigger when ``attr`` changes.

        Args:
            attr (str) : an attribute name on this object
            *callbacks (callable) : callback functions to register

The first argument attr is a little hard to grasp: it names the attribute whose change should trigger the callback. Here we want to react to the selection (= the indices of the selected data = selected.indices), so "indices" is specified.
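The mechanism can be tried without a browser by setting the attribute from Python. In this sketch the data source, its contents and the callback name on_indices are all made up; in the real app a selection tool sets indices instead.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(x=[10, 20, 30], y=[1, 2, 3]))
calls = []

def on_indices(attr, old, new):
    # Record which attribute changed and its values before and after
    calls.append((attr, list(old), list(new)))

source.selected.on_change("indices", on_indices)

# Simulate a selection from Python
source.selected.indices = [0, 2]
print(calls)  # [('indices', [], [0, 2])]
```

The callback is only called for the attribute it was registered for, so changes to other attributes of the same object do not trigger it.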

To tie a callback to a specific event (button press, slider movement, etc.) rather than to an attribute change, use

on_event(event, callback)
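As a rough sketch of the event variant, the example below registers a handler for a button click. The button and the handler name on_press are assumptions for illustration; the handler only actually runs when the button is clicked in a running Bokeh server app.

```python
from bokeh.events import ButtonClick
from bokeh.models import Button

button = Button(label="Reset")

def on_press(event):
    # Called with an event object describing what happened
    print("received:", event.event_name)

# Register the handler for the click event rather than for an attribute change
button.on_event(ButtonClick, on_press)

print(ButtonClick.event_name)  # button_click
```

The difference from on_change is that there is no old or new value here, only the event itself.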

(The following is an aside.) Python functions are registered here, but if you want to register a function written in JavaScript (an instance of bokeh.models.CustomJS), use

m.js_on_change(attr, callback)

instead. This is covered in [Chapter 6 of the official tutorial] 1. You need this if you want standalone HTML output; with Python callbacks you get the following warning at runtime →

WARNING:bokeh.embed.util:
You are generating standalone HTML/JS output, but trying to use real Python
callbacks (i.e. with on_change or on_event). This combination cannot work.

Only JavaScript callbacks may be used with standalone output. For more
information on JavaScript callbacks with Bokeh, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/interaction/callbacks.html

Alternatively, to use real Python callbacks, a Bokeh server application may
be used. For more information on building and running Bokeh applications, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/server.html
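For reference, a JavaScript version of a selection callback might look like the sketch below. The data source and the JavaScript body are made up for illustration; the JavaScript only runs in the browser, so the Python side can merely confirm that the callback was attached.

```python
from bokeh.models import ColumnDataSource, CustomJS

source = ColumnDataSource(data=dict(x=[10, 20, 30], y=[1, 2, 3]))

callback = CustomJS(args=dict(source=source), code="""
    // cb_obj is the model whose property changed (here: source.selected)
    console.log("selected indices:", cb_obj.indices);
""")

# Same attribute name as with on_change, but the callback runs in the browser
source.selected.js_on_change("indices", callback)

print(list(source.selected.js_property_callbacks))  # ['change:indices']
```

Because the logic travels with the page, this form works in standalone HTML output and does not produce the warning above.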

(End of the aside.)

Finally, the contents of the callback function update():

def update(attr, old, new):
    inds = new
    if len(inds) == 0 or len(inds) == len(x):
        hhist1, hhist2 = hzeros, hzeros
        vhist1, vhist2 = vzeros, vzeros
    else:
        neg_inds = np.ones_like(x, dtype=bool)  # np.bool is not available in some NumPy versions; the builtin bool always works
        neg_inds[inds] = False
        hhist1, _ = np.histogram(x[inds], bins=hedges)
        vhist1, _ = np.histogram(y[inds], bins=vedges)
        hhist2, _ = np.histogram(x[neg_inds], bins=hedges)
        vhist2, _ = np.histogram(y[neg_inds], bins=vedges)

    hh1.data_source.data["top"]   =  hhist1
    hh2.data_source.data["top"]   = -hhist2
    vh1.data_source.data["right"] =  vhist1
    vh2.data_source.data["right"] = -vhist2

The callback receives the attr given to on_change together with the attribute's values before and after the change, old and new. Here, new holds the indices of the newly selected points. As predicted at (*), the top values of the horizontal histograms are updated according to the selected indices. The chain of references is a little long, but

Original dataset
↑ scatter plot (built from the same data as the horizontal histogram)
↑ selection
↑ on_change callback

Because they are connected in this way, the indices passed to the on_change callback are the same as the indices into the original dataset.
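The core of update() is plain NumPy and can be checked on its own. The sketch below uses made-up random data and a hand-picked list of "selected" indices (both assumptions) to show that the selected and unselected histograms together add up to the full histogram.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)                    # made-up data
counts, edges = np.histogram(x, bins=5)

inds = [0, 3, 7, 8]                        # pretend these points were selected
unselected = np.ones_like(x, dtype=bool)   # True = not selected
unselected[inds] = False

# Reusing the original edges keeps the bins aligned with the full histogram
sel_counts, _ = np.histogram(x[inds], bins=edges)
rest_counts, _ = np.histogram(x[unselected], bins=edges)

print(np.array_equal(sel_counts + rest_counts, counts))  # True
```

This is why the callback passes bins=hedges and bins=vedges rather than a bin count: the updated bars must line up with the ones already drawn.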

References

[Official sample] 0
[Source code] 1