Interactive Pipelines#

The plots built up over the first few tutorials were all highly interactive in the web browser, with interactivity provided by Bokeh plotting tools within the plots or, in some cases, by HoloViews generating a Bokeh widget to select a value for a groupby over a categorical variable. However, when you are exploring a dataset, you might want to see how any aspect of the data or plot changes when it is varied interactively. Luckily, hvPlot makes this almost trivially easy, so that you can explore any parameter or setting in your code.

hvPlot registers the .interactive() method on many of the PyData data structures, e.g. a Pandas, GeoPandas, or Dask DataFrame, or an Xarray Dataset. Calling .interactive() returns an interactive object (e.g. an interactive Pandas DataFrame) that can be used as if it were the original object (e.g. by calling regular Pandas methods) and whose output (e.g. a DataFrame view) will be recomputed every time one of its inputs changes. The inputs are widgets (e.g. a drop-down list) that replace values you would usually hard-code and manually update to observe how they affect the output. When such an interactive object is displayed in a notebook, it includes the widgets that you have used together with the regular output.

Panel widgets#

Before using .interactive() we will need a widget library, and here we will be using Panel to generate Bokeh widgets under user control, just as hvPlot uses Panel to generate widgets for a groupby as shown previously. Let’s first get ahold of a Panel widget to see how it works. Here, let’s create a Panel floating-point number slider to specify an earthquake magnitude between zero and nine:

import pathlib

import holoviews as hv
import hvplot.pandas # noqa
import numpy as np
import pandas as pd
import panel as pn

pn.extension(sizing_mode='stretch_width')
mag_slider = pn.widgets.FloatSlider(name='Minimum Magnitude', start=0, end=9, value=6)
mag_slider

The widget is a JavaScript object, but there are bidirectional connections between JS and Python that let us see and change the value of this slider using its value parameter:

mag_slider.value
6
mag_slider.value = 7

Exercise#

Try moving the slider around and rerunning the mag_slider.value above to access the current slider value. As you can see, you can easily get the value of any widget to use in subsequent cells, but you’d need to re-run any cell that accesses that value for it to get updated.

hvPlot .interactive()#

hvPlot provides an easy way to connect widgets directly into an expression you want to control.

First, let’s read in our data:

%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))
df = df.set_index('time').tz_localize(None)
CPU times: user 3.36 s, sys: 649 ms, total: 4.01 s
Wall time: 2.42 s

Now, let’s do a little filtering that we might want to control with such a widget, such as selecting the highest-magnitude events:

WEB_MERCATOR_LIMITS = (-20037508.342789244, 20037508.342789244)

df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']][df['northing'] < WEB_MERCATOR_LIMITS[1]]

df2[df2['mag'] > 5].head()
mag depth latitude longitude place type
time
2000-01-31 07:25:59.740 5.4 33.0 38.114 88.604 southern Xinjiang, China earthquake
2000-01-29 08:13:10.730 5.4 60.7 -8.633 111.137 Java, Indonesia earthquake
2000-01-29 02:53:54.890 5.1 100.0 4.857 126.259 Kepulauan Talaud, Indonesia earthquake
2000-01-28 22:57:51.700 5.6 83.4 -9.691 118.764 Sumbawa region, Indonesia earthquake
2000-01-28 22:42:26.250 5.5 10.0 -1.347 89.083 South Indian Ocean earthquake

What if instead of ‘5’, we want the output above always to reflect the current value of mag_slider? We can do that by using hvPlot’s .interactive() support, passing in a widget almost anywhere we want in a pipeline:

dfi = df2.interactive()

dfi[dfi['mag'] > mag_slider].head()

Here, .interactive is a wrapper around your DataFrame or Xarray object that lets you provide Panel widgets almost anywhere you’d otherwise be using a number. Just as importing hvplot.pandas provides a .hvplot() method or object on your DataFrame, it also provides a .interactive method or object that gives you a general-purpose interactive DataFrame driven by widgets. .interactive stores a copy of your pipeline (the series of method calls or other expressions on your data) and dynamically replays the pipeline whenever one of its widgets changes.
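To make the idea of storing and replaying a pipeline concrete, here is a minimal pure-Python sketch. It is emphatically not hvPlot's implementation: `Slot` and `Recorder` are hypothetical names invented for this illustration, and the "widget" is just an object holding a mutable value. The point is only that the steps are recorded once and widget values are looked up each time the pipeline is evaluated.

```python
class Slot:
    """Hypothetical stand-in for a widget: it just holds a mutable value."""

    def __init__(self, value):
        self.value = value


class Recorder:
    """Records a chain of steps on some data and replays them on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def apply(self, func, *args):
        # Do not run `func` now; just remember it together with its arguments.
        return Recorder(self._source, self._steps + ((func, args),))

    def evaluate(self):
        # Replay every recorded step, reading the *current* value of any Slot.
        result = self._source
        for func, args in self._steps:
            resolved = [a.value if isinstance(a, Slot) else a for a in args]
            result = func(result, *resolved)
        return result


def keep_above(values, threshold):
    return [v for v in values if v > threshold]


threshold = Slot(5)
magnitudes = [4.2, 5.4, 6.1, 7.3]  # synthetic values for illustration

pipeline = Recorder(magnitudes).apply(keep_above, threshold).apply(len)

first = pipeline.evaluate()   # evaluated with threshold.value == 5
threshold.value = 7           # "drag the slider"
second = pipeline.evaluate()  # same pipeline, replayed with the new value
print(first, second)
```

In the real library the replay is triggered automatically when a widget changes, rather than by calling an explicit evaluate method as in this toy version.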

.interactive supports just about any output you might want to get out of such a pipeline, such as text or numbers:

dfi[dfi['mag'] > mag_slider].shape

Or Matplotlib plots:

dfi[dfi['mag'] > mag_slider].plot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))

Each time you drag the widget, hvPlot replays the pipeline and updates the output shown.

Of course, .interactive also supports .hvplot(), here with a new copy of a widget so that it will be independent of the other cells above:

mag_slider2 = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)

dfi[dfi['mag'] > mag_slider2].hvplot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))

You can see that the depth distribution varies dramatically as you vary the minimum magnitude, with the lowest-magnitude events apparently only detectable at shallow depths. There also seems to be some artifact at depth 10, which is the largest bin regardless of the filtering for all but the largest magnitudes.
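To see which bin a depth of exactly 10 falls into, it helps to look at the bin edges themselves. The sketch below uses NumPy directly on a handful of synthetic depth values (not the earthquake data) and assumes the plotted histograms follow the same half-open bin convention as `np.histogram`; treat that as an assumption rather than a statement about hvPlot internals.

```python
import numpy as np

# Same edges as in the plots above: 51 edges from 0 to 50 give 50 bins of width 1.
bin_edges = np.linspace(0, 50, 51)

# Synthetic depths, purely for illustration.
toy_depths = np.array([10.0, 10.0, 33.0, 5.5])

counts, edges = np.histogram(toy_depths, bins=bin_edges)

# np.digitize returns the index of the right-hand edge, so subtract 1
# to get the index of the bin that contains the value.
bin_of_ten = int(np.digitize(10.0, bin_edges)) - 1

print(len(counts), bin_of_ten, counts[bin_of_ten])
```

Under that convention, every event whose depth is recorded as exactly 10 is counted in the single bin starting at 10, which is consistent with one bin standing out.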

Date widgets#

A .interactive() pipeline can contain any number of widgets, including any from the Panel reference gallery. For instance, let’s make a widget to specify a date range covering the dates found in this data:

date = pn.widgets.DateRangeSlider(name='Date', start=df.index[0], end=df.index[-1])
date

Now we can access the value of this slider:

date.value
(Timestamp('2000-01-31 23:52:00.619000'),
 Timestamp('2018-12-01 00:00:13.284000'))

As this widget is specifying a range, this time the value is returned as a tuple. If you prefer, you can get the components of the tuple directly via the value_start and value_end parameters respectively:

f'Start is at {date.value_start} and the end is at {date.value_end}'
'Start is at 2000-01-31 23:52:00.619000 and the end is at 2018-12-01 00:00:13.284000'

Once again, try specifying different ranges with the widget and rerunning the cell above.

Now let’s use this widget to expand our expression to filter by date as well as magnitude:

mag = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)

filtered = dfi[
    (dfi['mag']   > mag) &
    (dfi.index >= date.param.value_start) &
    (dfi.index <= date.param.value_end)]

filtered.head()

You can now use either the magnitude or the date range (or both) to filter the data, and the output will update. Note that here you want to move the start date of the range slider rather than the end; otherwise, you may not see the table change because the earthquakes are displayed in date order.
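Stripped of widgets, the expression above is ordinary Pandas boolean indexing with a magnitude threshold and a (start, end) pair. The following sketch shows that non-interactive equivalent on a tiny synthetic DataFrame; `toy`, `filter_events`, and `window` are hypothetical names introduced only for this example.

```python
import pandas as pd

# Synthetic events indexed by time, for illustration only.
toy = pd.DataFrame(
    {'mag': [5.1, 6.2, 4.8, 7.0]},
    index=pd.to_datetime(
        ['2000-01-05', '2005-06-01', '2010-03-15', '2018-11-30']),
)


def filter_events(frame, min_mag, date_range):
    """Keep rows above min_mag whose index lies within date_range (inclusive)."""
    start, end = date_range
    return frame[
        (frame['mag'] > min_mag) &
        (frame.index >= start) &
        (frame.index <= end)]


# A plain tuple plays the role of the DateRangeSlider's value.
window = (pd.Timestamp('2004-01-01'), pd.Timestamp('2012-01-01'))

strong = filter_events(toy, 6, window)
any_mag = filter_events(toy, 0, window)
print(len(strong), len(any_mag))
```

The interactive version differs only in that the hard-coded threshold and tuple are replaced by widget parameters, so the filter is re-evaluated when they change.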

Exercise#

To specify the minimum earthquake magnitude, notice that we supplied the whole mag widget but .interactive() used only the value parameter of this widget by default. To be explicit, you may use mag.param.value instead if you wish. Try it!

Exercise#

For readability, six columns were chosen before displaying the DataFrame. Have a look at df.columns and pick a different set of columns for display.

Functions as inputs#

Quite often the data structure you want to explore in a pipeline may itself be the outcome of another pipeline. It may, for instance, be a Pandas DataFrame created by extracting and transforming the output of a database or an API call, or it could be the dynamic output of some simulation or pre-processing. With hvplot.bind you can start with an arbitrary custom function that returns the data structure you want to explore and then bind that function’s arguments to widgets. Then, when those widgets change, the function will get called to get the updated output.

To keep this example self-contained we’ll illustrate this process using a simple function that filters the earthquakes dataset by event type and returns a DataFrame. Of course, this function could include any computation that returns a DataFrame, including selecting data files on disk or making a query to a database.

def input_function(event_type):
    df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']]
    return df2[df2['type'] == event_type]

We can then create a Panel Select widget with a few options and bind it to the input_function. Calling .interactive() on the bound object is what allows it to be used in an interactive pipeline, as we previously did with dfi.

event_types = pn.widgets.Select(options=['earthquake', 'quarry blast', 'explosion', 'ice quake'])

inputi = hvplot.bind(input_function, event_types).interactive()
inputi[inputi['mag'] > mag].head(2)

.interactive() and HoloViews#

.interactive() lets you work naturally with the compositional HoloViews plots provided by .hvplot(). Here, let’s combine such plots using the HoloViews + operator:

mag_hist   = filtered.hvplot(y='mag',   kind='hist', width=300)
depth_hist = filtered.hvplot(y='depth', kind='hist', width=300)

mag_hist + depth_hist

These are the same two histograms we saw earlier, but now we can filter them on data dimensions like time that aren’t even explicitly shown in the plot, using the Panel widgets.

Filtering earthquakes on a map#

To display the earthquakes on a map, we will first create a subset of the data to make it quick to update without needing Datashader:

subset_df = df[
            (df.northing <  WEB_MERCATOR_LIMITS[1]) &
            (df.mag      >  4) &
            (df.index    >= pd.Timestamp('2017-01-01')) &
            (df.index    <= pd.Timestamp('2018-01-01'))]

Now we can make a new interactive DataFrame from this new subselection:

subset_dfi = subset_df.interactive(sizing_mode='stretch_width')

And now we can declare our widgets and use them to filter the interactive DataFrame as before:

date_subrange = pn.widgets.DateRangeSlider(
    name='Date', start=subset_df.index[0], end=subset_df.index[-1])
mag_subrange = pn.widgets.FloatSlider(name='Magnitude', start=3, end=9, value=3)

filtered_subrange = subset_dfi[
    (subset_dfi.mag   > mag_subrange) &
    (subset_dfi.index >= date_subrange.param.value_start) &
    (subset_dfi.index <= date_subrange.param.value_end)]

Now we can plot the earthquakes on an ESRI tilesource, including the filtering widgets as follows:

geo = filtered_subrange.hvplot(
    'easting', 'northing', color='mag', kind='points',
    xaxis=None, yaxis=None, responsive=True, min_height=500, tiles='ESRI')

geo

Terminating methods for .interactive#

The examples above all illustrate cases where you can display the output of .interactive() and not worry about its type, which is no longer a DataFrame or a HoloViews object, but an Interactive object:

type(geo)
hvplot.interactive.Interactive

What if you need to work with some part of the interactive pipeline, e.g. to feed it to some function or object that does not understand Interactive objects? In such a case, you can use what is called a terminating method on your Interactive object to get at the underlying object for you to use.

For instance, let’s create magnitude and depth histograms on this subset of the data as in an earlier notebook and see if we can enable linked selections on them:

mag_subhist   = filtered_subrange.hvplot(y='mag',   kind='hist', responsive=True, min_height=200)
depth_subhist = filtered_subrange.hvplot(y='depth', kind='hist', responsive=True, min_height=200)

combined = mag_subhist + depth_subhist
combined

Note that this looks like a HoloViews layout with some widgets, but this object is not a HoloViews object. Instead it is still an Interactive object:

type(combined)
hvplot.interactive.Interactive

link_selections does not currently understand Interactive objects, and so it will raise an exception when given one. If we need a HoloViews Layout, e.g. for calling link_selections, we can build a layout from the constituent objects using the .holoviews() terminating method on Interactive:

layout = mag_subhist.holoviews() + depth_subhist.holoviews()
layout

This is now a HoloViews object, so we can use it with link_selections:

print(type(layout))

ls = hv.link_selections.instance()
ls(mag_subhist.holoviews()) + ls(depth_subhist.holoviews())
<class 'holoviews.core.layout.Layout'>

You can use the box selection tool to see how selections compare between these plots. However, you will note that the widgets are no longer displayed. To address this, we can display the widgets separately using a different terminating method, namely .widgets():

filtered_subrange.widgets()

For reference, the terminating methods for an Interactive object are:

  • .holoviews(): Give me a HoloViews object

  • .panel(): Give me a Panel ParamFunction

  • .widgets(): Give me a layout of widgets associated with this interactive object

  • .layout(): Give me a layout combining the widgets and the display, equivalent to pn.Column(obj.widgets(), obj.panel()), where pn.Column will be described in the Dashboards notebook.

Conclusion#

Using the techniques above, you can build up a collection of plots and other outputs with Panel widgets to control individual bits of computation and display.

What if you want to collect these pieces and put them together into a standalone app or dashboard? The next tutorial will show you how to do so!

This web page was generated from a Jupyter notebook and not all interactivity will work on this website. Right click to download and run locally for full Python-backed interactivity.
