Interlinked Plots#

hvPlot allows you to generate a number of different types of plot quickly from a standard API, returning HoloViews objects as discussed in the previous notebook. Each initial plot will make some aspects of the data clear, and using the automatic interactive pan, zoom, and hover tools you can find additional trends and outliers at different spatial locations and spatial scales within each plot.

Beyond what you can discover from each plot individually, how do you understand how the various plots relate to each other? For instance, imagine you have a data frame with columns u, v, w, z, and have separate plots of u vs. v, u vs. w, and w vs. z. If you see a few outliers or a clump of unusual datapoints in your u vs. v plot, how can you find out the properties of those points in the w vs. z or other plots? Are those unusual u vs. v points typically high w, uniformly distributed along w, or some other pattern?
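As a minimal sketch of answering that question by hand in plain pandas, assume a synthetic data frame with the columns u, v, w, z named above. The data, the threshold that defines "unusual", and the variable names are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: the values are invented purely for illustration.
rng = np.random.default_rng(0)
df_uvwz = pd.DataFrame(rng.normal(size=(1000, 4)), columns=['u', 'v', 'w', 'z'])

# Hypothetical definition of "unusual" points in the u vs. v plot:
# both u and v greater than 1.0.
unusual = df_uvwz[(df_uvwz.u > 1.0) & (df_uvwz.v > 1.0)]

# Compare how w is distributed for the unusual points and for all points.
comparison = pd.DataFrame({
    'all points': df_uvwz.w.describe(),
    'unusual u vs. v points': unusual.w.describe(),
})
print(comparison)
```

Writing a filter like this for every clump of points you notice quickly becomes tedious, which is the motivation for making the selection interactively instead.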

To help understand multicolumnar and multidimensional datasets like this, scientists will often build complex multi-pane dashboards with custom functionality. HoloViz (and specifically Panel) tools are great for such dashboards, but here we can actually use the fact that hvPlot returns HoloViews objects to get quite sophisticated interlinking (linked brushing) “for free”, without needing to build any dashboard. HoloViews objects store metadata about what dimensions they cover, and we can use this metadata programmatically to let the user see how any data points in any plot relate across different plots.

To see how this works, let us get back to the example we were working on at the end of the last notebook:

import pathlib
import holoviews as hv
import pandas as pd
import hvplot.pandas  # noqa
import colorcet as cc

First let us load the data as before:

%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))
df = df.set_index(df.time)
CPU times: user 3.13 s, sys: 375 ms, total: 3.5 s
Wall time: 1.91 s

And filter to the most severe earthquakes (magnitude 7 or greater):

most_severe = df[df.mag >= 7]

Linked brushing across elements#

In the previous notebook, we saw how plot axes are automatically linked for panning and zooming when using the + operator, provided the dimensions match. When dimensions or an underlying index match across multiple plots, we can use a similar principle to achieve linked brushing, where user selections are also linked across plots.

To illustrate, let us generate two histograms from our most_severe DataFrame:

mag_hist = most_severe.hvplot(
    y='mag', kind='hist', responsive=True, min_height=150)

depth_hist = most_severe.hvplot(
    y='depth', kind='hist', responsive=True, min_height=150)

These two histograms are plotting two different dimensions of our earthquake dataset (magnitude and depth), derived from the same set of earthquake samples. The samples between these two histograms share an index, and the relationships between these data points can be discovered and exploited programmatically even though they are in different elements. To do this, we can create an object for linking selections across elements:

ls = hv.link_selections.instance()

Given some HoloViews objects (elements, layouts, etc.), we can create versions of them linked to this shared linking object by calling ls on them:

ls(depth_hist + mag_hist)

Try using the first Bokeh tool to select areas of either histogram: you’ll then see both the depth and magnitude distributions for the bins you have selected, compared to the overall distribution. By default, selections on both histograms are combined so that the selection is the intersection of the two regions selected (data points matching both the constraints on depth and the constraints on magnitude that you select). For instance, try selecting the deepest earthquakes (around 600), and you can see that those are not specific to one particular magnitude. You can then further select a particular magnitude range, and see how that range is distributed in depth over the selected depth range. Linked selections like this make it feasible to look at specific regions of a multidimensional space and see how the properties of those regions compare to the properties of other regions. You can use the Bokeh reset tool (double arrow) to clear your selection.
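The intersection behavior described above can be pictured as combining boolean masks. This is only a sketch in plain pandas, using a tiny made-up table and hand-picked ranges in place of real brushed regions; it is not how link_selections is implemented internally:

```python
import pandas as pd

# Tiny made-up table standing in for most_severe; values are illustrative only.
quakes = pd.DataFrame({
    'depth': [10.0, 33.0, 225.0, 608.5, 590.0, 620.0],
    'mag':   [7.1, 7.8, 7.0, 7.0, 7.6, 8.2],
})

# Each brushed region corresponds to a range constraint on one column.
depth_selected = quakes.depth.between(550, 650)  # stand-in for a depth brush
mag_selected = quakes.mag.between(7.5, 8.5)      # stand-in for a magnitude brush

# Combining the two brushes by intersection keeps rows matching both.
both = quakes[depth_selected & mag_selected]
print(both)
```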

Note that these two histograms are derived from the same DataFrame and created in the same call to ls, but neither of those is necessary to achieve the linked behavior! If linking two different DataFrames, the important thing to check is that any columns with the same name actually do have the same meaning, and that any index columns match, so that the plots you are visualizing make sense when linked together.
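Before linking plots built from two different DataFrames, a quick sanity check along these lines may help. It is a sketch using two made-up frames; the names left and right are hypothetical:

```python
import pandas as pd

# Two made-up frames describing the same three samples.
left = pd.DataFrame({'mag': [7.1, 7.8, 8.2]}, index=[101, 102, 103])
right = pd.DataFrame({'depth': [10.0, 608.5, 33.0]}, index=[101, 102, 103])

# The indexes should identify the same samples in both frames.
same_index = left.index.equals(right.index)

# Columns that appear in both frames should mean the same thing;
# listing them makes it easy to review each one by hand.
shared_columns = sorted(set(left.columns) & set(right.columns))

print(same_index, shared_columns)
```

The check on shared column names only finds candidates to review; whether two same-named columns really have the same meaning is something only you can judge.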

Linked brushing across element types#

The previous example linked two histograms, but nothing prevents you from linked brushing across different element types. Here are our earthquake points, also derived from the same DataFrame, where the only change from earlier is that we are using the reversed warm colormap (described in the previous notebook):

geo = most_severe.hvplot(
    'easting', 'northing', color='mag', kind='points', tiles='EsriUSATopo',
    xaxis=None, yaxis=None, responsive=True, height=350,
    cmap=cc.CET_L4[::-1], framewise=True)

Once again, we just need to pass our plots to a linked-selection object (ls2, newly declared here to be independent of the one above) to declare the linkage:

ls2 = hv.link_selections.instance()

(ls2(geo + depth_hist)).cols(1)

Now you can use the box-select tool to select earthquakes on the map and view their corresponding depth distribution, or vice versa. E.g. if you select just the earthquakes in Alaska, you can see that they tend not to be very deep underground (though that may be a sampling issue). Other selections will show other properties, in this case typically with no obvious relationship between geographic location and depth distribution.

Accessing the data selection#

If you pass your DataFrame into the .filter method of your linked selection object, you can apply the active filter that you specified interactively:

ls2.filter(most_severe)
index depth depthError dmin gap horizontalError id latitude locationSource longitude ... net nst place rms status time type updated easting northing
time
2000-01-08 16:47:20.580000+00:00 4560 183.40 NaN NaN NaN NaN usp0009kx3 -16.9250 us -174.2480 ... us NaN Tonga 1.25 reviewed 2000-01-08 16:47:20.580000+00:00 earthquake 2017-11-07T16:16:22.048Z -1.939720e+07 -1.912096e+06
2000-02-25 01:43:58.640000+00:00 768 33.00 NaN NaN NaN NaN usp0009nxg -19.5280 us 173.8180 ... us NaN Vanuatu region 1.20 reviewed 2000-02-25 01:43:58.640000+00:00 earthquake 2017-11-07T16:17:30.218Z 1.934933e+07 -2.217199e+06
2000-03-28 11:00:22.510000+00:00 817 126.50 NaN NaN NaN NaN usp0009qb4 22.3380 us 143.7300 ... us NaN Volcano Islands, Japan region 1.22 reviewed 2000-03-28 11:00:22.510000+00:00 earthquake 2018-10-17T19:37:57.922Z 1.599995e+07 2.552155e+06
2000-04-23 09:27:23.320000+00:00 1421 608.50 NaN NaN NaN NaN usp0009rrc -28.3070 us -62.9900 ... us NaN Santiago Del Estero, Argentina 0.89 reviewed 2000-04-23 09:27:23.320000+00:00 earthquake 2017-11-07T16:14:17.222Z -7.012015e+06 -3.287735e+06
2000-05-12 18:43:18.120000+00:00 3695 225.00 4.6 NaN NaN NaN usp0009suu -23.5480 us -66.4520 ... us NaN Jujuy, Argentina 0.86 reviewed 2000-05-12 18:43:18.120000+00:00 earthquake 2017-11-07T16:21:24.397Z -7.397403e+06 -2.698426e+06
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-11-30 17:29:29.330000+00:00 411 46.70 0.1 NaN NaN NaN ak20419010 61.3464 ak -149.9552 ... ak NaN 14km NNW of Anchorage, Alaska 1.04 reviewed 2018-11-30 17:29:29.330000+00:00 earthquake 2019-06-20T17:32:06.274Z -1.669294e+07 8.705799e+06
2018-12-29 03:39:09.740000+00:00 928 60.21 3.2 1.769 21.0 6.1 us2000iyta 5.8983 us 126.9209 ... us NaN 96km ESE of Pondaguitan, Philippines 1.45 reviewed 2018-12-29 03:39:09.740000+00:00 earthquake 2019-03-05T17:46:37.040Z 1.412877e+07 6.577586e+05
2018-12-20 17:01:55.150000+00:00 3895 16.56 2.9 4.126 23.0 7.9 us2000ivfw 55.0999 us 164.6993 ... us NaN 83km W of Nikol'skoye, Russia 0.79 reviewed 2018-12-20 17:01:55.150000+00:00 earthquake 2019-02-23T20:21:43.040Z 1.833424e+07 7.381279e+06
2018-12-11 02:26:29.420000+00:00 7502 133.00 1.9 7.043 20.0 9.8 us2000isc8 -58.5446 us -26.3856 ... us NaN 54km N of Bristol Island, South Sandwich Islands 0.92 reviewed 2018-12-11 02:26:29.420000+00:00 earthquake 2019-02-16T22:29:03.040Z -2.937232e+06 -8.082602e+06
2018-12-05 04:18:08.420000+00:00 11184 10.00 1.5 2.405 18.0 5.1 us1000i2gt -21.9496 us 169.4266 ... us NaN 165km ESE of Tadine, New Caledonia 0.74 reviewed 2018-12-05 04:18:08.420000+00:00 earthquake 2019-02-16T19:52:20.040Z 1.886048e+07 -2.505475e+06

295 rows × 25 columns

Exercise#

Try selecting a small number of earthquakes on the map above and re-running the previous cell. You should see that your DataFrame only includes the earthquakes you have selected. You can use this linked selections feature in your own workflows by selecting a region of your data, then running subsequent analyses only on that subset of the data (or comparing that subset to the whole data set).

Conclusion#

When exploring data it can be convenient to use the .hvplot API to quickly visualize a particular dataset. By calling .hvplot to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. Linked selections let you see relationships between your data’s dimensions and clusters of datapoints much more directly, so that you can:

  1. Interactively explore high-dimensional data by making selections across different views of the same underlying samples.

  2. Turn this interactive exploration into a Python subselection of your data, allowing you to continue your data analysis on a subset of your data that you interactively selected.

This approach is very general and allows a deeper understanding of high-dimensional data through interactivity. This interactivity is itself built on the very powerful HoloViews ‘streams’ system, which you can leverage yourself to build your own custom interactivity (an optional, advanced topic) when necessary.

In the next section we will see how to apply data processing in a pipelined form, allowing us to build interactive visualizations driven by user-defined widgets when we want to have custom control over our data processing and selection.
