Interlinked Plots#

hvPlot allows you to generate many different types of plot quickly from a standard API, returning HoloViews objects as discussed in the previous notebook. Each initial plot will make some aspects of the data clear, and using the automatic interactive pan, zoom, and hover tools you can find additional trends and outliers at different spatial locations and spatial scales within each plot.

Beyond what you can discover from each plot individually, how do you understand how the various plots relate to each other to reveal the full dataset? For instance, imagine you have a data frame with columns u, v, w, z, and have separate plots of u vs. v, u vs. w, and w vs. z. If you see a few outliers or a clump of unusual datapoints in your u vs. v plot, how can you find out the properties of those points in the w vs. z or other plots? Are those unusual u vs. v points typically high w, uniformly distributed along w, or some other pattern?

To help understand multicolumnar and multidimensional datasets like this, scientists will often build complex multi-pane dashboards with custom functionality. HoloViz (and specifically Panel) tools are great for such dashboards, but here we can actually use the fact that hvPlot returns HoloViews objects to get quite sophisticated interlinking (linked brushing) “for free”, without needing to build any dashboard. HoloViews objects store metadata about what dimensions they cover, and we can use this metadata programmatically to let the user see how any data points in any plot relate across different plots.

To see how this works, let us get back to the example we were working on at the end of the last notebook:

import pathlib
import holoviews as hv
import pandas as pd
import hvplot.pandas  # noqa
import colorcet as cc

First let us load the data as before:

%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))

CPU times: user 2.05 s, sys: 352 ms, total: 2.4 s
Wall time: 1.56 s

And filter to the most severe earthquakes (magnitude 7 or greater):

most_severe = df[df.mag >= 7]

Linked brushing across elements#

In the previous notebook, we saw how plot axes are automatically linked for panning and zooming when using the + operator, provided the dimensions match. We can also go further and link user selections, not just axes (which is often called “linked brushing”), for plots matching on dimensions or sharing the same underlying index.

To illustrate, let us generate two histograms from our most_severe DataFrame:

mag_hist = most_severe.hvplot(
    y='mag', kind='hist', responsive=True, min_height=150)

depth_hist = most_severe.hvplot(
    y='depth', kind='hist', responsive=True, min_height=150)

These two histograms are plotting two different dimensions of our earthquake dataset (magnitude and depth), derived from the same set of earthquake samples. The samples between these two histograms share an index, and the relationships between these data points can be discovered and exploited programmatically even though they are in different elements. To do this, we can create an object for linking selections across elements:

ls = hv.link_selections.instance()

Given some HoloViews objects (elements, layouts, etc.), we can create versions of them linked to this shared linking object by calling ls on them:

ls(depth_hist + mag_hist)

Try using the first Bokeh tool (box select) to select areas of either histogram: you’ll then see both the depth and magnitude distributions for the bins you have selected, compared to the overall distribution. By default, selections on both histograms are combined so that the selection is the intersection of the two regions selected (data points matching both the constraints on depth and the constraints on magnitude that you select). For instance, try selecting the deepest earthquakes (depth around 600), and you can see that those are not specific to one particular magnitude. You can then further select a particular magnitude range, and see how that range is distributed in depth over the selected depth range. Linked selections like this make it feasible to look at specific regions of a multidimensional space and see how the properties of those regions compare to the properties of other regions. You can use the Bokeh reset tool (double arrow) to clear your selection.
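In pandas terms, the default intersection behavior corresponds to combining one boolean mask per selected range with a logical AND. The following is only a sketch of that idea, not how the linking is implemented internally: the `toy` DataFrame and the chosen ranges are made-up assumptions rather than values from the earthquake data.

```python
import pandas as pd

# Hypothetical toy data standing in for the earthquake samples.
toy = pd.DataFrame({
    "depth": [10.0, 35.0, 580.0, 610.0, 630.0],
    "mag": [7.1, 7.8, 7.0, 7.6, 8.2],
})

# Each range brushed on a histogram corresponds to one boolean mask.
depth_mask = toy["depth"].between(550, 650)  # the deepest samples
mag_mask = toy["mag"].between(7.5, 8.5)      # a magnitude range

# Intersection: a row is kept only if it satisfies both constraints.
selected = toy[depth_mask & mag_mask]
print(selected)
```

Three rows pass the depth constraint alone, but only two of them also fall in the magnitude range, which mirrors what the highlighted portions of the two histograms show after a second selection.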

Note that these two histograms are derived from the same DataFrame and created in the same call to ls, but neither of those is necessary to achieve the linked behavior! If linking two different DataFrames, the important thing to check is that any columns with the same name actually do have the same meaning, and that any index columns match, so that the plots you are visualizing make sense when linked together.
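Before linking plots built from two different DataFrames, it can help to run a quick mechanical check. The sketch below uses a hypothetical helper, `check_linkable`, and made-up frames `a` and `b`; it can only report shared column names, dtype mismatches, and index equality. Whether a shared column really has the same meaning in both frames is still something you have to judge yourself.

```python
import pandas as pd

def check_linkable(df_a, df_b):
    """Report basic compatibility of two DataFrames before linking them.

    Hypothetical helper: it checks only mechanical properties, not meaning.
    """
    shared = sorted(set(df_a.columns) & set(df_b.columns))
    dtype_mismatches = [c for c in shared if df_a[c].dtype != df_b[c].dtype]
    return {
        "shared_columns": shared,
        "dtype_mismatches": dtype_mismatches,
        "index_matches": df_a.index.equals(df_b.index),
    }

# Made-up frames: the same two samples, with one shared column ("mag").
a = pd.DataFrame({"mag": [7.0, 7.4], "depth": [10.0, 600.0]}, index=[0, 1])
b = pd.DataFrame({"mag": [7.0, 7.4], "rms": [1.2, 0.9]}, index=[0, 1])

report = check_linkable(a, b)
print(report)
```

If `index_matches` is false or `dtype_mismatches` is non-empty, the linked plots may still render, but the highlighted selections are unlikely to be meaningful.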

Linked brushing across element types#

The previous example linked two histograms, but nothing prevents you from linked brushing across different element types. Here are our earthquake points, also derived from the same DataFrame, where the only change from earlier is that we are using the warm colormap (described in the previous notebook):

geo = most_severe.hvplot(
    'easting', 'northing', color='mag', kind='points', tiles='EsriUSATopo',
    xaxis=None, yaxis=None, responsive=True, height=350,
    cmap=cc.CET_L4[:50:-1], framewise=True)

Once again, we just need to pass our points to a linked-selections object (ls2, newly declared here to be independent of the ls object above) to declare the linkage:

ls2 = hv.link_selections.instance()

(ls2(geo + depth_hist)).cols(1)

Now you can use the box-select tool to select earthquakes on the map and view their corresponding depth distribution, or vice versa. E.g. if you select just the earthquakes in Alaska, you can see that they tend not to be very deep underground (though that may be a sampling issue). Other selections will show other properties, in this case typically with no obvious relationship between geographic location and depth distribution.

Accessing the data selection#

If you pass your DataFrame into the .filter method of your linked selection object, you can apply the active filter from your interactive plot to create a table of the actual selected data points:

ls2.filter(most_severe)

	depth	depthError	dmin	gap	horizontalError	id	latitude	locationSource	longitude	mag	...	magType	net	nst	place	rms	status	type	updated	easting	northing
time
2000-01-08 16:47:20.580000+00:00	183.40	NaN	NaN	NaN	NaN	usp0009kx3	-16.9250	us	-174.2480	7.2	...	mwc	us	NaN	Tonga	1.25	reviewed	earthquake	2017-11-07T16:16:22.048Z	-1.939720e+07	-1.912096e+06
2000-02-25 01:43:58.640000+00:00	33.00	NaN	NaN	NaN	NaN	usp0009nxg	-19.5280	us	173.8180	7.1	...	mwc	us	NaN	Vanuatu region	1.20	reviewed	earthquake	2017-11-07T16:17:30.218Z	1.934933e+07	-2.217199e+06
2000-03-28 11:00:22.510000+00:00	126.50	NaN	NaN	NaN	NaN	usp0009qb4	22.3380	us	143.7300	7.6	...	mwc	us	NaN	Volcano Islands, Japan region	1.22	reviewed	earthquake	2018-10-17T19:37:57.922Z	1.599995e+07	2.552155e+06
2000-04-23 09:27:23.320000+00:00	608.50	NaN	NaN	NaN	NaN	usp0009rrc	-28.3070	us	-62.9900	7.0	...	mwb	us	NaN	Santiago Del Estero, Argentina	0.89	reviewed	earthquake	2017-11-07T16:14:17.222Z	-7.012015e+06	-3.287735e+06
2000-05-12 18:43:18.120000+00:00	225.00	4.6	NaN	NaN	NaN	usp0009suu	-23.5480	us	-66.4520	7.2	...	mwc	us	NaN	Jujuy, Argentina	0.86	reviewed	earthquake	2017-11-07T16:21:24.397Z	-7.397403e+06	-2.698426e+06
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2018-11-30 17:29:29.330000+00:00	46.70	0.1	NaN	NaN	NaN	ak20419010	61.3464	ak	-149.9552	7.1	...	mwc	ak	NaN	14km NNW of Anchorage, Alaska	1.04	reviewed	earthquake	2019-06-20T17:32:06.274Z	-1.669294e+07	8.705799e+06
2018-12-29 03:39:09.740000+00:00	60.21	3.2	1.769	21.0	6.1	us2000iyta	5.8983	us	126.9209	7.0	...	mww	us	NaN	96km ESE of Pondaguitan, Philippines	1.45	reviewed	earthquake	2019-03-05T17:46:37.040Z	1.412877e+07	6.577586e+05
2018-12-20 17:01:55.150000+00:00	16.56	2.9	4.126	23.0	7.9	us2000ivfw	55.0999	us	164.6993	7.3	...	mww	us	NaN	83km W of Nikol'skoye, Russia	0.79	reviewed	earthquake	2019-02-23T20:21:43.040Z	1.833424e+07	7.381279e+06
2018-12-11 02:26:29.420000+00:00	133.00	1.9	7.043	20.0	9.8	us2000isc8	-58.5446	us	-26.3856	7.1	...	mww	us	NaN	54km N of Bristol Island, South Sandwich Islands	0.92	reviewed	earthquake	2019-02-16T22:29:03.040Z	-2.937232e+06	-8.082602e+06
2018-12-05 04:18:08.420000+00:00	10.00	1.5	2.405	18.0	5.1	us1000i2gt	-21.9496	us	169.4266	7.5	...	mww	us	NaN	165km ESE of Tadine, New Caledonia	0.74	reviewed	earthquake	2019-02-16T19:52:20.040Z	1.886048e+07	-2.505475e+06

295 rows × 23 columns

Exercise#

Try selecting a small number of earthquakes on the map above and re-running the previous cell. You should see that your DataFrame only includes the earthquakes you have selected. You can use this linked selections feature in your own workflows by selecting a region of your data, then inspecting or running subsequent analyses only on that subset of the data (or comparing that subset to the whole data set).
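One way to compare a selection against the whole dataset is to place summary statistics side by side. The sketch below is an assumption-laden illustration: `compare_subset` is a hypothetical helper, and `full` and `subset` are small made-up stand-ins for `most_severe` and the result of `ls2.filter(most_severe)`, so that the snippet runs on its own.

```python
import pandas as pd

def compare_subset(full, subset, columns):
    """Return the mean and median of `columns` for the full data and a subset."""
    stats = ["mean", "median"]
    return pd.concat(
        {"full": full[columns].agg(stats), "subset": subset[columns].agg(stats)},
        axis=1,
    )

# Made-up stand-in for the full DataFrame.
full = pd.DataFrame({
    "depth": [10.0, 30.0, 50.0, 600.0, 620.0],
    "mag": [7.0, 7.2, 7.4, 7.6, 7.8],
})

# Stand-in for an interactive selection: here, just the deep samples.
subset = full[full["depth"] > 500]

comparison = compare_subset(full, subset, ["depth", "mag"])
print(comparison)
```

In a live session you would replace `subset` with the DataFrame returned by the `.filter` call above and pass whichever columns you want to compare.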

Conclusion#

When exploring data it can be convenient to use the .hvplot API to quickly visualize a particular dataset. By calling .hvplot to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. Linked selections let you see relationships between your data’s dimensions and clusters of datapoints much more directly, so that you can:

Interactively explore high-dimensional data by making selections across different views of the same underlying samples.
Turn this interactive exploration into a Python subselection of your data, allowing you to continue your data analysis on a subset of your data that you interactively selected.

This approach is very general and allows a deeper understanding of high-dimensional data through interactivity. This interactivity is itself built on the very powerful HoloViews ‘streams’ system, which you can use to build your own custom interactivity (an optional, advanced topic) when necessary.

In the next section we will see how to apply data processing in a pipelined form, allowing us to build interactive visualizations driven by user-defined widgets when we want to have custom control over our data processing and selection.
