Interlinked Plots#
hvPlot allows you to generate many different types of plot quickly from a standard API, returning HoloViews objects as discussed in the previous notebook. Each initial plot will make some aspects of the data clear, and using the automatic interactive pan, zoom, and hover tools you can find additional trends and outliers at different spatial locations and spatial scales within each plot.
Beyond what you can discover from each plot individually, how do you understand how the various plots relate to each other to reveal the full dataset? For instance, imagine you have a data frame with columns u, v, w, z, and have separate plots of u vs. v, u vs. w, and w vs. z. If you see a few outliers or a clump of unusual datapoints in your u vs. v plot, how can you find out the properties of those points in the w vs. z or other plots? Are those unusual u vs. v points typically high w, uniformly distributed along w, or some other pattern?
To help understand multicolumnar and multidimensional datasets like this, scientists will often build complex multi-pane dashboards with custom functionality. HoloViz (and specifically Panel) tools are great for such dashboards, but here we can actually use the fact that hvPlot returns HoloViews objects to get quite sophisticated interlinking (linked brushing) “for free”, without needing to build any dashboard. HoloViews objects store metadata about what dimensions they cover, and we can use this metadata programmatically to let the user see how any data points in any plot relate across different plots.
To see how this works, let us get back to the example we were working on at the end of the last notebook:
import pathlib
import holoviews as hv
import pandas as pd
import hvplot.pandas # noqa
import colorcet as cc
First let us load the data as before:
%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))
CPU times: user 2.05 s, sys: 352 ms, total: 2.4 s
Wall time: 1.56 s
And filter to the most severe earthquakes (magnitude 7 or greater):
most_severe = df[df.mag >= 7]
Linked brushing across elements#
In the previous notebook, we saw how plot axes are automatically linked for panning and zooming when using the + operator, provided the dimensions match. We can go further and link user selections as well as axes, a technique often called “linked brushing”, for plots that match on dimensions or share the same underlying index.
To illustrate, let us generate two histograms from our most_severe DataFrame:
mag_hist = most_severe.hvplot(
    y='mag', kind='hist', responsive=True, min_height=150)
depth_hist = most_severe.hvplot(
    y='depth', kind='hist', responsive=True, min_height=150)
These two histograms are plotting two different dimensions of our earthquake dataset (magnitude and depth), derived from the same set of earthquake samples. The samples between these two histograms share an index, and the relationships between these data points can be discovered and exploited programmatically even though they are in different elements. To do this, we can create an object for linking selections across elements:
ls = hv.link_selections.instance()
Given some HoloViews objects (elements, layouts, etc.), we can create versions of them linked to this shared linking object by calling ls on them:
ls(depth_hist + mag_hist)
Try using the first Bokeh tool to select areas of either histogram: you’ll then see both the depth and magnitude distributions for the bins you have selected, compared to the overall distribution. By default, selections on both histograms are combined so that the selection is the intersection of the two regions selected (data points matching both the constraints on depth and the constraints on magnitude that you select). For instance, try selecting the deepest earthquakes (around 600), and you can see that those are not specific to one particular magnitude. You can then further select a particular magnitude range, and see how that range is distributed in depth over the selected depth range. Linked selections like this make it feasible to look at specific regions of a multidimensional space and see how the properties of those regions compare to the properties of other regions. You can use the Bokeh reset tool (double arrow) to clear your selection.
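To make the intersection behavior concrete, here is a minimal sketch in plain pandas of what combining a depth selection with a magnitude selection amounts to. The small `quakes` table and the range bounds are made-up assumptions for illustration; they are not taken from the earthquake dataset, and this is not how the library implements selections internally.

```python
import pandas as pd

# Hypothetical stand-in for the earthquake table; values are made up.
quakes = pd.DataFrame({
    "mag":   [7.0, 7.2, 7.6, 7.1, 8.0, 7.3],
    "depth": [610.0, 33.0, 580.0, 10.0, 620.0, 45.0],
})

# One constraint per histogram, as if a range had been brushed on each.
deep = quakes["depth"].between(550, 650)    # range chosen on the depth histogram
strong = quakes["mag"].between(7.5, 8.5)    # range chosen on the magnitude histogram

# Intersection: only rows satisfying both constraints remain selected.
selected = quakes[deep & strong]
print(selected)
```

Replacing `&` with `|` would give the union of the two constraints instead, which is a different way of combining selections than the default described above.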
Note that these two histograms are derived from the same DataFrame and created in the same call to ls, but neither of those is necessary to achieve the linked behavior! If linking two different DataFrames, the important thing to check is that any columns with the same name actually do have the same meaning, and that any index columns match, so that the plots you are visualizing make sense when linked together.
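Before linking plots built from two different DataFrames, a quick sanity check along these lines may help. This is only a sketch under stated assumptions: `left` and `right` are hypothetical tables with made-up values, and comparing values in shared columns is just a rough proxy for "same meaning", which ultimately depends on knowing your data.

```python
import pandas as pd

# Two hypothetical tables describing the same samples; values are made up.
left = pd.DataFrame({"mag": [7.0, 7.2, 7.6]}, index=["a", "b", "c"])
right = pd.DataFrame(
    {"depth": [610.0, 33.0, 580.0], "mag": [7.0, 7.2, 7.6]},
    index=["a", "b", "c"],
)

# Do both tables index the same samples in the same order?
same_index = left.index.equals(right.index)

# Columns that appear in both tables should describe the same quantity;
# identical values are a rough proxy for that here.
shared = left.columns.intersection(right.columns)
shared_agree = all(left[col].equals(right[col]) for col in shared)
print(same_index, list(shared), shared_agree)
```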
Linked brushing across element types#
The previous example linked two histograms, but nothing prevents you from linked brushing across different element types. Here are our earthquake points, also derived from the same DataFrame, where the only change from earlier is that we are using the warm colormap (described in the previous notebook):
geo = most_severe.hvplot(
    'easting', 'northing', color='mag', kind='points', tiles='EsriUSATopo',
    xaxis=None, yaxis=None, responsive=True, height=350,
    cmap=cc.CET_L4[:50:-1], framewise=True)
Once again, we just need to pass our points to a linked-selections object (ls2, newly declared here to be independent of the one above) to declare the linkage:
ls2 = hv.link_selections.instance()
(ls2(geo + depth_hist)).cols(1)
Now you can use the box-select tool to select earthquakes on the map and view their corresponding depth distribution, or vice versa. E.g. if you select just the earthquakes in Alaska, you can see that they tend not to be very deep underground (though that may be a sampling issue). Other selections will show other properties, in this case typically with no obvious relationship between geographic location and depth distribution.
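Conceptually, a box selection on the map is a pair of range constraints on the two map coordinates, and the linked histogram then shows the depth distribution of the rows inside the box next to the overall one. The sketch below mimics that with plain pandas; the `points` table, the box bounds, and the bin edges are all made-up assumptions for illustration.

```python
import pandas as pd

# Hypothetical projected points with depths; values are made up.
points = pd.DataFrame({
    "easting":  [-1.7e7, -1.6e7, 1.4e7, 1.9e7, -1.65e7],
    "northing": [8.7e6, 8.0e6, 6.6e5, -2.5e6, 8.5e6],
    "depth":    [46.7, 30.0, 60.2, 10.0, 120.0],
})

# Hypothetical box bounds, as if drawn with the box-select tool.
x0, x1 = -1.8e7, -1.5e7
y0, y1 = 7.5e6, 9.0e6
in_box = points["easting"].between(x0, x1) & points["northing"].between(y0, y1)

# Bin depth with the same edges for the selection and for the full set,
# so that the two distributions can be compared bin by bin.
edges = [0, 50, 100, 150]
overall_counts = pd.cut(points["depth"], bins=edges).value_counts(sort=False)
selected_counts = pd.cut(points.loc[in_box, "depth"], bins=edges).value_counts(sort=False)
print(pd.DataFrame({"all": overall_counts, "selected": selected_counts}))
```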
Accessing the data selection#
If you pass your DataFrame into the .filter method of your linked selection object, you can apply the active filter from your interactive plot to create a table of the actual selected data points:
ls2.filter(most_severe)
time | depth | depthError | dmin | gap | horizontalError | id | latitude | locationSource | longitude | mag | ... | magType | net | nst | place | rms | status | type | updated | easting | northing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2000-01-08 16:47:20.580000+00:00 | 183.40 | NaN | NaN | NaN | NaN | usp0009kx3 | -16.9250 | us | -174.2480 | 7.2 | ... | mwc | us | NaN | Tonga | 1.25 | reviewed | earthquake | 2017-11-07T16:16:22.048Z | -1.939720e+07 | -1.912096e+06 |
2000-02-25 01:43:58.640000+00:00 | 33.00 | NaN | NaN | NaN | NaN | usp0009nxg | -19.5280 | us | 173.8180 | 7.1 | ... | mwc | us | NaN | Vanuatu region | 1.20 | reviewed | earthquake | 2017-11-07T16:17:30.218Z | 1.934933e+07 | -2.217199e+06 |
2000-03-28 11:00:22.510000+00:00 | 126.50 | NaN | NaN | NaN | NaN | usp0009qb4 | 22.3380 | us | 143.7300 | 7.6 | ... | mwc | us | NaN | Volcano Islands, Japan region | 1.22 | reviewed | earthquake | 2018-10-17T19:37:57.922Z | 1.599995e+07 | 2.552155e+06 |
2000-04-23 09:27:23.320000+00:00 | 608.50 | NaN | NaN | NaN | NaN | usp0009rrc | -28.3070 | us | -62.9900 | 7.0 | ... | mwb | us | NaN | Santiago Del Estero, Argentina | 0.89 | reviewed | earthquake | 2017-11-07T16:14:17.222Z | -7.012015e+06 | -3.287735e+06 |
2000-05-12 18:43:18.120000+00:00 | 225.00 | 4.6 | NaN | NaN | NaN | usp0009suu | -23.5480 | us | -66.4520 | 7.2 | ... | mwc | us | NaN | Jujuy, Argentina | 0.86 | reviewed | earthquake | 2017-11-07T16:21:24.397Z | -7.397403e+06 | -2.698426e+06 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2018-11-30 17:29:29.330000+00:00 | 46.70 | 0.1 | NaN | NaN | NaN | ak20419010 | 61.3464 | ak | -149.9552 | 7.1 | ... | mwc | ak | NaN | 14km NNW of Anchorage, Alaska | 1.04 | reviewed | earthquake | 2019-06-20T17:32:06.274Z | -1.669294e+07 | 8.705799e+06 |
2018-12-29 03:39:09.740000+00:00 | 60.21 | 3.2 | 1.769 | 21.0 | 6.1 | us2000iyta | 5.8983 | us | 126.9209 | 7.0 | ... | mww | us | NaN | 96km ESE of Pondaguitan, Philippines | 1.45 | reviewed | earthquake | 2019-03-05T17:46:37.040Z | 1.412877e+07 | 6.577586e+05 |
2018-12-20 17:01:55.150000+00:00 | 16.56 | 2.9 | 4.126 | 23.0 | 7.9 | us2000ivfw | 55.0999 | us | 164.6993 | 7.3 | ... | mww | us | NaN | 83km W of Nikol'skoye, Russia | 0.79 | reviewed | earthquake | 2019-02-23T20:21:43.040Z | 1.833424e+07 | 7.381279e+06 |
2018-12-11 02:26:29.420000+00:00 | 133.00 | 1.9 | 7.043 | 20.0 | 9.8 | us2000isc8 | -58.5446 | us | -26.3856 | 7.1 | ... | mww | us | NaN | 54km N of Bristol Island, South Sandwich Islands | 0.92 | reviewed | earthquake | 2019-02-16T22:29:03.040Z | -2.937232e+06 | -8.082602e+06 |
2018-12-05 04:18:08.420000+00:00 | 10.00 | 1.5 | 2.405 | 18.0 | 5.1 | us1000i2gt | -21.9496 | us | 169.4266 | 7.5 | ... | mww | us | NaN | 165km ESE of Tadine, New Caledonia | 0.74 | reviewed | earthquake | 2019-02-16T19:52:20.040Z | 1.886048e+07 | -2.505475e+06 |
295 rows × 23 columns
Exercise#
Try selecting a small number of earthquakes on the map above and re-running the previous cell. You should see that your DataFrame only includes the earthquakes you have selected. You can use this linked selections feature in your own workflows by selecting a region of your data, then inspecting or running subsequent analyses only on that subset of the data (or comparing that subset to the whole data set).
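As one possible follow-up analysis, the sketch below compares summary statistics of a selected subset with those of the whole table. It assumes made-up data: `whole` stands in for a table like most_severe, and `subset` stands in for the DataFrame you would get back for an interactive selection (here it is produced by an ordinary boolean mask so that the example runs on its own).

```python
import pandas as pd

# Hypothetical stand-ins: `whole` plays the role of the full table, and
# `subset` plays the role of the table returned for an interactive selection.
whole = pd.DataFrame({
    "mag":   [7.0, 7.2, 7.6, 7.1, 8.0, 7.3],
    "depth": [610.0, 33.0, 580.0, 10.0, 620.0, 45.0],
})
subset = whole[whole["depth"] > 500]

# Side-by-side mean values for the selection and for the full data set.
comparison = pd.DataFrame({
    "selected": subset[["mag", "depth"]].mean(),
    "all": whole[["mag", "depth"]].mean(),
})
fraction_selected = len(subset) / len(whole)
print(comparison)
print(fraction_selected)
```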
Conclusion#
When exploring data it can be convenient to use the .plot API to quickly visualize a particular dataset. By calling .hvplot to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. Linked selections let you see relationships between your data’s dimensions and clusters of datapoints much more directly, so that you can:
Interactively explore high-dimensional data by making selections across different views of the same underlying samples.
Turn this interactive exploration into a Python subselection of your data, allowing you to continue your data analysis on a subset of your data that you interactively selected.
This approach is very general and allows a deeper understanding of high-dimensional data through interactivity. This interactivity is itself built on the very powerful HoloViews ‘streams’ system, which you can use yourself to build your own Custom Interactivity (optional, advanced topic) when necessary.
In the next section we will see how to apply data processing in a pipelined form, allowing us to build interactive visualizations driven by user-defined widgets when we want to have custom control over our data processing and selection.