Building Pipelines
In this exercise we will explore the hvplot .interactive() API to add widgets to our analyses and plots. We'll first load the earthquakes DataFrame and filter it to the earthquakes with magnitude >= 7:
import pathlib
import pandas as pd
import xarray as xr
import panel as pn # noqa
import hvplot.pandas # noqa: adds hvplot method to pandas objects
import hvplot.xarray # noqa: adds hvplot method to xarray objects
df = pd.read_parquet(pathlib.Path('../../data/earthquakes-projected.parq'))
columns = ['time', 'mag', 'depth', 'latitude', 'longitude', 'place', 'type']
df = df.set_index(df.time)[columns]
most_severe = df[df.mag >= 7]
df.head()
time (index) | time | mag | depth | latitude | longitude | place | type
---|---|---|---|---|---|---|---
2000-01-31 23:52:00.619000+00:00 | 2000-01-31 23:52:00.619000+00:00 | 0.60 | 7.800 | 37.1623 | -116.6037 | Nevada | earthquake
2000-01-31 23:44:54.060000+00:00 | 2000-01-31 23:44:54.060000+00:00 | 1.72 | 4.516 | 34.3610 | -116.1440 | 26km NNW of Twentynine Palms, California | earthquake
2000-01-31 23:28:38.420000+00:00 | 2000-01-31 23:28:38.420000+00:00 | 2.10 | 33.000 | 10.6930 | -61.1620 | Trinidad, Trinidad and Tobago | earthquake
2000-01-31 23:05:22.010000+00:00 | 2000-01-31 23:05:22.010000+00:00 | 4.50 | 33.000 | -1.2030 | -80.7160 | near the coast of Ecuador | earthquake
2000-01-31 22:56:50.996000+00:00 | 2000-01-31 22:56:50.996000+00:00 | 1.40 | 7.200 | 38.7860 | -119.6409 | Nevada | earthquake
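The table above shows `time` both as the index and as a column. The minimal sketch below reproduces that behaviour on a tiny hypothetical frame. The values and the `extra` column are made up for illustration. Because the `time` Series itself is passed to `set_index` (rather than the label `"time"`), pandas builds the index from it without dropping the column. The column selection then keeps only the listed names.

```python
import pandas as pd

# Hypothetical two-row catalog (illustrative values only, not the real data).
raw = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-31 23:52:00", "2000-01-31 23:44:54"], utc=True),
    "mag": [0.60, 1.72],
    "depth": [7.800, 4.516],
    "type": ["earthquake", "earthquake"],
    "extra": ["dropped", "dropped"],  # hypothetical column the selection discards
})
columns = ["time", "mag", "depth", "type"]

# Passing the Series (not the label) sets the index from it and keeps the
# original column, so `time` ends up both as the index and as a column.
indexed = raw.set_index(raw.time)[columns]
print(indexed)
```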
Initial inspection of the depth data
Declare and display a float slider for depth, with the handle depth_slider and the name 'Minimum depth', that ranges between zero and 700 kilometers. Then verify that the depth values in most_severe lie in this range. Set the default value to the middle of this range.
Hint
You can use the min() and max() methods on the depth Series of most_severe to check the range. To declare the slider, use pn.widgets.FloatSlider.
depth_slider = ...
depth_slider
depth_slider = pn.widgets.FloatSlider(name='Minimum depth', start=0, end=700, value=350)
depth_slider
>>> most_severe.depth.min()
4.2
>>> most_severe.depth.max()
675.4
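The range check can also be done in one step with `Series.between`. The sketch below uses plain numbers for the slider bounds; with panel installed you could read them from `depth_slider.start` and `depth_slider.end` instead. It also uses a hypothetical depth series, in which 4.2 and 675.4 are the minimum and maximum reported above and 33.0 is made up.

```python
import pandas as pd

# Slider bounds from the exercise; the default is the midpoint of the range.
start, end = 0, 700
default = (start + end) / 2

# Hypothetical depths standing in for most_severe.depth (33.0 is made up).
depths = pd.Series([4.2, 33.0, 675.4], name="depth")

# Series.between is inclusive on both ends by default.
fits = bool(depths.between(start, end).all())
print(default, depths.min(), depths.max(), fits)
```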
Exploring an interactive DataFrame
Now we will create a new interactive DataFrame called dfi with sizing_mode='stretch_width'.
Hint
Use the .interactive method on most_severe to create the interactive DataFrame called dfi.
dfi = ... # Interactive DataFrame version of most_severe
dfi = most_severe.interactive(sizing_mode='stretch_width')
Now use this interactive DataFrame to filter the earthquakes deeper than the depth specified by depth_slider. Call this filtered DataFrame depth_filtered, and to view it conveniently, use the .head() method to see the first few entries.
Hint
Use the regular pandas idiom to filter a DataFrame with df[mask], where mask is a boolean mask. The only difference is that, instead of picking a fixed depth value to filter by, you use the depth_slider widget.
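Before you bring in the widget, a static version of the filter can help. The sketch below runs on a hypothetical stand-in for most_severe with made-up values. A fixed number plays the role of the slider's default, and the comparison produces the boolean mask the hint refers to.

```python
import pandas as pd

# Hypothetical rows standing in for most_severe (illustrative values only).
severe = pd.DataFrame({"mag": [7.1, 7.4, 8.0], "depth": [10.0, 350.0, 600.0]})

# Static pipeline: a fixed number plays the role of depth_slider's default.
min_depth = 350.0
mask = severe["depth"] > min_depth  # boolean Series, one entry per row
deep = severe[mask]                 # keeps only the rows where mask is True
print(deep.head())
```

In the interactive version, `severe` becomes `dfi` and `min_depth` becomes `depth_slider`. The expression keeps the same shape.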
depth_filtered = ...
# Now display the head of this interactive dataframe
depth_slider = pn.widgets.FloatSlider(name='Minimum depth', start=0, end=700, value=350)
dfi = most_severe.interactive(sizing_mode='stretch_width')
depth_filtered = dfi[dfi['depth'] > depth_slider]  # keep earthquakes deeper than the slider value
depth_filtered.head()
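Conceptually, the interactive object saves you from wiring up the recomputation yourself: when the widget value changes, the chain of operations is evaluated again. The toy sketch below makes that idea explicit in plain pandas. `ToyWidget`, `on_change`, `set_value` and `rerun_pipeline` are hypothetical names for illustration only; this is not how hvplot or panel is implemented.

```python
import pandas as pd


class ToyWidget:
    """Hypothetical stand-in for a widget (not panel's API): it holds a value
    and calls every registered callback whenever the value changes."""

    def __init__(self, value):
        self.value = value
        self._callbacks = []

    def on_change(self, callback):
        self._callbacks.append(callback)

    def set_value(self, value):
        self.value = value
        for callback in self._callbacks:
            callback(value)


# Hypothetical rows standing in for most_severe (illustrative values only).
severe = pd.DataFrame({"mag": [7.1, 7.4, 8.0], "depth": [10.0, 350.0, 600.0]})
outputs = []


def rerun_pipeline(min_depth):
    # Re-evaluate the whole filter-then-head chain with the current value.
    outputs.append(severe[severe["depth"] > min_depth].head())


slider = ToyWidget(350.0)
slider.on_change(rerun_pipeline)
rerun_pipeline(slider.value)  # initial render with the default value
slider.set_value(5.0)         # "dragging" the slider recomputes the output
```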
Plotting the depth-filtered data
For an initial plot, try calling .hvplot() and see what happens.
# depth_filtered.hvplot()
Now let us focus on plotting the magnitude of the filtered earthquakes as a scatter plot with red cross (x) markers.
Hint
The magnitude column is called mag. You can set cross markers with marker='x', make them red with color='red', and get a scatter plot with kind='scatter'.
# Scatter plot of magnitude, filtered by depth with red cross markers
depth_slider = pn.widgets.FloatSlider(name='Minimum depth', start=0, end=700, value=350)
dfi = most_severe.interactive(sizing_mode='stretch_width')
depth_filtered = dfi[dfi['depth'] > depth_slider]  # keep earthquakes deeper than the slider value
depth_filtered.hvplot(y='mag', kind='scatter', color='red', marker='x')
Using interactive xarrays
The .interactive interface doesn't only apply to pandas DataFrames: you can use the same approach with xarray. Here we load our population raster and perform some simple cleanup:
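raw_ds = xr.open_dataarray(pathlib.Path('../../data/raster/gpw_v4_population_density_rev11_2010_2pt5_min.nc'))
# Mask the no-data cells, then keep only the first band
cleaned_ds = raw_ds.where(raw_ds.values != raw_ds.nodatavals).sel(band=1)
# Give the spatial dimensions descriptive names
cleaned_ds = cleaned_ds.rename({'x': 'longitude', 'y': 'latitude'})
cleaned_ds.name = 'population'
# Treat the masked (no-data) cells as zero population
cleaned_ds = cleaned_ds.fillna(0)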
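You can try the masking, renaming and filling steps on a tiny synthetic raster. This is a sketch under assumptions: the 2x3 values, the coordinates and the `-9999.0` sentinel are all made up, whereas the real file supplies its sentinel through the `nodatavals` attribute.

```python
import numpy as np
import xarray as xr

# Hypothetical no-data sentinel; the real raster exposes it as `nodatavals`.
NODATA = -9999.0
raw = xr.DataArray(
    np.array([[[1.0, NODATA, 3.0], [4.0, 5.0, NODATA]]]),
    dims=("band", "y", "x"),
    coords={"band": [1], "y": [10.0, 20.0], "x": [0.0, 1.0, 2.0]},
)

# Replace sentinel cells with NaN, then select band 1 (dropping that dimension).
cleaned = raw.where(raw != NODATA).sel(band=1)
# Rename the spatial dimensions (and their coordinates).
cleaned = cleaned.rename({"x": "longitude", "y": "latitude"})
cleaned.name = "population"
# Treat the masked cells as zero population.
cleaned = cleaned.fillna(0)
print(cleaned)
```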
One operation we could do on this raster is to collapse one of the two dimensions. For instance, we could view the mean population over latitude (averaged over longitude) or, conversely, the mean population over longitude (averaged over latitude). To select between these options, we will want a dropdown widget called collapsed_axis.
Hint
A dropdown widget in Panel can be made with a pn.widgets.Select object. The dropdown options are specified as a list of strings passed to the options argument.
collapsed_axis = ... # Declare a dropdown to select either 'latitude' or 'longitude' and display it
collapsed_axis = pn.widgets.Select(options=['latitude', 'longitude'], name='Collapsed dimension')
collapsed_axis
Now create an interactive xarray DataArray called dsi, analogous to the interactive DataFrame we created earlier. As before, specify sizing_mode='stretch_width'.
Hint
As before, the interactive object is created by calling the .interactive() method. This time the method is called on an xarray object instead of a pandas object.
dsi = ... # An interactive DataArray
dsi = cleaned_ds.interactive(sizing_mode='stretch_width')
Plotting population averaged over either latitude or longitude
Now we can use the xarray API to collapse either latitude or longitude by taking the mean. To do this, we can use the .mean() method of an xarray DataArray, which accepts a dim argument specifying the dimension over which to apply the mean. After collapsing the dimension selected by the widget, plot the population as a green curve.
Hint
First write and test a static version of your pipeline, where you supply 'latitude' or 'longitude' explicitly to the dim argument of the mean method. Then call .hvplot to plot it, specifying color='green'. Finally, pass your collapsed_axis widget instead of that fixed string.
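Here is a minimal static version of the collapsing step, run on a hypothetical 2x3 grid that stands in for `cleaned_ds` (values and coordinates are made up). Each string in the loop is a value the `collapsed_axis` dropdown could supply. Collapsing one dimension leaves a 1-D profile along the other.

```python
import numpy as np
import xarray as xr

# Hypothetical population grid standing in for cleaned_ds.
pop = xr.DataArray(
    np.array([[0.0, 2.0, 4.0], [6.0, 8.0, 10.0]]),
    dims=("latitude", "longitude"),
    coords={"latitude": [-10.0, 10.0], "longitude": [0.0, 1.0, 2.0]},
    name="population",
)

# Static pipeline: each key is the dimension averaged away by .mean(dim=...).
profiles = {collapsed: pop.mean(dim=collapsed) for collapsed in ["latitude", "longitude"]}
for collapsed, profile in profiles.items():
    print(collapsed, "->", profile.dims, profile.values)
```

In the notebook, each profile could then be plotted with `.hvplot(color='green')`. Swapping the fixed string for `collapsed_axis` on `dsi` gives the interactive version.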
# Using `dsi` plot the population as a green curve where the collapsed dimension is selected by the widget
dsi = cleaned_ds.interactive(sizing_mode='stretch_width')
collapsed_axis = pn.widgets.Select(options=['latitude', 'longitude'], name='Collapsed dimension')
dsi.mean(dim=collapsed_axis).hvplot(color='green')