Plotting#
When trying to make sense of data, there are many representations to choose from, including data tables, textual summaries and so on. We’ll mostly focus on plotting data to get an intuitive visual representation, using a simple but powerful plotting API.
If you have tried to visualize a pandas.DataFrame before, then you have likely encountered the Pandas .plot() API. These plotting commands use Matplotlib to render static PNGs or SVGs in a Jupyter notebook using the inline backend, or interactive figures via %matplotlib widget, with a command that can be as simple as df.plot() for a DataFrame with one or two columns.
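As a minimal sketch of how little code this takes, the example below calls .plot() on a tiny, made-up DataFrame (the name demo and its values are illustrative, not part of the earthquake data). The matplotlib.use("Agg") line is only there so the sketch also runs as a plain script; in a notebook you would rely on %matplotlib inline instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Hypothetical two-column DataFrame, for illustration only
demo = pd.DataFrame({"a": [1, 3, 2, 4], "b": [4, 2, 3, 1]})

# With no arguments, .plot() draws one line per numeric column against the index
ax = demo.plot()

# The return value is a Matplotlib Axes, so it can be customized further
ax.set_title("df.plot() with default settings")
```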
The Pandas .plot() API has emerged as a de-facto standard for high-level plotting APIs in Python, and is now supported by many different libraries that use various underlying plotting engines to provide additional power and flexibility. Learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort. The libraries currently supporting this API include:
Pandas – Matplotlib-based API included with Pandas. Static or interactive output in Jupyter notebooks.
xarray – Matplotlib-based API included with xarray, based on pandas .plot API. Static or interactive output in Jupyter notebooks.
hvPlot – Bokeh/Matplotlib/Plotly-based HoloViews plots for Pandas, GeoPandas, xarray, Dask, Intake, and Streamz data.
Pandas Bokeh – Bokeh-based interactive plots, for Pandas, GeoPandas, and PySpark data.
Cufflinks – Plotly-based interactive plots for Pandas data.
Plotly Express – Plotly-Express-based interactive plots for Pandas data; only partial support for the .plot API keywords.
PdVega – Vega-lite-based, JSON-encoded interactive plots for Pandas data.
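One way several of these libraries hook into the same API is the pandas plotting-backend option. The sketch below only inspects the default; the commented-out lines show how a third-party backend could be selected, assuming that library is installed and registers itself under the name given (the name "holoviews" is taken from hvPlot's documentation and is an assumption here).

```python
import pandas as pd

# pandas exposes the active plotting engine as an option; Matplotlib is the default
print(pd.get_option("plotting.backend"))

# A library that registers a pandas plotting backend can be selected by name,
# either globally or for a single call. These lines are commented out because
# they require hvPlot to be installed:
# pd.options.plotting.backend = "holoviews"
# df.plot(backend="holoviews")
```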
In this notebook we’ll explore what is possible with the default .plot
API and demonstrate the additional capabilities provided by .hvplot
, which include seamless interactivity in notebooks and deployed dashboards, server-side rendering of even the largest datasets, automatic small multiples and widget selectors for exploring complex data, and easy composition and linking of plots after they are generated.
To show these features, we’ll use a tabular dataset of earthquakes and other seismological events queried from the USGS Earthquake Catalog using its API. Of course, this particular dataset is just an example; the same approach can be used with just about any tabular dataset, and similar approaches can be used with gridded (multidimensional array) datasets.
Read in the data#
Here we will focus on Pandas, but a similar approach will work for any supported DataFrame type, including Dask for distributed computing or RAPIDS cuDF for GPU computing. This dataset is relatively large (2.1 million rows), but should still fit into memory on any recent machine, and thus won’t need special out-of-core or distributed approaches such as those Dask provides.
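If you want a rough check of whether a table of this size will fit in memory, one option is to measure the per-row footprint of a small stand-in and extrapolate. The sketch below uses a made-up frame (the name sample and its four columns are hypothetical and far narrower than the real 25-column dataset), so the number it prints illustrates the method rather than measuring the earthquake data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with a few numeric columns and one string column
n = 1_000
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "latitude": rng.uniform(-90, 90, n),
    "longitude": rng.uniform(-180, 180, n),
    "depth": rng.uniform(0, 700, n),
    "place": ["Nevada"] * n,
})

# deep=True also counts the memory held by string values, not just references to them
bytes_per_row = sample.memory_usage(deep=True).sum() / n

# Extrapolate to the row count quoted above (about 2.1 million)
estimated_mb = bytes_per_row * 2_100_000 / 1e6
print(f"~{bytes_per_row:.0f} bytes/row, ~{estimated_mb:.0f} MB for 2.1 million rows")
```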
import pathlib
import pandas as pd
%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))
df = df.set_index(df.time)
CPU times: user 3.7 s, sys: 1.07 s, total: 4.77 s
Wall time: 4.78 s
print(df.shape)
df.head()
(2116537, 25)
time | index | depth | depthError | dmin | gap | horizontalError | id | latitude | locationSource | longitude | ... | net | nst | place | rms | status | time | type | updated | easting | northing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2000-01-31 23:52:00.619000+00:00 | 0 | 7.800 | 1.400 | 0.09500 | 245.14 | NaN | nn00001936 | 37.1623 | nn | -116.6037 | ... | nn | 5.0 | Nevada | 0.0519 | reviewed | 2000-01-31 23:52:00.619000+00:00 | earthquake | 2018-04-24T22:22:44.135Z | -1.298026e+07 | 4.461754e+06 |
2000-01-31 23:44:54.060000+00:00 | 1 | 4.516 | 0.479 | 0.05131 | 52.50 | NaN | ci9137218 | 34.3610 | ci | -116.1440 | ... | ci | 0.0 | 26km NNW of Twentynine Palms, California | 0.1300 | reviewed | 2000-01-31 23:44:54.060000+00:00 | earthquake | 2016-02-17T11:53:52.643Z | -1.292909e+07 | 4.077379e+06 |
2000-01-31 23:28:38.420000+00:00 | 2 | 33.000 | NaN | NaN | NaN | NaN | usp0009mwt | 10.6930 | trn | -61.1620 | ... | us | NaN | Trinidad, Trinidad and Tobago | NaN | reviewed | 2000-01-31 23:28:38.420000+00:00 | earthquake | 2014-11-07T01:09:23.016Z | -6.808523e+06 | 1.197310e+06 |
2000-01-31 23:05:22.010000+00:00 | 3 | 33.000 | NaN | NaN | NaN | NaN | usp0009mws | -1.2030 | us | -80.7160 | ... | us | NaN | near the coast of Ecuador | 0.6000 | reviewed | 2000-01-31 23:05:22.010000+00:00 | earthquake | 2014-11-07T01:09:23.014Z | -8.985264e+06 | -1.339272e+05 |
2000-01-31 22:56:50.996000+00:00 | 4 | 7.200 | 0.900 | 0.11100 | 202.61 | NaN | nn00001935 | 38.7860 | nn | -119.6409 | ... | nn | 5.0 | Nevada | 0.0715 | reviewed | 2000-01-31 22:56:50.996000+00:00 | earthquake | 2018-04-24T22:22:44.054Z | -1.331836e+07 | 4.691064e+06 |
5 rows × 25 columns
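Note that time appears in the table both as the index and as a regular column. That follows from passing the Series df.time to set_index rather than the column name. A small sketch of the difference, using a hypothetical two-row frame named events:

```python
import pandas as pd

# Hypothetical miniature of the earthquake table
events = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-31 23:52:00", "2000-01-31 23:44:54"], utc=True),
    "depth": [7.8, 4.516],
})

by_series = events.set_index(events.time)  # pass the Series: the column is kept
by_name = events.set_index("time")         # pass the label: the column moves into the index

print(list(by_series.columns))  # ['time', 'depth']
print(list(by_name.columns))    # ['depth']
```

Keeping the column is convenient when later plots need time as an ordinary field, at the cost of storing the values twice.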
To compare HoloViz approaches with others, we’ll also construct a random subsample containing only 1% of the data, which is small enough to be tractable with any plotting or analysis tool:
small_df = df.sample(frac=.01)
print(small_df.shape)
small_df.head()
(21165, 25)
time | index | depth | depthError | dmin | gap | horizontalError | id | latitude | locationSource | longitude | ... | net | nst | place | rms | status | time | type | updated | easting | northing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2000-10-06 14:43:27.510000+00:00 | 4789 | 99.000 | NaN | NaN | NaN | NaN | usp000a1bu | -34.209000 | guc | -70.507000 | ... | us | 5.0 | Libertador General Bernardo O'Higgins, Chile | NaN | reviewed | 2000-10-06 14:43:27.510000+00:00 | earthquake | 2014-11-07T01:11:14.419Z | -7.848803e+06 | -4.056900e+06 |
2008-02-17 22:10:06.490000+00:00 | 4716 | 5.787 | 31.61 | 0.56310 | 274.0 | 3.67 | ci14350616 | 32.089667 | ci | -115.815667 | ... | ci | 19.0 | 59km SSW of Progreso, B.C., MX | 0.73 | reviewed | 2008-02-17 22:10:06.490000+00:00 | earthquake | 2016-03-08T21:02:25.311Z | -1.289254e+07 | 3.775087e+06 |
2005-04-11 05:36:02.960000+00:00 | 6483 | 30.000 | NaN | NaN | 208.5 | NaN | usp000dn0m | 2.558000 | us | 95.965000 | ... | us | 19.0 | Simeulue, Indonesia | 0.48 | reviewed | 2005-04-11 05:36:02.960000+00:00 | earthquake | 2014-11-07T01:25:33.013Z | 1.068277e+07 | 2.848499e+05 |
2011-02-02 20:14:47.880000+00:00 | 6734 | 0.843 | 0.92 | 0.01945 | 56.0 | 0.38 | ci14927924 | 33.162667 | ci | -115.655667 | ... | ci | 33.0 | 14km WNW of Calipatria, CA | 0.22 | reviewed | 2011-02-02 20:14:47.880000+00:00 | earthquake | 2016-03-14T20:13:59.938Z | -1.287473e+07 | 3.916915e+06 |
2011-11-05 04:44:07.960000+00:00 | 7295 | 4.080 | 0.48 | 0.04865 | 232.0 | 0.79 | nc71677101 | 37.564167 | nc | -119.008500 | ... | nc | 9.0 | Central California | 0.05 | reviewed | 2011-11-05 04:44:07.960000+00:00 | earthquake | 2017-01-25T13:00:26.093Z | -1.324797e+07 | 4.518039e+06 |
5 rows × 25 columns
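Because the sample call above does not fix a seed, the rows you see will differ from run to run. If you need a repeatable subsample, passing random_state is one option. A minimal sketch on a hypothetical 1,000-row frame (the name full and the seed 42 are illustrative):

```python
import pandas as pd

# Hypothetical frame with 1,000 rows standing in for the full dataset
full = pd.DataFrame({"value": range(1000)})

# frac=.01 draws 1% of the rows without replacement; random_state fixes the draw
first = full.sample(frac=.01, random_state=42)
second = full.sample(frac=.01, random_state=42)

print(len(first))            # 10
print(first.equals(second))  # True
```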
We’ll switch back and forth between small_df and df depending on whether the technique we are showing works well only for small datasets, or whether it can be used for any dataset.
Using Pandas .plot()#
The first thing that we’d like to do with this data is visualize the locations of every earthquake. So we would like to make a scatter or points plot where x is longitude and y is latitude.
We can do that for the smaller DataFrame using the Pandas .plot API and Matplotlib:
%matplotlib inline
small_df.plot.scatter(x='longitude', y='latitude');
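With tens of thousands of points, markers in a scatter plot start to overlap. One common adjustment, sketched here on synthetic coordinates rather than the earthquake data, is to shrink the markers and make them partly transparent. The frame points and the specific s, alpha, and figsize values are illustrative choices, and the matplotlib.use("Agg") line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # omit in a notebook that already uses %matplotlib inline
import numpy as np
import pandas as pd

# Synthetic longitude/latitude pairs, for illustration only
rng = np.random.default_rng(0)
points = pd.DataFrame({
    "longitude": rng.uniform(-180, 180, 500),
    "latitude": rng.uniform(-90, 90, 500),
})

# s sets the marker size and alpha the transparency; both reach Matplotlib's scatter
ax = points.plot.scatter(x="longitude", y="latitude", s=2, alpha=0.3, figsize=(8, 4))
```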

Exercise:#
Try changing inline to widget and see what interactivity is available from Matplotlib. In some cases you may have to reload the page and restart this notebook to get it to display properly.
Using .hvplot#
As you can see above, the Pandas API gives you a usable plot very easily, where you can start to see the structure of the edges of the tectonic plates, which in many cases correspond with the visual edges of continents (e.g. the western side of Africa, in the center). You can make a very similar plot with the same arguments using hvplot, after importing hvplot.pandas to add hvPlot support to Pandas:
import hvplot.pandas # noqa: adds hvplot method to pandas objects