Overview

Revealing your data (nearly) effortlessly,
at every step in your workflow


Workflow from data to decision


At any stage of this workflow where there's no visualization, you're flying blind.

But visualization is often skipped as too hard to construct, particularly for big data.

What if it were simple to visualize anything, anywhere?


Good news: lots of choices!

Bad news: too hard to try them all, learn them all, or get them to work together.


HoloViz: seamless interoperability for browser-based viz tools

Supported by Anaconda, Inc.


HoloViz Goals:

  • Full functionality in browsers (not desktop)

  • Full interactivity (inside and out of plots)

  • Focus on Python users, not web programmers

  • Start with data, not coding

  • Work with data of any size

  • Exploit general-purpose SciPy/PyData tools

  • Focus on 2D primarily, with some 3D

  • Avoid entangling your data, code, and viz:

    • Same viz/analysis code in Jupyter, Python, HPC, …

    • Widgets/apps in Jupyter, standalone servers, web pages

    • Jupyter as a tool, not part of the results

Exploring Pandas Dataframes

If your data is in a Pandas dataframe, it’s natural to explore it using the .plot() method (based on Matplotlib). Let’s look at a dataset of the number of cases of measles and pertussis (per 100,000 people) over time in each state:

from pathlib import Path
import pandas as pd

df = pd.read_csv(Path('../../data/diseases.csv.gz'))
df.head()
   Year  Week    State  measles  pertussis
0  1928     1  Alabama     3.67        NaN
1  1928     2  Alabama     6.25        NaN
2  1928     3  Alabama     7.95        NaN
3  1928     4  Alabama    12.58        NaN
4  1928     5  Alabama     8.03        NaN

Just calling .plot() won’t give anything meaningful, because it doesn’t know what should be plotted against what:

%matplotlib inline

df.plot();
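With no x or y argument, DataFrame.plot() draws each numeric column as a separate line against the row index. That means Year and Week get plotted as if they were measurements, against a row number that has no meaning. The minimal sketch below uses an invented two-row frame with the same columns as diseases.csv.gz (not the real data) to list which columns would be drawn:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same columns as diseases.csv.gz (invented values)
toy = pd.DataFrame({
    "Year":      [1928, 1928],
    "Week":      [1, 2],
    "State":     ["Alabama", "Alabama"],
    "measles":   [3.5, 6.0],
    "pertussis": [np.nan, np.nan],
})

# The numeric columns are what .plot() would draw, each against toy.index (0, 1, ...)
plotted = toy.select_dtypes("number").columns.tolist()
print(plotted)
```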

But with some Pandas operations we can pull out parts of the data that make sense to plot:

import numpy as np

# Total measles cases per year, summed over all weeks and states
by_year = df[["Year", "measles"]].groupby("Year").sum()
by_year.plot();

Here it is easy to see that the 1963 introduction of a measles vaccine brought the cases down to negligible levels.
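The groupby-and-sum step behind by_year can be checked on a tiny made-up frame with the same layout as diseases.csv.gz. The numbers below are invented for illustration and are not taken from the dataset. Grouping by Year collapses every week and state of a year into one row, and the sum skips missing (NaN) values instead of propagating them:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per (Year, Week, State), invented values
toy = pd.DataFrame({
    "Year":      [1962, 1962, 1962, 1963, 1963, 1963],
    "Week":      [1, 1, 2, 1, 1, 2],
    "State":     ["Alabama", "Texas", "Alabama", "Alabama", "Texas", "Alabama"],
    "measles":   [4.0, 6.0, np.nan, 2.0, 1.0, 0.5],
    "pertussis": [np.nan] * 6,
})

# Same operation as by_year: one row per Year, summed over weeks and states
toy_by_year = toy[["Year", "measles"]].groupby("Year").sum()
print(toy_by_year)
```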

Exploring Data with hvPlot and Bokeh

The above plots are just static images, but if you import the hvplot package, you can use the same plotting API (calling .hvplot() instead of .plot()) to get fully interactive plots with hover, pan, and zoom in a web browser:

import hvplot.pandas # noqa: adds hvplot method to pandas objects

# Optional; load Matplotlib support too (Bokeh stays the default)
hvplot.extension('bokeh', 'matplotlib', width="100")

by_year.hvplot()

Here the interactive features are provided by the Bokeh JavaScript-based plotting library. But what’s actually returned by this call is something called a HoloViews object, here specifically a HoloViews Curve. HoloViews objects display as a Bokeh plot, but they are actually much richer objects that make it easy to capture your understanding as you explore the data:

import holoviews as hv
vline = hv.VLine(1963).opts(color='black')

m = by_year.hvplot() * vline * \
    hv.Text(1963, 27000, " Vaccine introduced", halign='left')
m

HoloViews objects also keep the original data available for further analysis:

print(m)
m.Curve.I.data.head()
:Overlay
   .Curve.I :Curve   [Year]   (measles)
   .VLine.I :VLine   [x,y]
   .Text.I  :Text   [x,y]
       measles
Year
1928  16924.34
1929  12060.96
1930  14575.11
1931  15427.67
1932  14481.11

With many other plotting libraries, a visualization you construct is a dead end: to change it in any way, you need to rebuild it from scratch with different settings.

Because HoloViews objects preserve your original data, you can now do more with your data than you could before, including anything you could do with the raw data, plus overlaying (as above), laying out in subfigures, slicing, sampling, setting options, and many other operations.

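For instance, with HoloViews it's simple to break down the data in different ways. You can inspect each state individually, using the widget that groupby='State' adds (dynamic=False precomputes the plot for every state up front):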

measles_agg = df.groupby(['Year', 'State'])['measles'].sum()
by_state = measles_agg.hvplot('Year', groupby='State', width=500, dynamic=False)

by_state * vline
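The measles_agg series is indexed by a two-level (Year, State) MultiIndex. The minimal pandas sketch below uses invented numbers, not the real dataset. It builds a series of the same shape and pulls out one state's yearly totals with .xs. This is roughly the data behind a single choice in the State widget, offered as an analogy rather than a description of how hvplot works internally:

```python
import pandas as pd

# Hypothetical toy data: one row per (Year, Week, State), invented values
toy = pd.DataFrame({
    "Year":    [1930, 1930, 1930, 1931, 1931, 1931],
    "Week":    [1, 2, 1, 1, 2, 1],
    "State":   ["Texas", "Texas", "Ohio", "Texas", "Texas", "Ohio"],
    "measles": [1.5, 2.5, 7.0, 3.0, 2.0, 4.0],
})

# Same shape as measles_agg: a Series indexed by (Year, State)
toy_agg = toy.groupby(["Year", "State"])["measles"].sum()
print(toy_agg.index.names)  # the two index levels

# One state's yearly totals; the State level is dropped from the index
texas = toy_agg.xs("Texas", level="State")
print(texas)
```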

Or pull out a couple of those to put side by side:

by_state["Texas"].relabel('Texas') + by_state["New York"].relabel('New York')

Or compare four states over time by overlaying them:

states = ['New York', 'New Jersey', 'California', 'Texas']
measles_agg.loc[1930:2005, states].hvplot(by='State') * vline
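The expression measles_agg.loc[1930:2005, states] slices both index levels at once. It takes an inclusive range of years on the first level and a list of states on the second. The minimal sketch below uses an invented series of the same shape and also spells out the selection explicitly with pd.IndexSlice:

```python
import pandas as pd

# Hypothetical (Year, State) series shaped like measles_agg; values are invented
idx = pd.MultiIndex.from_product(
    [[1929, 1930, 1931], ["California", "Ohio", "Texas"]],
    names=["Year", "State"],
)
toy_agg = pd.Series(range(9), index=idx, dtype=float, name="measles")

states = ["California", "Texas"]

# Label slices on a MultiIndex include both endpoints and need a sorted
# index (groupby output is sorted by default)
subset = toy_agg.loc[1930:1931, states]

# The same selection written explicitly with pd.IndexSlice
same = toy_agg.loc[pd.IndexSlice[1930:1931, states]]
print(subset)
```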

Or by faceting:

measles_agg.loc[1930:2005, states].hvplot('Year', col='State', width=200, height=150, rot=90) * vline

Or as a different type of plot, such as a bar chart:

measles_agg.loc[1980:1990, states].hvplot.bar('Year', by='State', rot=90)

Or with additional information, such as error bars:

# Mean and standard deviation of the weekly per-state values within each year
df_error = df.groupby('Year').agg({'measles': ['mean', 'std']}).xs('measles', axis=1)
df_error.hvplot(y='mean') * hv.ErrorBars(df_error, 'Year').redim.range(mean=(0, None)) * vline
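The df_error step first produces a two-level column index (measles, then the statistic). The .xs('measles', axis=1) call drops the outer level, leaving plain mean and std columns that hvplot and ErrorBars can refer to by name. The minimal sketch below uses invented numbers and the string aggregation names 'mean' and 'std':

```python
import pandas as pd

# Hypothetical weekly per-state rows for two years (invented values)
toy = pd.DataFrame({
    "Year":    [1962, 1962, 1962, 1963, 1963, 1963],
    "State":   ["Ohio", "Texas", "Ohio", "Ohio", "Texas", "Ohio"],
    "measles": [2.0, 4.0, 6.0, 1.0, 1.0, 1.0],
})

stats = toy.groupby("Year").agg({"measles": ["mean", "std"]})
print(stats.columns.tolist())  # a list of (column, statistic) tuples

# Drop the outer 'measles' level, leaving plain 'mean' and 'std' columns
toy_error = stats.xs("measles", axis=1)
print(toy_error)
```

pandas' 'std' uses the sample standard deviation (ddof=1).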

If we really want to invest a lot of time in making a fancy plot, we can customize it to try to show all the yearly data about measles at once:

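def nansum(a, **kwargs):
    # Like np.nansum, but return NaN (not 0) when every value is missing,
    # so cells with no data are treated as missing rather than as zero
    return np.nan if np.isnan(a).all() else np.nansum(a, **kwargs)

# One heatmap cell per (Year, State); nansum combines that state's weekly values
heatmap = df.hvplot.heatmap('Year', 'State', 'measles', reduce_function=nansum,
                            logz=True, height=500, width=900, xaxis=None,
                            flip_yaxis=True, clim=(1, np.nan))

# Mean across states for each year, with the standard deviation as error bars
aggregate = hv.Dataset(heatmap).aggregate('Year', np.mean, np.std)
agg = hv.ErrorBars(aggregate) * hv.Curve(aggregate).opts(xrotation=90)
agg = agg.options(height=200, show_title=False)

marker = hv.Text(1963, 800, '\u2193 Vaccine introduced', halign='left')
(heatmap + (agg * marker).opts(width=900)).cols(1)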
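The custom nansum matters because np.nansum treats an all-missing input as 0, which would turn cells with no data into apparent zeros. The sketch below compares the two on invented numbers. It then builds the same kind of State-by-Year grid the heatmap displays, using pandas' sum(min_count=1), which also returns NaN when no values are present. This is a pandas analogy for illustration; hvplot's internal reduction may work differently:

```python
import numpy as np
import pandas as pd

def nansum(a, **kwargs):
    # Same helper as above: NaN when every value is missing
    return np.nan if np.isnan(a).all() else np.nansum(a, **kwargs)

empty = np.array([np.nan, np.nan])
print(np.nansum(empty))  # np.nansum treats all-NaN input as 0.0
print(nansum(empty))     # the helper keeps it as NaN

# Hypothetical weekly rows (invented values); Ohio has no data for 1929
toy = pd.DataFrame({
    "Year":    [1928, 1928, 1928, 1929, 1929, 1929],
    "State":   ["Ohio", "Ohio", "Texas", "Ohio", "Texas", "Texas"],
    "measles": [1.0, 2.0, 5.0, np.nan, 3.0, 4.0],
})

# One cell per (State, Year); min_count=1 gives NaN for all-missing cells
grid = toy.groupby(["State", "Year"])["measles"].sum(min_count=1).unstack("Year")
print(grid)

# Mean across states for each year, skipping missing cells
per_year_mean = grid.mean()
print(per_year_mean)
```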

If you prefer, you can choose Matplotlib to render your HoloViews plots, though you give up the interactive pan, zoom, and hover from Bokeh:

mpl = by_state * hv.VLine(1963).opts(color="black") * \
      hv.Text(1963, 1000, "  Vaccine introduced", halign='left')
hv.output(mpl, backend='matplotlib')