HoloViews based plotting API #2199

Closed
philippjfr opened this issue May 30, 2018 · 10 comments

@philippjfr

As part of a recent project we have been working on a plotting API called HoloPlot for a number of projects, including pandas and xarray. You can see some examples using the API with xarray here. As the name suggests, it is built on HoloViews and is meant as an alternative to the native plotting APIs that closely mirrors them but does not necessarily match them exactly. The main differences are:

  • Certain keywords are likely to differ, e.g. width/height vs fig_inches
  • The API returns HoloViews objects which can be composed and display themselves
  • It supports some additional features such as datashading and exploring a parameter space with widgets

The main question I'd like to put to the xarray community is how we should best expose this API. In pandas there has been some discussion about adding a configurable engine for the plotting API, letting you switch between different plotting implementations (see pandas-dev/pandas#14130). The approach we started with was to clobber the DataArray.plot API entirely, which I now consider too obtrusive and likely to interfere with existing workflows. The alternative approaches we considered:

  • Name the patched method differently, e.g. DataArray.hvplot, DataArray.hplot or DataArray.holoplot
  • Patch DataArray.plot but add an engine keyword to toggle between the original and HoloPlot API.
  • Add a global toggle to switch between the APIs (likely in addition to the engine keyword)
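
To make the engine keyword and the global toggle concrete, here is a minimal sketch in plain Python. It is not xarray's, pandas' or HoloPlot's code: register_engine, set_default_engine and FakeArray are all hypothetical names, and the two backends are dummies that only report which one handled the call.

```python
# Hypothetical sketch: none of these names exist in xarray or HoloPlot.
_ENGINES = {}
_default_engine = "matplotlib"


def register_engine(name, func):
    """Register a plotting backend under a name."""
    _ENGINES[name] = func


def set_default_engine(name):
    """Global toggle: choose the backend used when no engine is passed."""
    global _default_engine
    if name not in _ENGINES:
        raise ValueError(f"unknown plotting engine: {name!r}")
    _default_engine = name


class FakeArray:
    """Stand-in for a DataArray, just enough to show the dispatch."""

    def __init__(self, values):
        self.values = values

    def plot(self, engine=None, **kwargs):
        # The per-call keyword wins; otherwise fall back to the global default.
        name = _default_engine if engine is None else engine
        try:
            backend = _ENGINES[name]
        except KeyError:
            raise ValueError(f"unknown plotting engine: {name!r}") from None
        return backend(self, **kwargs)


# Two dummy backends that only report which one handled the call.
register_engine("matplotlib", lambda arr, **kw: ("matplotlib", len(arr.values)))
register_engine("holoplot", lambda arr, **kw: ("holoplot", len(arr.values)))

arr = FakeArray([1, 2, 3])
default_result = arr.plot()                   # uses the global default
keyword_result = arr.plot(engine="holoplot")  # per-call override
set_default_engine("holoplot")
toggled_result = arr.plot()                   # global default has changed
```

The point of the sketch is that both options need an agreed backend interface (here, a callable taking the array and keyword arguments), whereas a separately named method does not.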

I'd love to hear what xarray maintainers and users think would be the best approach here.

@fmaussion
Member

It looks like a good use case for accessors. The syntax could then be: DataArray.hv.plot() and would give you full flexibility.
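
For readers unfamiliar with accessors, a minimal sketch of the mechanism using xarray's register_dataarray_accessor decorator follows. The accessor name demo_hv and the class DemoPlotAccessor are made up for illustration, and the plot method is a stub that returns a summary instead of a HoloViews object.

```python
import numpy as np
import xarray as xr


# "demo_hv" is a made-up accessor name, chosen to avoid clashing with real packages.
@xr.register_dataarray_accessor("demo_hv")
class DemoPlotAccessor:
    def __init__(self, xarray_obj):
        # xarray passes the DataArray in when the accessor is first used.
        self._obj = xarray_obj

    def plot(self, **kwargs):
        # A real accessor would build and return a plot object here;
        # this stub only reports what it would plot.
        return {"dims": self._obj.dims, "options": kwargs}


da = xr.DataArray(np.zeros((2, 3)), dims=("lat", "lon"))
summary = da.demo_hv.plot(cmap="viridis")
```

Because the accessor lives under its own attribute, the existing DataArray.plot is left untouched.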

@shoyer
Member

shoyer commented May 30, 2018

Very cool! I also think this would be a good use case for a new accessor, perhaps DataArray.holoplot() mirroring our preference for accessor names to match projects.

An engine keyword/option could also be viable, but would require more coordination (e.g., figuring out the plotting interface, which seems to have stalled that plotting issue). That said, if pandas figured out a way to do this I'm sure we would be happy to copy it.

@philippjfr
Author

Thanks for the feedback! I'll try to drive the pandas conversation along, but since I doubt that will be resolved in the near term, I think until then we should definitely pursue the accessor approach (which is much better than the property monkey patching we're doing now).

Personally I'd prefer DataArray.hvplot() since I think even the two extra characters make a difference and something like DataArray.hv.plot.contourf() seems too deeply nested. That said if "our preference for accessor names to match projects" is a solidly established convention I'll defer to that and go with DataArray.holoplot().

@rabernat Since you have used HoloViews with xarray in the past I'd very much appreciate your input as well.

@rabernat
Contributor

I am a big fan of holoviews and have been using it extensively for my own work in recent months. So obviously I am a big 👍 on this integration.

I agree the accessor is the best option for now, but I have no strong opinions about the name of the accessor.

Some features I would like to see are things that go beyond the plotting capabilities associated with the matplotlib engine. For example:

  • Automatic generation of DynamicMaps. Say I have a DataArray with dimensions ('time', 'lat', 'lon'); I should be able to say da.hv.plot(kdims=['lat', 'lon']) and have time become a dynamic selector.
  • To go along with the above, lazy loading of dask-backed arrays
  • Intelligent faceting which automatically links the facet kdims
  • Plotting not just of DataArrays but Datasets. The variable itself could become a dynamic selector in a dropdown menu. Basically, I just want to say ds.hv.plot() and have holoviews provide all the options I need to explore the dataset interactively. Kind of like how ncview works. At that point, we won't need ncview anymore.
  • Options for projections, coastlines, etc. associated with geoviews

@rabernat
Contributor

Oh and another big 👍 to the datashader integration. This is crucial for my datasets.

@philippjfr
Author

philippjfr commented May 30, 2018

I agree the accessor is the best option for now, but I have no strong opinions about the name of the accessor.

Okay thanks, given xarray's preference for accessor names to match projects I'm now leaning toward da.holoplot().

Automatic generation of DynamicMaps. Say I have a DataArray with dimensions ('time', 'lat', 'lon'); I should be able to say da.hv.plot(kdims=['lat', 'lon']) and have time become a dynamic selector.

HoloPlot explicitly does not deal with kdims and vdims, instead more closely following the API of pd.DataFrame.plot and xr.DataArray.plot. That said, coordinates that are not assigned to the x/y axes will automatically result in a DynamicMap, so this will give you an image plot + a widget to select the time:

da.holoplot(x='lon', y='lat', kind='image')
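
The rule described here can be illustrated with a small plain-Python sketch. This is not HoloPlot's implementation; widget_dims is a hypothetical helper that only shows which dimensions are left over once x and y have been assigned.

```python
def widget_dims(dims, x, y):
    """Return the dimensions not used for the x or y axis.

    Hypothetical helper: these are the dimensions that, per the rule
    described above, would end up driven by a widget.
    """
    used = {x, y}
    missing = used - set(dims)
    if missing:
        raise ValueError(f"not dimensions of the array: {sorted(missing)}")
    # Preserve the array's own dimension order for the leftovers.
    return [d for d in dims if d not in used]


leftover = widget_dims(("time", "lat", "lon"), x="lon", y="lat")
```

For the ('time', 'lat', 'lon') array in the example, only time is left over, which matches the single time widget mentioned above.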

To go along with the above, lazy loading of dask-backed arrays

That should happen automatically.

Intelligent faceting which automatically links the facet kdims

You can facet in a number of ways:

da.isel(time=slice(0, 3)).holoplot(x='lon', y='lat', kind='image', by='time')

will produce three subplots which are linked on the x- and y-axis, i.e. zooming on one will zoom on all unless you set shared_axes=False. You can also generate a grid with:

da.isel(time=slice(0, 3)).holoplot(x='lon', y='lat', kind='image', row='time', col='some_other_coord')

Plotting not just of DataArrays but Datasets.

This is also already supported. The API here is:

ds.holoplot(x='lon', y='lat', z=['air', 'surface'])

This will provide a widget to select between the 'air' and 'surface' data variables.

Options for projections, coastlines, etc. associated with geoviews

Currently working on that, it's basically just waiting on new HoloViews/GeoViews releases. The API here is as follows:

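# Imports assumed by the snippet: ccrs is cartopy.crs, gv is geoviews
import cartopy.crs as ccrs
import geoviews as gv

air_ds.air.holoplot.quadmesh(
    'lon', 'lat', ['air', 'some_other_variable'], crs=ccrs.PlateCarree(), projection=ccrs.Orthographic(-80, 30),
    global_extent=True, width=600, height=500, cmap='viridis'
) * gv.feature.coastline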

[Image: screen shot 2018-05-30 at 9 03 53 pm]

@philippjfr
Author

philippjfr commented May 30, 2018

something like DataArray.hv.plot.contourf() seems too deeply nested.

Actually, I suppose that's not what it would be; it could be da.hv.plot and da.hv.contourf, with .plot figuring out the kind for you. I quite like that too.

@shoyer
Member

shoyer commented Jun 1, 2018

I'm not strongly opposed to something like DataArray.hvplot for the accessor, it's just slightly less obvious than DataArray.holoplot.

hv would probably be too short for a good name (but of course this is totally up to you), especially because I can imagine people using hv for a variable name, which can also be accessed via attributes.

@philippjfr
Author

Thanks again for the feedback. I've decided to go with .holoplot in the end. I'll work on finishing some of the geo-related features today and get a 0.1 release and announcement out this week.

@philippjfr
Author

Thanks for everyone's feedback. Due to trademark concerns we decided to rename both the library and the API to .hvplot. There should be a release and an announcement in the coming week.
