Skip to content

Plans for full xarray compatibility? #66

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gmacgilchrist opened this issue Sep 3, 2020 · 6 comments · Fixed by #67
Closed

Plans for full xarray compatibility? #66

gmacgilchrist opened this issue Sep 3, 2020 · 6 comments · Fixed by #67

Comments

@gmacgilchrist
Copy link

The toolbox is broadly compatible with xarray DataArrays, but struggles when needing to do things such as broadcast along specific dimensions (a particularly powerful feature of xarray objects), or when xarray objects are separated into "chunks" using dask. The broadcasting issue can be worked around using the xarray apply_ufunc command, but this requires writing a wrapper function for each gsw function. That approach still fails to handle chunked arrays, so that the workaround is to extract the underlying numpy arrays from the DataArrays and then plug the output of the gsw function back into a new DataArray. Some example wrapper functions can be seen here.

I'm wondering if there are any plans to improve this compatibility with xarray, essentially to allow xarray DataArrays (as well as numpy arrays) to be passed directly to the gsw functions, or whether I would be better placed to try to write an overall "wrapper package" to brute force compatibility.

@efiring
Copy link
Member

efiring commented Sep 3, 2020

There are no plans because I don't yet understand how to do it. See discussion in #65. Is it just a matter of special-casing DataArrays so that they are passed through by the match_args_return decorator? And then the magic happens in the existing ufuncs, provided the numpy and xarray versions are new enough?

@gmacgilchrist
Copy link
Author

Thanks for linking that to that discussion, which I hadn't realized was about the same thing. Unfortunately I'm not a wizard, so I wouldn't want to hazard a guess at where the magic happens, or how challenging it would be! My intuition says "not too hard", if one of the active xarray developers could be roped in to assist.

If everything can be done with xr.apply_ufunc then perhaps this is sufficient - it just looks a little clunky if you are doing lots of gsw calculations. I also thought that I had uncovered some edge case bugs when the DataArray is chunked, but I will stress test this some more - it could be to do with how I am applying the ufunc.

@DocOtak
Copy link
Contributor

DocOtak commented Sep 3, 2020

I started looking into this after #65 was made. NEP 18 appears to apply only to folks implementing numpy like containers or subclasses. This toolbox only provides a pile of ufuncs that operate on numpy like containers. Xarray does support directly calling ufuncs on its objects . The problem I think is the "dask" argument, passing a value of "allowed" means the dask array(s) are passed directly to the ufunc which is expected to be "dask aware". The argument of "parallelized" is what you need for ufuncs that are not dask aware.

So what we actually need to do (I think) is be dask/chunked array aware, and I have no idea how that is done.

@gmacgilchrist can you try changing your "allowed" to "parallelized" and see what happens when you remove the .values calls?

@gmacgilchrist
Copy link
Author

Yes, I think this was the issue, setting to 'parallelized' appears to have done the trick. Previously, it seemed as though gsw was confused about the chunked dimension and thus simply ignoring it, such that it tried to return an N-d array as an (N-1)-d array, which crashed.

If xr.apply_ufunc works seamlessly, the only consideration then is beautifying of code, which doesn't seem worth extensive effort? Perhaps it's just worth a short example in the README or a tutorial, to show the use of xr.apply_ufunc for anyone working with xarray (or indeed, they could just find this thread).

@efiring
Copy link
Member

efiring commented Sep 3, 2020

It looks to me like a few simple changes in our decorator would take care of the ufuncs, which comprise most of the package. I think the special-casing would for objects with `array_ufunc`` methods; if one is found in the argument list, we would just pass everything through to the underlying ufunc (or maybe still handle masked arrays first).

What we need to try this is a nice simple self-contained set of test cases that would easily show whether we are doing the right thing, with and without dask. @gmacgilchrist, can you supply such tests?

@gmacgilchrist
Copy link
Author

gmacgilchrist commented Sep 4, 2020

Great that you've implemented something that appears to handle this - thank you!

What might a test suite look like? I spent a bit of time this morning trying to reproduce, in a very simple configuration, some of the common errors that I was getting previously when passing xarray objects as arguments. Not sophisticated by any stretch, but it could provide some basis for a test suite? You can see the notebook here (I'm not sure why it is loading strangely at the moment, loads fine in nbviewer), I hope that's of some use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants