apply_ufunc(dask='parallelized') output_dtypes for datasets #1699
Comments
Yes, I like this, though it's worth considering whether the syntax should reverse the list/dict nesting (e.g., a dict mapping variable names to dtypes per output, rather than a list of dicts).
@shoyer That seems counter-intuitive to me; you are returning two datasets, after all.
…which would magically work both when x is a DataArray and when it's a Dataset.
I'm not sure about adding … Anyway, I agree that …
The key thing is that for most people it would be extremely elegant and practical to be able to duck-type wrappers around numpy, scipy, and numba kernels that automagically work with Variable, DataArray, and Dataset (see my example above).
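For instance, a minimal sketch of such a wrapper (the function name and the demeaning kernel are illustrative, not taken from the thread):

```python
import numpy as np
import xarray as xr

def demean(obj, dim):
    # The same wrapper works whether `obj` is a Variable, a DataArray,
    # or a Dataset: apply_ufunc dispatches on the xarray type it receives.
    return xr.apply_ufunc(
        lambda x: x - x.mean(axis=-1, keepdims=True),
        obj,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
    )

# Works on a DataArray ...
da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("y", "x"))
print(demean(da, "x"))

# ... and on a Dataset with the same call.
print(demean(xr.Dataset({"a": da, "b": da + 1.0}), "x"))
```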
If you don't like …
I agree with the concern about duck typing, but my concern with … Another option would be to accept either objects with a dtype or dtypes in output_dtypes.
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the stale label.
Still relevant.
When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes:
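A minimal sketch of the situation (the variable names and the identity kernel are illustrative, not the report's original example):

```python
import numpy as np
import xarray as xr

# A Dataset whose variables have different dtypes, backed by dask arrays.
ds = xr.Dataset(
    {
        "a": ("x", np.arange(3)),              # int64
        "b": ("x", np.linspace(0.0, 1.0, 3)),  # float64
    }
).chunk({"x": 1})

# With dask='parallelized', output_dtypes must be declared up front, but a
# single entry applies to every variable in the Dataset, so the declared
# dtype is wrong for at least one of 'a' and 'b'.
result = xr.apply_ufunc(
    lambda arr: arr,          # identity kernel: output dtype == input dtype
    ds,
    dask="parallelized",
    output_dtypes=[float],    # right for 'b', but misdeclares 'a'
)
```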
Proposed solution
When the output is a dataset, apply_ufunc could accept either output_dtypes=[t] (if all output variables will have the same dtype) or output_dtypes=[{var1: t1, var2: t2, ...}]. In the example above, it would be output_dtypes=[{'a': int, 'b': float}].
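For concreteness, the proposed call would look roughly like the following (hypothetical API, not implemented; it reuses the Dataset from the sketch above):

```python
# Hypothetical: one dtype per data variable for a single Dataset output.
result = xr.apply_ufunc(
    lambda arr: arr,
    ds,
    dask="parallelized",
    output_dtypes=[{"a": int, "b": float}],
)
```

Keeping the outer list would stay consistent with the existing one-entry-per-output convention of output_dtypes.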