Pickle and .value vs. dask backend #902
Comments
I agree about loading data into memory automatically -- this behavior made sense before we used dask in xarray, but now it doesn't really. We actually already have a I'm a little less certain about how to handle pickling data, because anytime you open a file from disk using |
I'm happy to look into this - could you point me in the right direction?
This is where you can find the core caching logic on Variable objects: xarray/xarray/core/variable.py Lines 257 to 305 in 56abba5
Here's where we define Lines 305 to 327 in 56abba5
xarray/xarray/core/dataarray.py Lines 523 to 536 in 56abba5
As I mentioned before, let's add |
Working on it now.
@crusaderky Let's just disable caching for dask.
I'm done... I think. The result is less clean than I would have hoped - suggestions are welcome.
Pickling an xarray.DataArray with a dask backend causes its .data to be resolved to a numpy array.
This is not desirable, as there are legitimate use cases where you may want to, e.g., save a computation for later or send it somewhere across the network.
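To illustrate the behaviour being asked for, here is a minimal standard-library sketch. `LazyValue` is a hypothetical toy class, not xarray's or dask's implementation: it stores a recipe (a picklable function plus its arguments) and drops any cached result when pickled, so the unpickled copy is still lazy.

```python
import pickle


class LazyValue:
    """Toy stand-in for a lazily evaluated array (hypothetical, not xarray's API)."""

    def __init__(self, func, args):
        self.func = func          # must itself be picklable, e.g. a builtin
        self.args = args
        self._cache = None
        self.is_computed = False

    def compute(self):
        """Evaluate the recipe once and remember the result."""
        if not self.is_computed:
            self._cache = self.func(*self.args)
            self.is_computed = True
        return self._cache

    def __getstate__(self):
        # Pickle only the recipe. The cached result is dropped so that
        # the restored object stays lazy instead of carrying the data.
        state = self.__dict__.copy()
        state["_cache"] = None
        state["is_computed"] = False
        return state


if __name__ == "__main__":
    lazy = LazyValue(sum, (range(10),))
    lazy.compute()                                # result is now cached locally
    restored = pickle.loads(pickle.dumps(lazy))   # round trip keeps only the recipe
    print(restored.is_computed)                   # the copy has not been evaluated
    print(restored.compute())                     # evaluated on demand after unpickling
```

The design choice here is that serialization never triggers or transports the computed result; whether that is the right default for file-backed arrays is a separate question.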
Analogously, auto-converting a dask-backed xarray object to a numpy-backed one as soon as you access the .values property is probably nice when you are working in a Jupyter terminal, but not in a general-purpose situation, particularly when xarray is used as the foundation of a very complex framework. Most of my headaches so far have come from trying to figure out when, where, and why the dask backend was replaced with numpy.
IMHO a module-wide switch to disable implicit dask->numpy conversion would be a nice solution.
A new method, compute(), could explicitly convert in place from dask to numpy.
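A rough sketch of how the two proposals could fit together, using only the standard library. Everything here is hypothetical and for illustration only: the `OPTIONS` dict, the `implicit_conversion` key, and the `LazyArray` class are assumed names, not part of xarray.

```python
# Hypothetical module-wide switch; the name and default are assumptions.
OPTIONS = {"implicit_conversion": True}


class LazyArray:
    """Toy model of an array whose data is produced on demand."""

    def __init__(self, thunk):
        self._thunk = thunk       # zero-argument callable that produces the data
        self._data = None
        self._loaded = False

    @property
    def is_lazy(self):
        return not self._loaded

    def compute(self):
        """Explicitly load the data in place; always allowed."""
        if not self._loaded:
            self._data = self._thunk()
            self._loaded = True
        return self

    @property
    def values(self):
        # Implicit loading is refused when the switch is off, so the
        # lazy backend can never be replaced silently.
        if not self._loaded and not OPTIONS["implicit_conversion"]:
            raise RuntimeError(
                "implicit conversion is disabled; call compute() first"
            )
        return self.compute()._data
```

With the switch off, accessing `values` on an unevaluated array raises instead of loading, which makes every lazy-to-eager transition visible at a `compute()` call site.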