Determine why so many dask graphs are executed during QC filtering #299
I started looking into this, and it seems that [code elided]. The slightly altered form [code elided].

Ideally, Xarray would be able to defer evaluation, but I'm not sure how easy that would be to achieve, since it ultimately depends on what Dask can support. Dask can handle unknown chunk sizes, but with limited effect: you can't do another filtering operation, for example. You also can't do a rechunk operation, which we often want to do before saving the output of a filtering step to Zarr, since Dask can't rechunk arrays with unknown chunk sizes. (BTW, Dask's suggested workaround in this situation is to call [elided].)

So perhaps we should see if there's a way to fix Xarray so it doesn't do the multiple re-evaluations?
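To illustrate the chunk-size problem described above, here is a small sketch (not from the original thread) showing that boolean filtering in Dask produces unknown chunk sizes, that rechunking then fails, and that Dask's `compute_chunk_sizes()` restores concrete sizes at the cost of a computation:

```python
import math

import numpy as np
import dask.array as da

# A small array split into two chunks of five elements each.
x = da.from_array(np.arange(10), chunks=5)

# Boolean filtering: the size of each output chunk cannot be known
# without computing the mask, so the chunks become NaN.
y = x[x % 2 == 0]
print(y.chunks)  # ((nan, nan),)

# Rechunking (e.g. before writing to Zarr) is not possible while the
# chunk sizes are unknown, so we must materialize them first. This
# triggers a computation of the mask.
y = y.compute_chunk_sizes()
print(y.chunks)  # ((3, 2),) -- chunk 0 held 0,2,4 and chunk 1 held 6,8

# Now rechunking works again.
z = y.rechunk(5)
print(z.compute())  # [0 2 4 6 8]
```

This is why a filtering pipeline that eagerly resolves chunk sizes at every step ends up running many more computations than expected.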
FYI I thought this was definitely worth trying to get an explanation for, so I filed pydata/xarray#4663.
There are 38 computations run when evaluating this short pipeline, so we need to better understand what Xarray is telling Dask to do here: [pipeline code elided]
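One way to arrive at a count like "38 computations" is to hook Dask's local-scheduler callback mechanism. The sketch below (the `ComputeCounter` name is mine, not from the thread) increments a counter each time the scheduler starts executing a graph:

```python
import dask.array as da
from dask.callbacks import Callback


class ComputeCounter(Callback):
    """Count how many graph executions the local scheduler performs."""

    def __init__(self):
        self.n = 0

    def _start(self, dsk):
        # Called once per scheduler invocation, i.e. per compute.
        self.n += 1


counter = ComputeCounter()
with counter:
    x = da.ones(10, chunks=5)
    (x + 1).sum().compute()   # first graph execution
    (x * 2).mean().compute()  # second graph execution

print(counter.n)  # 2
```

Wrapping the whole QC-filtering pipeline in such a context manager would show which Xarray operations are triggering the extra evaluations.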