Skip to content

concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime #5381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
shoyer opened this issue May 26, 2021 · 0 comments
Labels
topic-combine combine/concat/merge topic-dask

Comments

@shoyer
Copy link
Member

shoyer commented May 26, 2021

This ends up calling fillna() in a loop inside xarray.core.merge.unique_variable(), something like:

  out = variables[0]
  for var in variables[1:]:
    out = out.fillna(var)

xarray/xarray/core/merge.py

Lines 147 to 149 in 55e5b5a

if combine_method:
for var in variables[1:]:
out = getattr(out, combine_method)(var)

This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for merge() (which typically only has two arguments) but is problematic for dealing with variables that shouldn't be concatenated inside concat(), which should be able to handle very long lists of arguments.

I encountered this because compat='no_conflicts' is the default for xarray.combine_nested().

I guess there's also the related issue which is that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree-reduction to ensure the operation can be parallelized.

xref google/xarray-beam#13

@dcherian dcherian added the topic-combine combine/concat/merge label Jul 8, 2021
@dcherian dcherian added topic-dask Automation Github bots, testing workflows, release automation and removed Automation Github bots, testing workflows, release automation labels Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-combine combine/concat/merge topic-dask
Projects
None yet
Development

No branches or pull requests

2 participants