Parallelization via dask

There are two places we could use xarray's machinery for parallelization to potentially speed up the generation of references.

1) Using `parallel=True` in `xr.open_mfdataset`, which would then use `dask.delayed` to parallelize the generation of the byte ranges from each file. This could be a big speedup, as it would parallelize the opening of the legacy files.

2) In theory we could also wrap the `ManifestArray` objects with `dask.Array`, then use dask's tree-reduce to do the concatenation. I think this is roughly what `kerchunk.combine.auto_dask` is approximating. However I'm not totally confident that (a) this is set up to work right now in dask.array or (b) this actually is a performance bottleneck in practice.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parallelization via dask #7

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Parallelization via dask #7

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions