Closed
Labels
performance, references generation (Reading byte ranges from archival files), xarray (Requires changes to xarray upstream)
Description
There are two places we could use xarray's machinery for parallelization to potentially speed up the generation of references.
- Using `parallel=True` in `xr.open_mfdataset`, which would then use `dask.delayed` to parallelize the generation of the byte ranges from each file. This could be a big speedup, as it would parallelize the opening of the legacy files.
- In theory we could also wrap the `ManifestArray` objects with `dask.Array`, then use dask's tree-reduce to do the concatenation. I think this is roughly what `kerchunk.combine.auto_dask` is approximating. However, I'm not totally confident that (a) this is set up to work right now in dask.array or (b) this actually is a performance bottleneck in practice.
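
To make the two ideas concrete, here is a rough sketch of the pattern: a parallel per-file step followed by a pairwise tree reduction. It deliberately swaps dask for the standard library's `concurrent.futures` so it runs without extra dependencies, and it does not use xarray or real `ManifestArray` objects. `generate_manifest`, `concat_pair`, and `tree_reduce` are hypothetical names invented for illustration, and a "manifest" here is just a list of `(path, offset, length)` tuples.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for opening one legacy file and reading its byte ranges.
# Each "manifest" is a list of (path, offset, length) tuples, one per chunk.
def generate_manifest(path, n_chunks=2, chunk_nbytes=100):
    return [(path, i * chunk_nbytes, chunk_nbytes) for i in range(n_chunks)]


# Hypothetical stand-in for concatenating two manifests along one dimension.
def concat_pair(left, right):
    return left + right


def tree_reduce(items, combine, pool):
    """Combine items pairwise, level by level, preserving their order."""
    if not items:
        raise ValueError("nothing to combine")
    level = list(items)
    while len(level) > 1:
        pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        # The pairwise combines within one level are independent of each other,
        # so they can be submitted to the pool together.
        combined = list(pool.map(lambda pair: combine(*pair), pairs))
        if len(level) % 2:
            # Carry an unpaired trailing item up to the next level.
            combined.append(level[-1])
        level = combined
    return level[0]


paths = [f"file_{i}.nc" for i in range(5)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Step 1: per-file reference generation in parallel (the role parallel=True would play).
    manifests = list(pool.map(generate_manifest, paths))
    # Step 2: tree-reduce concatenation instead of one serial left-to-right pass.
    combined = tree_reduce(manifests, concat_pair, pool)
```

With dask, step 1 would correspond to wrapping each per-file call in `dask.delayed`, and step 2 to building the pairwise combines as a task graph. Whether step 2 pays off depends on point (b) above: if concatenating manifests is cheap relative to opening files, only step 1 matters.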