
Concurrent loading of coordinate arrays from Zarr #5092


Open
shoyer opened this issue Mar 30, 2021 · 1 comment
Labels
topic-backends topic-zarr Related to zarr storage library

Comments

@shoyer
Member

shoyer commented Mar 30, 2021

When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for consolidated metadata.

In principle, we could significantly speed up loading datasets from Zarr into xarray by reading the data for these arrays in parallel (e.g., in multiple threads).
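To make the idea concrete, here is a rough sketch of the threading approach using only the standard library. It is not xarray's or Zarr's implementation: `load_concurrently` and `fake_remote_read` are hypothetical names, and the `time.sleep` call just stands in for the latency of a read from an object store.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def load_concurrently(
    loaders: Dict[str, Callable[[], Any]], max_workers: int = 8
) -> Dict[str, Any]:
    """Run each zero-argument loader in a thread pool; return results by name."""
    if not loaders:
        return {}
    workers = min(max_workers, len(loaders))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every read up front so the waits overlap.
        futures = {name: pool.submit(loader) for name, loader in loaders.items()}
        # result() blocks until done and re-raises any exception from the worker.
        return {name: future.result() for name, future in futures.items()}


def fake_remote_read(values, latency=0.2):
    """Hypothetical stand-in for a high-latency read of one coordinate array."""
    time.sleep(latency)
    return list(values)


# Three small "coordinate arrays", each costing one round trip to read.
loaders = {
    "time": lambda: fake_remote_read(range(3)),
    "lat": lambda: fake_remote_read(range(4)),
    "lon": lambda: fake_remote_read(range(5)),
}

start = time.perf_counter()
coords = load_concurrently(loaders)
elapsed = time.perf_counter() - start
print(f"loaded {list(coords)} in {elapsed:.2f}s")
```

Read serially, these three arrays would take about three times the per-read latency; with the reads overlapped, the total should be close to a single read. Threads are a reasonable fit because the work is I/O-bound. With a real store, each loader could be something like `lambda: group[name][...]` on an opened Zarr group, though that wiring is an assumption for illustration rather than how the backend is structured.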

dcherian added the topic-backends and topic-zarr (Related to zarr storage library) labels Apr 19, 2021
@TomNicholas
Member

TomNicholas commented May 21, 2025

See #8965 and #10326 for a more current discussion of this same idea. #10327 would be one way to close this issue.
