
allow you to raise error on missing zarr chunks with open_dataset/open_zarr #5197


Closed
bolliger32 opened this issue Apr 21, 2021 · 1 comment
Labels: plan to close, topic-zarr

Comments

@bolliger32
Contributor

Is your feature request related to a problem? Please describe.
Currently, if a zarr store is missing a chunk, that chunk is treated as if it contained only the array's fill value (NaN in the example below). This behavior comes from upstream zarr, but there may soon be a kwarg that raises an error in these cases instead (zarr-developers/zarr-python#489). That would be valuable when you need to distinguish intentional NaN data from I/O errors that caused some chunks not to be written. Here's an example of the problem (courtesy of @delgadom):

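```python
import os
import subprocess

import numpy as np
import xarray as xr

# 2x2 array, one chunk per element; myarr[0, 1] is an intentional NaN.
# Chunk file names like "1.0" assume the zarr v2 store layout used at the time.
xr.Dataset({'myarr': (('x', 'y'), [[0., np.nan], [2., 3.]]), 'x': [0, 1], 'y': [0, 1]}).chunk({'x': 1, 'y': 1}).to_zarr('myzarr.zarr')
print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)
print('\n\nstructure of zarr store\n' + '-'*30, flush=True)
subprocess.run(['ls', '-R', 'myzarr.zarr'], check=True)  # was `! ls -R` (IPython)
print('\n\nremove a chunk\n' + '-'*30 + '\nrm myzarr.zarr/myarr/1.0')
os.remove('myzarr.zarr/myarr/1.0')  # simulate a chunk lost to an I/O error; was `! rm`
print('\n\ndata read into xarray\n' + '-'*30)
print(xr.open_zarr('myzarr.zarr').compute().myarr)
```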

This prints:

```
data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [ 2.,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
structure of zarr store
------------------------------
myzarr.zarr:
myarr  x  y
myzarr.zarr/myarr:
0.0  0.1  1.0  1.1
myzarr.zarr/x:
0
myzarr.zarr/y:
0
remove a chunk
------------------------------
rm myzarr.zarr/myarr/1.0
data read into xarray
------------------------------
<xarray.DataArray 'myarr' (x: 2, y: 2)>
array([[ 0., nan],
       [nan,  3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
```
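To make the ambiguity concrete, here is a toy model (pure Python, not zarr's actual read path) of how a chunked reader assembles a 2-D array: chunks absent from the store are left as the fill value, so the deleted chunk ends up looking exactly like the intentional NaN. The `strict` flag is a hypothetical stand-in for the kind of option discussed in zarr-developers/zarr-python#489; its name and behavior here are assumptions, and the `"i.j"` keys assume the zarr v2 naming seen in the listing.

```python
import math


def read_chunked(store, shape, chunks, fill_value=math.nan, strict=False):
    """Assemble a 2-D array (list of lists) from a dict of chunk key -> 2-D block.

    Toy model only. `strict` is a hypothetical option, not a real zarr kwarg.
    """
    rows, cols = shape
    crow, ccol = chunks
    out = [[fill_value] * cols for _ in range(rows)]
    for i in range(math.ceil(rows / crow)):
        for j in range(math.ceil(cols / ccol)):
            key = f"{i}.{j}"
            if key not in store:
                if strict:
                    # Hypothetical strict mode: surface the missing chunk as an error.
                    raise KeyError(f"missing chunk {key!r}")
                # Default: silently leave this region filled with fill_value.
                continue
            for di, row in enumerate(store[key]):
                for dj, value in enumerate(row):
                    r, c = i * crow + di, j * ccol + dj
                    if r < rows and c < cols:  # edge chunks are padded to full size
                        out[r][c] = value
    return out


# Chunks as written by the example above (shape 2x2, chunks 1x1).
store = {"0.0": [[0.0]], "0.1": [[math.nan]], "1.0": [[2.0]], "1.1": [[3.0]]}
del store["1.0"]  # simulate the lost chunk

print(read_chunked(store, shape=(2, 2), chunks=(1, 1)))
# [[0.0, nan], [nan, 3.0]]  <- lost chunk indistinguishable from intentional NaN
```

With `strict=True`, the same call raises `KeyError` for `'1.0'` instead of returning NaN, which is the distinction this issue asks for.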

Describe the solution you'd like
I'm not sure where, within open_zarr or open_dataset, a kwarg to the zarr Array __init__ method would be passed through (once zarr-developers/zarr-python#489 is merged), but I figured I'd ask in case anyone can point me in the right direction, and to be ready for when that zarr feature exists. I'm happy to file a PR once I know where to look; I couldn't figure it out from some initial browsing.
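Until such a kwarg exists, one possible user-side workaround is to scan a local store before opening it. The sketch below assumes a zarr v2 directory store on disk, where each array directory holds a `.zarray` JSON file with `shape`, `chunks` and an optional `dimension_separator`. The names `check_zarr_v2_chunks`, `expected_chunk_keys` and `MissingChunkError` are hypothetical and not part of xarray or zarr.

```python
import itertools
import json
import math
import os


class MissingChunkError(Exception):
    """Hypothetical error type raised by check_zarr_v2_chunks."""


def expected_chunk_keys(shape, chunks, separator="."):
    """Yield the chunk keys a zarr v2 array with this shape/chunking should have."""
    if not shape:  # 0-d arrays store a single chunk, keyed "0"
        yield "0"
        return
    grid = [range(math.ceil(s / c)) for s, c in zip(shape, chunks)]
    for idx in itertools.product(*grid):
        yield separator.join(str(i) for i in idx)


def check_zarr_v2_chunks(store_path):
    """Raise MissingChunkError if any array in a local zarr v2 store lacks chunk files."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(store_path):
        if ".zarray" not in filenames:
            continue  # not an array directory (e.g. the group root)
        with open(os.path.join(dirpath, ".zarray")) as f:
            meta = json.load(f)
        sep = meta.get("dimension_separator") or "."
        for key in expected_chunk_keys(meta["shape"], meta["chunks"], sep):
            # With "/" as separator, chunk keys are nested paths.
            if not os.path.exists(os.path.join(dirpath, *key.split("/"))):
                missing.append(os.path.join(os.path.relpath(dirpath, store_path), key))
    if missing:
        raise MissingChunkError(f"{len(missing)} missing chunk(s): {missing}")
```

Calling `check_zarr_v2_chunks('myzarr.zarr')` after the `rm` step, before `xr.open_zarr`, should report `myarr/1.0`. One caveat: if the store was written with zarr configured to skip chunks containing only the fill value (e.g. `write_empty_chunks=False` in later zarr-python 2.x releases), absent files are expected and this check would give false positives.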

dcherian added the topic-zarr label Apr 21, 2021
@max-sixty
Collaborator

This would be welcome! But I notice it's still not available upstream. So, in an effort to reduce our open issues, I'll plan to close this as a postponement until it's available.

max-sixty added the plan to close label Nov 9, 2023
max-sixty closed this as not planned Nov 24, 2023