How to detect missing chunks in a zarr array. #587
Comments
Hi @royerloic, I recently ran into a similar issue, but the chunks were intended to be missing. Is that possible in your case? i.e. do you have chunks where all pixels are the fill value (e.g. 0)? See fsspec/filesystem_spec#342 for some background. If that is the case, then it's going to be more difficult to guarantee that there are no true negatives. I don't know of a flag to enforce writing empty chunks, but you could try setting a different fill value. If that's not an issue, then this would be roughly equivalent to #392, though perhaps a workaround can be found for your particular use case.
Hey Loic, sorry to hear about the data loss. In addition to the longer term objectives Josh has highlighted, here are some things that could be tried today.

First it's possible to look at the keys in the

Second (building off of Josh's point above) it might be worth picking a fill value for

Third (again building off Josh's point 😄) it's possible to use some simple checksumming algorithm as a filter. These pack the checksum in the data of each chunk.

Fourth it's possible to compute a full checksum over a Zarr Array using

Additionally it may be worthwhile to look at Zarr's convenience functions

Also another thing worth exploring may be investigating different storage backends for different purposes. For instance writing to disk during acquisition, moving to a single file format (maybe

None of these is the perfect solution. Though hopefully one or a few of these is useful in practical applications to improve quality. Questions and feedback welcome 🙂
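To make the first suggestion (checking which keys are present) concrete, here is a minimal sketch. It assumes the zarr v2 layout, where a chunk with grid index (i, j) is stored under the key `i.j` (or `i/j` for nested stores), and it assumes the store can be iterated like a mapping to list its keys. The function names `expected_chunk_keys` and `missing_chunk_keys`, and the `prefix` and `separator` parameters, are made up for illustration and are not part of zarr.

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, separator="."):
    """Yield the key of every chunk a fully written array would have.

    Assumes an array with at least one dimension and zarr v2 style keys.
    """
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    for index in itertools.product(*(range(n) for n in grid)):
        yield separator.join(str(i) for i in index)


def missing_chunk_keys(store, shape, chunks, prefix="", separator="."):
    """Return the expected chunk keys that are absent from a mapping-like store.

    `prefix` is the array's path inside the store (e.g. "raw/"), if any.
    """
    present = set(store)  # iterating a mapping yields its keys
    return [
        prefix + key
        for key in expected_chunk_keys(shape, chunks, separator)
        if prefix + key not in present
    ]


if __name__ == "__main__":
    # A plain dict stands in for a store here; one chunk of a 2x2 grid is absent.
    store = {".zarray": b"{}", "0.0": b"x", "0.1": b"x", "1.0": b"x"}
    print(missing_chunk_keys(store, shape=(4, 4), chunks=(2, 2)))
```

Note that, as discussed above, an absent key is not proof of data loss: a chunk that only ever held the fill value may legitimately never have been written, so this check is most informative when combined with a fill value that cannot occur in real data.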
Thanks @jakirkham and @joshmoore! I think a solution along the lines of #392 would be great. In the meantime, I will experiment with some of the ideas. Thanks!
One question: here is the info for a freshly generated zarr file:

It seems that not all chunks are initialised:

Chunks initialized : 115856/116064

Is the only reason that a chunk is not initialised that something went wrong? Or can it happen just because the whole chunk is all zeroes?
Yes, definitely.
Yeah this is why one of the suggestions above was to use a known problematic value (like would
This is really a feature request hiding in a question:
I am facing the following issue: one of my large lightsheet microscopy datasets seems to have missing
chunks. These could have been 'lost' at any stage. I am not blaming zarr for these; we know of other causes,
such as file transfer, that could have produced these problems. Ideally I would want to verify, after each processing step,
that all chunks have been written correctly -- or at least that no chunks are missing. How could this be done?
Also, some checksumming and data integrity features would be important as we use zarr more and more for critical
scientific data...
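
Until such features exist, one way to approximate the "verify after each processing step" idea is to keep a checksum manifest next to the data. The sketch below is not a zarr feature: it assumes only that the store behaves like a mapping from keys to bytes, and the names `build_manifest` and `compare_manifests` are hypothetical. It hashes the bytes exactly as stored (i.e. after compression), so it is suited to detecting loss or corruption during a copy or transfer, not to comparing arrays that were re-encoded.

```python
import hashlib


def build_manifest(store, prefix=""):
    """Map every key under `prefix` to the SHA-256 hex digest of its stored bytes."""
    return {
        key: hashlib.sha256(bytes(store[key])).hexdigest()
        for key in sorted(store)
        if key.startswith(prefix)
    }


def compare_manifests(before, after):
    """Return (missing, changed): keys that vanished, and keys whose bytes differ."""
    missing = sorted(set(before) - set(after))
    changed = sorted(
        key for key in before.keys() & after.keys() if before[key] != after[key]
    )
    return missing, changed


if __name__ == "__main__":
    # Plain dicts stand in for the source store and the copy after a transfer.
    original = {".zarray": b"{}", "0.0": b"abc", "0.1": b"def", "1.0": b"ghi"}
    copied = {".zarray": b"{}", "0.0": b"abc", "0.1": b"deX"}

    missing, changed = compare_manifests(
        build_manifest(original), build_manifest(copied)
    )
    print("missing:", missing)
    print("changed:", changed)
```

The manifest taken right after acquisition can be saved (for example as JSON) and compared again after each later step. Because it records which keys existed at the start, it also distinguishes chunks that were never written from chunks that disappeared later.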