VCF lossless conversion tests are failing for Zarr 2.11.0 #828
Comments
I looked into this a bit more and found that it's due to changes in Zarr 2.11.0. In that release, chunks with no data are not written to disk. However, we use two types of NaN value to encode missing vs fill (see https://github.com/pystatgen/vcf-zarr-spec/blob/main/vcf_zarr_spec.md#missing-and-fill-values), both of which are different to regular NaN. So the QUAL field, which is all missing in this example, is not written to disk and then comes back as a regular NaN when read. It's possible to use the old Zarr behaviour by setting [...]. For the moment we should probably pin to [...].
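To illustrate why the round trip is lossy, here is a minimal NumPy-only sketch. The two bit patterns are assumptions that follow the BCF convention for float32 missing and end-of-vector values (check the linked spec for the authoritative ones), and `nan_array` and `classify` are hypothetical helpers, not sgkit or Zarr functions. The "round trip" is a toy stand-in for a reader that fills an unwritten chunk with a regular NaN; it does not call Zarr.

```python
import numpy as np

# Assumed bit patterns (BCF-style); see the vcf-zarr spec for the real values.
MISSING_BITS = 0x7F800001
FILL_BITS = 0x7F800002


def nan_array(bits, n):
    """Build n float32 NaNs with an exact bit pattern (hypothetical helper)."""
    return np.full(n, bits, dtype=np.uint32).view(np.float32)


def classify(values):
    """Decode a float32 array by inspecting raw bits (hypothetical helper)."""
    bits = values.view(np.uint32)
    if (bits == MISSING_BITS).all():
        return "missing"
    if (bits == FILL_BITS).all():
        return "fill"
    if np.isnan(values).all():
        return "nan"
    return "data"


# A QUAL-like chunk where every value is "missing".
missing = nan_array(MISSING_BITS, 4)
regular = np.full(4, np.nan, dtype=np.float32)

# Value-level checks cannot tell the sentinel apart from a regular NaN...
all_nan = bool(np.isnan(missing).all())
# ...but the underlying bits are different.
same_bits = bool(np.array_equal(missing.view(np.uint32), regular.view(np.uint32)))

# Toy round trip: the chunk is skipped on write, so the reader
# reconstructs it from a regular NaN fill value.
round_tripped = np.full(missing.shape, np.nan, dtype=np.float32)

print(classify(missing), "->", classify(round_tripped))
```

Any comparison that treats all NaNs as equal will consider such a chunk empty, which is consistent with the behaviour described above: the missing sentinel is lost and a regular NaN is read back.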
We've hit this issue with tsinfer as well, @tomwhite. For reference:
Thanks for the references @jeromekelleher!
Opened pydata/xarray#6347
Fixed in zarr-developers/zarr-python#834
From https://github.com/pystatgen/sgkit/runs/5401218690?check_suite_focus=true (which is truncated due to lots of output):
I haven't been able to reproduce this, but it looks like some fields (e.g. QUAL) are NaN, which is odd: