Integer Scaling In Zarr 3 #2926
Could you share more details, please? What's the original encoding of the netCDF files? How are you writing the Zarr store? What do you mean by "much more storage"? If you could provide a minimal reproducible example, we will be able to help you much more effectively. As currently worded, your question is a bit vague.
Hi, really sorry for the lack of detail. Here goes: I am working to turn hundreds of thousands of GOES nc files into a Zarr store for easier access and use in my research. Currently each file is simply quantized, for example the C16 band on GOES18.
There are workarounds for this; for example, I could simply add the metadata xarray requires to scale the data. I suspect that may not be ideal, though: each file may have its own scale and offset values, so it would be best to apply the quantization at the Zarr storage level. My current pipeline ingests GOES data in geostationary projection, regrids it to rectilinear using
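Because each file may pack its values with its own `scale_factor`/`add_offset`, the raw int16 codes of different files cannot simply be concatenated. One approach, sketched here under stated assumptions, is to decode every file to physical units (CF rule: physical = raw × scale_factor + add_offset) and then re-pack everything with one shared pair that spans the combined range. The helper names, the two sample "files", and the use of -32768 as the fill code are hypothetical; only NumPy is assumed.

```python
import numpy as np

FILL = -32768               # hypothetical fill code; match your files' _FillValue
QMIN, QMAX = -32767, 32767  # codes left for real data once FILL is reserved


def cf_decode(raw, scale_factor, add_offset):
    """CF unpacking: physical = raw * scale_factor + add_offset, FILL -> NaN."""
    out = raw.astype(np.float64) * scale_factor + add_offset
    return np.where(raw == FILL, np.nan, out)


def cf_encode(values, scale_factor, add_offset):
    """Inverse of cf_decode: round to the nearest code, NaN -> FILL."""
    q = np.clip(np.round((values - add_offset) / scale_factor), QMIN, QMAX)
    return np.where(np.isnan(values), FILL, q).astype(np.int16)


def common_params(ranges):
    """One (scale_factor, add_offset) spanning every file's (min, max) range."""
    lo = min(r[0] for r in ranges)
    hi = max(r[1] for r in ranges)
    return (hi - lo) / (QMAX - QMIN), (hi + lo) / 2.0


# Two hypothetical files whose int16 codes use different packing parameters.
raw_a, params_a = np.array([0, 1000, 7000, FILL], dtype=np.int16), (0.02, 180.0)
raw_b, params_b = np.array([-2000, 0, 4000, 100], dtype=np.int16), (0.01, 250.0)

# Decode each file with its own parameters first.
phys_a = cf_decode(raw_a, *params_a)
phys_b = cf_decode(raw_b, *params_b)

# Re-pack both with a single shared scale/offset so they can live in one array.
ranges = [(np.nanmin(p), np.nanmax(p)) for p in (phys_a, phys_b)]
scale, offset = common_params(ranges)
new_a = cf_encode(phys_a, scale, offset)
new_b = cf_encode(phys_b, scale, offset)
```

In practice the band's documented valid range is probably a safer basis for the shared pair than observed per-file minima and maxima, since a later file that falls outside an observed range would be clipped.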
Main Processing Code
Yes, you can definitely achieve this with Zarr! However, you should realize that such compression is lossy, i.e. it will introduce noise and reduce the effective precision of your data. You need to decide how important that is for your application vs. storage volume. With Xarray and Zarr, you could achieve this by manually setting … I would note that the … You could also look into PCodec, which does (lossless) wonders on floating-point data.
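A rough sketch of the manual route via Xarray, assuming the standard CF packing keys (`dtype`, `scale_factor`, `add_offset`, `_FillValue`) that Xarray applies to a variable's encoding when writing. The helper builds such an encoding for a fixed physical range and reports the worst-case rounding error, which is half of `scale_factor`. The helper itself, the variable name `brightness_temp`, and the 150 to 350 K range are placeholders, not values taken from the GOES files.

```python
import numpy as np

INT16 = np.iinfo(np.int16)


def int16_encoding(valid_min, valid_max, fill_value=INT16.min):
    """Hypothetical helper: CF packing of floats in [valid_min, valid_max] into int16.

    Assumes fill_value is the int16 minimum, so codes -32767..32767 hold data.
    """
    n_steps = INT16.max - (INT16.min + 1)  # 65534 quantization intervals
    return {
        "dtype": "int16",
        "scale_factor": (valid_max - valid_min) / n_steps,
        "add_offset": (valid_max + valid_min) / 2.0,
        "_FillValue": fill_value,
    }


def worst_case_error(enc):
    """Rounding to the nearest code loses at most half a quantization step."""
    return enc["scale_factor"] / 2.0


# Placeholder variable name and range; use the band's real limits instead.
enc = {"brightness_temp": int16_encoding(150.0, 350.0)}
max_err = worst_case_error(enc["brightness_temp"])  # roughly 0.0015 K here

# With xarray installed, the dict would be passed to the writer, e.g.
#   ds.to_zarr("goes.zarr", mode="w", encoding=enc)
# and opening the store with xarray decodes the int16 codes back to floats.
```

This packing happens at the Xarray/CF-attribute level rather than inside a Zarr codec. When later writes use `append_dim`, Xarray should reuse the encoding already stored in the target, so the packing likely only needs to be specified on the first write; it is worth confirming that behaviour for your Xarray version.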
Will give this a try. Thanks! 🙏
I'm currently converting thousands of nc files into a single Zarr store. However, the Zarr store needs much more storage than the discrete nc files combined. I've found a similar issue on Stack Overflow, but it refers to Zarr 2. On disk, the arrays within the nc files are stored as int16, but the conversion works on the opened nc files, where they are float32. My question is: how can the stored arrays be compressed further? Is it possible to store them as int16 and have them decoded back to floats when read? I know Zarr 3 has brought relatively significant changes compared to previous versions, so perhaps you can suggest some ways to compress and store the data more efficiently?
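A plausible explanation for the size jump: the netCDF files hold packed int16 codes, while writing the opened (decoded) dataset stores float32. That doubles the raw bytes and also keeps noisy low-order mantissa bits that general-purpose compressors handle poorly. The comparison below illustrates this on a synthetic 512×512 field, with `zlib` standing in for whatever compressor the Zarr store actually uses; the field, the noise level, and the packing parameters are all made up, so real GOES sizes will differ.

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one band: smooth pattern plus sensor-like noise (hypothetical).
y, x = np.mgrid[0:512, 0:512]
field = (250.0 + 40.0 * np.sin(x / 50.0) * np.cos(y / 70.0)
         + rng.normal(0.0, 0.5, size=(512, 512))).astype(np.float32)

# The same data packed CF-style into int16 (parameters for an assumed 150-350 K range).
scale, offset = 200.0 / 65534, 250.0
packed = np.round((field - offset) / scale).astype(np.int16)

sizes = {
    "float32 raw": field.nbytes,
    "float32 zlib": len(zlib.compress(field.tobytes(), 6)),
    "int16 raw": packed.nbytes,
    "int16 zlib": len(zlib.compress(packed.tobytes(), 6)),
}
for name, n in sizes.items():
    print(f"{name:>13}: {n / 1e6:.2f} MB")
```

The int16 version starts at half the raw size, and it also tends to compress better because the rounding discards the noisy low-order bits that the float32 copy still carries.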