LZ4 in N5 vs Zarr #175
Comments
Is there anything we still need to do on this one?
Hi @jakirkham I forgot about this one. Would using |
No worries. Me too. Thanks for looking into this. 🙂 I think so, but we would have to test it on some data to be sure. That could be reasonable. I think we won't be able to reproduce the current Java blocked algorithm in Python, but as long as we have something in common we should be OK. We will probably need some documentation once it is all sorted out.
Hi folks, took a brief look into this; here are the options (I think). The current LZ4 codec in numcodecs does the simplest possible thing: it adds a 4-byte header storing the length of the uncompressed data, then compresses all the data in a single call to LZ4_compress_fast. So the output is a 4-byte header + a single block of compressed data. The Java LZ4FrameOutputStream uses the LZ4 frame format, which has a different header + multiple blocks of compressed data + a final checksum. Option 1 would be that n5-java switches to LZ4FrameOutputStream and we change numcodecs to also use the LZ4 frame format. (In numcodecs that would actually need to be implemented as a new codec, because it is a different format from the current "lz4" codec.) Option 2 would be that n5-java switches to the same encoding as the current numcodecs lz4 codec, i.e., a 4-byte header plus a single block of compressed data. Both approaches are fine by me, just trying to lay out the options.
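To make the two layouts in the comment above concrete, here is a minimal sketch using only the Python standard library. It does not compress anything; it only shows how the "4-byte header + single block" layout could be assembled and split, and how a buffer could be checked for the LZ4 frame magic number (0x184D2204, stored little-endian). The helper names are hypothetical, and the little-endian byte order of the length header is an assumption that should be checked against the numcodecs source.

```python
import struct

# Magic number that opens an LZ4 frame: 0x184D2204, stored little-endian.
LZ4_FRAME_MAGIC = b"\x04\x22\x4d\x18"


def wrap_single_block(raw_length, compressed_block):
    """Prepend a 4-byte length header to one compressed block.

    Assumption: the header is an unsigned 32-bit little-endian integer
    holding the length of the *uncompressed* data.
    """
    return struct.pack("<I", raw_length) + bytes(compressed_block)


def split_single_block(buf):
    """Return (uncompressed_length, compressed_block) for the layout above."""
    if len(buf) < 4:
        raise ValueError("buffer too short to hold the 4-byte length header")
    (raw_length,) = struct.unpack_from("<I", buf, 0)
    return raw_length, bytes(buf[4:])


def looks_like_lz4_frame(buf):
    """True if the buffer starts with the LZ4 frame magic number."""
    return bytes(buf[:4]) == LZ4_FRAME_MAGIC
```

A reader written for one layout would misread the other: the first four bytes are a length in one case and a magic number in the other, which is why a frame-format codec would need to be a separate codec rather than a change to the existing one.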
Is there still an outstanding issue here? We were discussing this at the OME-Zarr NGFF meeting.
I am pretty sure that is still a problem; lz4 is not supported in the zarr |
It appears that LZ4 support in N5 differs from Zarr. Have not had a chance to dive deeply into it, but here is the gist.
N5 is using the lz4-java library here to compress chunks. This lz4-java library provides its own custom blocked format.
Zarr's Numcodecs library uses LZ4_compress_fast, which comes from the lz4 C library.
Encountered this issue with N5Store in PR (zarr-developers/zarr-python#309), so disabled LZ4 support in N5Store for now. Not entirely sure how to bridge the gap between these two, but figured I'd raise this here for awareness and discussion.
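One way to see the mismatch described above is to inspect the first bytes of a compressed chunk payload. The sketch below assumes that the lz4-java block stream writer opens each block with the 8-byte ASCII magic "LZ4Block"; that detail is not stated in this thread and should be verified against the lz4-java source. The function name is hypothetical, and the check applies to the compressed payload only, after any N5 block header has been stripped.

```python
# Assumption: lz4-java's block stream starts each block with this ASCII magic.
LZ4_JAVA_BLOCK_MAGIC = b"LZ4Block"


def describe_lz4_payload(payload):
    """Roughly classify a compressed payload by its leading bytes.

    This is only a heuristic for debugging; it does not validate the data.
    """
    head = bytes(payload[: len(LZ4_JAVA_BLOCK_MAGIC)])
    if head == LZ4_JAVA_BLOCK_MAGIC:
        return "lz4-java block stream"
    return "not an lz4-java block stream (possibly a raw LZ4 block or frame)"
```

A payload produced by a single LZ4_compress_fast call would not carry this magic, so a quick check like this could help confirm which side wrote a given chunk before attempting to decode it.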