
LZ4 in N5 vs Zarr #175


Open
jakirkham opened this issue Feb 24, 2019 · 7 comments

@jakirkham
Member

It appears that LZ4 support in N5 differs from Zarr's. I haven't had a chance to dive deeply into it, but here is the gist.

N5 uses the lz4-java library here to compress chunks. lz4-java provides its own custom block format.
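As a rough illustration of that custom block format, the sketch below walks the per-block headers of an lz4-java stream. The layout it assumes (an 8-byte `LZ4Block` magic, one token byte, then little-endian compressed length, decompressed length and checksum, 21 bytes in all, with an empty block marking the end of the stream) is our reading of lz4-java's `LZ4BlockOutputStream` and should be checked against its source. It is also an assumption that N5's LZ4 compression writes chunks through that class.

```python
import struct
from dataclasses import dataclass
from typing import Iterator

# Assumed layout of lz4-java's LZ4BlockOutputStream block header (verify against source):
# 8-byte magic, 1 token byte, then three little-endian 32-bit ints.
MAGIC = b"LZ4Block"
HEADER = struct.Struct("<8sBiii")  # magic, token, compressed len, decompressed len, checksum
METHOD_RAW = 0x10  # block stored uncompressed
METHOD_LZ4 = 0x20  # block compressed with LZ4


@dataclass
class BlockHeader:
    method: int            # high nibble of the token (METHOD_RAW or METHOD_LZ4)
    level: int             # low nibble of the token
    compressed_len: int    # bytes of payload that follow this header
    decompressed_len: int  # bytes after decompression; 0 marks the end of the stream
    checksum: int


def parse_block_header(buf: bytes, offset: int = 0) -> BlockHeader:
    """Parse one block header starting at `offset`."""
    if len(buf) - offset < HEADER.size:
        raise ValueError("buffer too short for a block header")
    magic, token, clen, dlen, check = HEADER.unpack_from(buf, offset)
    if magic != MAGIC:
        raise ValueError("missing LZ4Block magic")
    if clen < 0 or dlen < 0:
        raise ValueError("negative block length")
    return BlockHeader(token & 0xF0, token & 0x0F, clen, dlen, check)


def iter_block_headers(buf: bytes) -> Iterator[BlockHeader]:
    """Walk the stream block by block, stopping at the empty end-of-stream block."""
    offset = 0
    while offset < len(buf):
        hdr = parse_block_header(buf, offset)
        yield hdr
        if hdr.decompressed_len == 0:
            break
        offset += HEADER.size + hdr.compressed_len
```

Because every block carries its own header, a numcodecs-style reader that expects a single 4-byte length prefix cannot consume this stream directly.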

Zarr's Numcodecs library uses LZ4_compress_fast, which comes from the lz4 C library.

I encountered this issue with N5Store in PR zarr-developers/zarr-python#309, so I have disabled LZ4 support in N5Store for now. I'm not entirely sure how to bridge the gap between the two formats, but figured I'd raise it here for awareness and discussion.

@jakirkham
Member Author

Is there anything we still need to do on this one?

@axtimwalde

Hi @jakirkham, I forgot about this one. Would using LZ4FrameOutputStream in N5 work for Zarr? We could introduce it as a parameter, like the one in GzipCompression that switches between gzip and zlib; then there would at least be some intersection.

@jakirkham
Member Author

No worries. Me too. Thanks for looking into this. 🙂

I think so. We would have to test it on some data to be sure.

Sure, that could be reasonable. I don't think we'll be able to reproduce the current Java block format in Python, but as long as we have something in common we should be OK. We'll probably need some documentation once it is all sorted out.

@alimanfoo
Member

Hi folks, I took a brief look into this; here are the options (I think)...

The current LZ4 codec in numcodecs does the simplest possible thing: it adds a 4-byte header storing the length of the uncompressed data, then compresses all the data in a single call to LZ4_compress_fast. So the output is a 4-byte header plus a single block of compressed data.
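The container layout just described can be sketched with the standard library alone. This only handles the framing: the payload is passed in as an already-compressed LZ4 block (for example, the output of `LZ4_compress_fast`). The little-endian, unsigned 32-bit encoding of the length header is our reading of the numcodecs implementation and is labeled as an assumption.

```python
import struct
from typing import Tuple

# Assumption: numcodecs stores the uncompressed length as an unsigned 32-bit little-endian int.
LEN_HEADER = struct.Struct("<I")


def wrap_numcodecs_lz4(raw_size: int, compressed_block: bytes) -> bytes:
    """Prefix a single compressed LZ4 block with the uncompressed length."""
    return LEN_HEADER.pack(raw_size) + compressed_block


def unwrap_numcodecs_lz4(buf: bytes) -> Tuple[int, bytes]:
    """Split a numcodecs-style buffer into (uncompressed length, compressed block)."""
    if len(buf) < LEN_HEADER.size:
        raise ValueError("buffer too short for the 4-byte length header")
    (raw_size,) = LEN_HEADER.unpack_from(buf, 0)
    return raw_size, buf[LEN_HEADER.size:]
```

With the `lz4` Python package installed, the block returned by `unwrap_numcodecs_lz4` could then be decoded with `lz4.block.decompress(block, uncompressed_size=raw_size)`. Note that this format has no magic number, so a reader can only recognize it by elimination.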

The Java LZ4FrameOutputStream uses the LZ4 frame format, which has a different header (a magic number plus a frame descriptor), multiple blocks of compressed data, an end mark, and an optional final content checksum.
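To make the difference from the numcodecs layout concrete, here is a minimal sketch that decodes the start of an LZ4 frame: the magic number `0x184D2204` followed by the FLG and BD descriptor bytes, following the field layout in the LZ4 frame format specification. It is an illustration only. It does not verify the header checksum (that requires xxHash32) and does not decode any data blocks.

```python
import struct

FRAME_MAGIC = 0x184D2204  # LZ4 frame magic number, stored little-endian
# BD bits 6-4 select the maximum block size.
BLOCK_MAX = {4: 64 << 10, 5: 256 << 10, 6: 1 << 20, 7: 4 << 20}


def parse_frame_descriptor(buf: bytes) -> dict:
    """Decode the magic number and FLG/BD bytes at the start of an LZ4 frame."""
    if len(buf) < 7:  # magic (4) + FLG + BD + header checksum
        raise ValueError("buffer too short for an LZ4 frame header")
    (magic,) = struct.unpack_from("<I", buf, 0)
    if magic != FRAME_MAGIC:
        raise ValueError("not an LZ4 frame")
    flg, bd = buf[4], buf[5]
    if flg >> 6 != 0b01:
        raise ValueError("unsupported frame version")
    info = {
        "block_independence": bool(flg & 0x20),
        "block_checksum": bool(flg & 0x10),
        "content_size_present": bool(flg & 0x08),
        "content_checksum": bool(flg & 0x04),
        "dict_id_present": bool(flg & 0x01),
        "block_max_size": BLOCK_MAX.get((bd >> 4) & 0x07),
    }
    offset = 6
    if info["content_size_present"]:
        (info["content_size"],) = struct.unpack_from("<Q", buf, offset)
        offset += 8
    if info["dict_id_present"]:
        offset += 4
    # The next byte is the header checksum (not verified here: it needs xxHash32).
    info["header_length"] = offset + 1
    return info
```

For actual decoding, the `lz4` Python package's `lz4.frame.decompress` handles complete frames, which is presumably what a numcodecs frame-based codec (option 1) would build on.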

So option 1 would be that n5-java switches to LZ4FrameOutputStream and we change numcodecs to also use the LZ4 frame format. (In numcodecs that would actually need to be implemented as a new codec, because it is a different format from the current "lz4" codec.)

Option 2 would be that n5-java switches to the same encoding as the current numcodecs lz4 codec, i.e., a 4-byte header plus a single block of compressed data.

Both approaches are fine by me, just trying to lay out the options.
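Whichever option is chosen, existing N5 chunks written with the current lz4-java block format would still be around, so a reader may need to tell the layouts apart. A best-effort sniffer is sketched below. The `LZ4Block` magic for lz4-java streams is an assumption based on our reading of `LZ4BlockOutputStream`, and because the numcodecs format has no magic number, it can only be guessed by elimination (a length header that happens to match one of the magics would be misclassified). The function name and return labels are hypothetical.

```python
FRAME_MAGIC = b"\x04\x22\x4d\x18"  # LZ4 frame magic 0x184D2204, little-endian
JAVA_BLOCK_MAGIC = b"LZ4Block"      # assumed lz4-java LZ4BlockOutputStream magic


def guess_lz4_layout(buf: bytes) -> str:
    """Hypothetical helper: best-effort guess at which LZ4 container a chunk uses."""
    if buf.startswith(FRAME_MAGIC):
        return "lz4-frame"
    if buf.startswith(JAVA_BLOCK_MAGIC):
        return "lz4-java-block"
    if len(buf) >= 4:
        # numcodecs' format has no magic number; fall back to it by elimination.
        return "numcodecs-lz4"
    return "unknown"
```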

@mkitti
Contributor

mkitti commented Sep 2, 2021

Is there still an outstanding issue here? We were discussing this at the OME-Zarr NGFF meeting.

@constantinpape

I'm pretty sure this is still a problem: lz4 is not yet supported in zarr's N5Store (see https://github.com/zarr-developers/zarr-python/blob/master/zarr/n5.py#L403-L469).


5 participants