Compress larger buffers #6286

Open
jakirkham opened this issue May 5, 2022 · 1 comment

@jakirkham (Member)

Currently we disallow compressing buffers beyond a certain size:

if len(payload) > 2**31: # Too large, compression libraries often fail

This tracks back to issue ( #366 ) and PR ( #367 ). AIUI this was added to work around a Blosc issue; Blosc support has since been removed ( #6027 ). LZ4 has a similar issue, though, as was discovered in Numcodecs ( zarr-developers/numcodecs#81 ).

As noted in comment ( #6273 (comment) ), this may be due to the use of int32 for buffer sizes in compression algorithms. It's not entirely clear why that is, though it could be a technical or practical limitation (2 GB is a pretty big buffer).
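
As a quick illustration of that suspicion (not taken from the linked comment), a length just past 2**31 - 1 no longer fits in a signed 32-bit integer, which matches the cutoff in the check above:

import struct

struct.pack("<i", 2**31 - 1)  # largest length a signed 32-bit int can represent
try:
    struct.pack("<i", 2**31)  # one past the limit used in the check above
except struct.error as exc:
    print(exc)  # out-of-range error (exact message depends on the Python version)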

It might be worth investigating whether compressors still have this limitation and, if so, how we want to handle it. For example, if it still exists, we could break large buffers up and compress the smaller chunks to work around the issue.
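
A minimal sketch of that chunking idea, using zlib purely for illustration (the chunk size and compressor choice here are assumptions, not what Distributed would necessarily use):

import zlib

CHUNK_LIMIT = 2**30  # assumed per-chunk ceiling, comfortably below 2**31

def compress_in_chunks(payload, limit=CHUNK_LIMIT):
    # Split the payload into pieces small enough for compressors with
    # 32-bit length fields, then compress each piece independently.
    return [zlib.compress(payload[i : i + limit]) for i in range(0, len(payload), limit)]

def decompress_chunks(chunks):
    # Reverse the above: decompress each piece and join them back together.
    return b"".join(zlib.decompress(chunk) for chunk in chunks)

# Round trip on a small payload:
data = b"x" * (10 * 2**20)
assert decompress_chunks(compress_in_chunks(data)) == data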

@jakirkham (Member, Author)

In at least some cases we do frame splitting already:

header, frames = serialize_and_split(x, **kwargs)
if frames:
    compression, frames = zip(*map(maybe_compress, frames))

Though maybe we should move this into serialize (if we are not already doing that somewhere and I'm just missing it).
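
A rough sketch of how that could fit next to the existing splitting step; split_large_frame is a hypothetical helper here, and the commented lines only echo the snippet above:

FRAME_LIMIT = 2**31 - 1  # assumed per-frame ceiling matching the existing check

def split_large_frame(frame, limit=FRAME_LIMIT):
    # Yield sub-frames no larger than `limit`, each safe to hand to a
    # compressor that uses 32-bit lengths internally.
    view = memoryview(frame)
    for i in range(0, len(view), limit):
        yield view[i : i + limit]

# Hypothetical placement next to the existing per-frame compression:
# header, frames = serialize_and_split(x, **kwargs)
# frames = [sub for frame in frames for sub in split_large_frame(frame)]
# if frames:
#     compression, frames = zip(*map(maybe_compress, frames))

Reassembly on the receiving side would also need the header to record how many sub-frames each original frame was split into, which is part of why this is only a sketch.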
