Compress larger buffers #6286

Open
jakirkham opened this issue May 5, 2022 · 1 comment

@jakirkham (Member)

Currently we disallow compressing buffers beyond a certain size:

if len(payload) > 2**31: # Too large, compression libraries often fail

This tracks back to issue ( #366 ) and PR ( #367 ). AIUI this was added to work around a Blosc issue; Blosc support has since been removed ( #6027 ). LZ4 has a similar issue, though, as was discovered in Numcodecs ( zarr-developers/numcodecs#81 ).

As noted in comment ( #6273 (comment) ), this may be due to the use of int32 for buffer sizes in compression algorithms. It's not entirely clear why that is, though it could be a technical or practical limitation (2 GB is a pretty big buffer).
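
As a quick illustration of that suspicion (not taken from the linked comment), a length just past 2**31 - 1 no longer fits in a signed 32-bit integer, which matches the cutoff in the check above:

import struct

struct.pack("<i", 2**31 - 1)  # largest length a signed 32-bit int can represent
try:
    struct.pack("<i", 2**31)  # one past the limit used in the check above
except struct.error as exc:
    print(exc)  # out-of-range error (exact message depends on the Python version)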

It might be worth investigating whether compressors still have this limitation and, if so, how we want to handle it. For example, if it still exists, we could break large buffers up and compress the smaller chunks to work around the issue.
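
A minimal sketch of that chunking idea, using zlib purely for illustration (the chunk size and compressor choice here are assumptions, not what Distributed would necessarily use):

import zlib

CHUNK_LIMIT = 2**30  # assumed per-chunk ceiling, comfortably below 2**31

def compress_in_chunks(payload, limit=CHUNK_LIMIT):
    # Split the payload into pieces small enough for compressors with
    # 32-bit length fields, then compress each piece independently.
    return [zlib.compress(payload[i : i + limit]) for i in range(0, len(payload), limit)]

def decompress_chunks(chunks):
    # Reverse the above: decompress each piece and join them back together.
    return b"".join(zlib.decompress(chunk) for chunk in chunks)

# Round trip on a small payload:
data = b"x" * (10 * 2**20)
assert decompress_chunks(compress_in_chunks(data)) == data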

@jakirkham (Member, Author)

In at least some cases we do frame splitting already:

header, frames = serialize_and_split(x, **kwargs)
if frames:
    compression, frames = zip(*map(maybe_compress, frames))

Though maybe we should move this into serialize (if we are not already doing that somewhere and I'm just missing it).
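
A rough sketch of how that could fit next to the existing splitting step; split_large_frame is a hypothetical helper here, and the commented lines only echo the snippet above:

FRAME_LIMIT = 2**31 - 1  # assumed per-frame ceiling matching the existing check

def split_large_frame(frame, limit=FRAME_LIMIT):
    # Yield sub-frames no larger than `limit`, each safe to hand to a
    # compressor that uses 32-bit lengths internally.
    view = memoryview(frame)
    for i in range(0, len(view), limit):
        yield view[i : i + limit]

# Hypothetical placement next to the existing per-frame compression:
# header, frames = serialize_and_split(x, **kwargs)
# frames = [sub for frame in frames for sub in split_large_frame(frame)]
# if frames:
#     compression, frames = zip(*map(maybe_compress, frames))

Reassembly on the receiving side would also need the header to record how many sub-frames each original frame was split into, which is part of why this is only a sketch.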
