
Support for lz4 compression #163 #168


Open · wants to merge 10 commits into main

Conversation

@gnzsnz gnzsnz commented Mar 7, 2025

  • changes on __init__.py
  • update README.md
  • update tests
  • update ci.yml
  • add lz4 to dependencies
  • benchmark with and without io.BufferedWriter -> remove io.BufferedWriter for lz4
  • benchmark python lz4 vs lz4 CLI
  • set lz4 compression levels aligned with python lz4 [0-16], as the CLI accepts those values without problems.

gnzsnz added 3 commits March 7, 2025 21:26
- tox.ini to include lz4 tests
- pyproject.toml to include dev dependencies
- .github/workflows/ci.yml to include lz4 tests
- tests to include lz4 tests

gnzsnz commented Mar 8, 2025

Hi,

I added a few lines of code to support lz4; I did my best to follow the existing structure. Tests are passing, except for pypy, but I don't think the pypy issues are related to lz4.

Please let me know if I should make any changes.

```diff
@@ -808,7 +853,7 @@ def xopen(  # noqa: C901
     compresslevel is the compression level for writing to gzip, xz and zst files.
     This parameter is ignored for the other compression formats.
     If set to None, a default depending on the format is used:
     gzip: 6, xz: 6, zstd: 3.
```
Collaborator

Note to self: didn't we change the gzip level to 1?

Collaborator

Yes we did.

Collaborator

@rhpvorderman rhpvorderman left a comment

Looks very good. My only question is whether a BufferedWriter really adds value when returning an lz4 writable file; that should be benchmarked.

My other comment is that it is probably best to make lz4 non-optional, but that needs @marcelm 's blessing as well.

```python
f = lz4.frame.LZ4FrameFile(filename, mode, compression_level=compresslevel)
if "r" in mode:
    return f
# Buffer writes on lz4.open to mitigate overhead of small writes
```
Collaborator

For Gzip this overhead is present because gzip is written in Python. Did you benchmark this to check if it made a difference for small writes?

Author

No, I just followed what the other compression formats were doing.

For small writes we could use dictionaries for zstd and lz4, which should boost performance for small writes.

Collaborator

Gzip and Bzip2 write calls are implemented in Python, so they have massive overhead. If the object you are writing to is implemented in C, that is usually not the case. I recommend benchmarking whether the BufferedWriter helps.

For small writes we could use dictionaries for zstd and lz4, which should boost performance for small writes

Xopen does not support that. Gzip can also use dictionaries, but xopen does not provide the handles for that. That is more suited for low level libraries.
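The small-write overhead in question can be measured with a stdlib-only sketch along these lines (chunk size, write count, and the plain unbuffered file standing in for an lz4 stream are all assumptions for illustration):

```python
import io
import tempfile
import timeit
from pathlib import Path

CHUNK = b"x" * 100   # small write, similar to line-by-line text output
N_WRITES = 10_000

def bench(buffered: bool) -> float:
    """Time many small writes to an unbuffered file, optionally
    wrapped in io.BufferedWriter (a stand-in for an lz4 stream)."""
    path = Path(tempfile.mkdtemp()) / "out.bin"

    def run() -> None:
        with open(path, "wb", buffering=0) as raw:
            f = io.BufferedWriter(raw) if buffered else raw
            for _ in range(N_WRITES):
                f.write(CHUNK)
            f.flush()

    return min(timeit.repeat(run, number=1, repeat=5))

print(f"with BufferedWriter:    {bench(True):.4f}s")
print(f"without BufferedWriter: {bench(False):.4f}s")
```

For a C-backed target the two timings should come out close, which is the hypothesis being tested here.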


gnzsnz commented Mar 10, 2025

I forgot to mention this earlier.

python-lz4 has compression levels 0-16, while CLI lz4 has compression levels 1-12. If you have faced this with another algorithm, we can re-use an existing solution; if not, we can think of a way to scale compression levels up/down.

I have set the default to 1, as it works for both. But the value entered and the underlying compression backend need to be aligned, which is not obvious.
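If the two ranges did have to be reconciled, a linear rescaling would be one option (a hypothetical sketch, not code from this PR; the reply below suggests the CLI is lenient about levels, so this may turn out to be unnecessary):

```python
def scale_level(level: int, src=(0, 16), dst=(1, 12)) -> int:
    """Map a python-lz4 compression level [0-16] onto the lz4 CLI's
    documented range [1-12] by linear interpolation (hypothetical)."""
    (src_lo, src_hi), (dst_lo, dst_hi) = src, dst
    if not src_lo <= level <= src_hi:
        raise ValueError(f"level must be in [{src_lo}, {src_hi}]")
    return round(dst_lo + (level - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo))

print(scale_level(0), scale_level(16))  # endpoints map to 1 and 12
```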

@rhpvorderman (Collaborator)

On Debian 11 lz4 is not that picky: -0 works and even -1231 works (I assume anything in between works too); --1 does not work. So technically you could use range(0, sys.maxsize) as an allowed range specifier. (Please do not use the tuple call in that case; Python can check whether something is in a range without the whole array being created.)
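The point about `range` is that membership testing is O(1): CPython answers `x in range(...)` arithmetically from start/stop/step, without materializing any values, so even an enormous allowed range costs nothing:

```python
import sys

# Constant-time membership test; nothing is allocated beyond the
# small range object itself.
allowed = range(0, sys.maxsize)

print(12 in allowed)      # True
print(-1 in allowed)      # False
print(10**9 in allowed)   # answered without any iteration
```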


gnzsnz commented Mar 10, 2025

Benchmark description

Benchmark for files from 1KB to 100MB, 100 runs per test, using random data (the compression factor is bad in this scenario) and the default compression level.

The benchmarks do not use xopen except for the BufferedWriter test; I'm comparing raw Python LZ4 vs the OS lz4 CLI.

For Python I'm measuring reads using open from the Python standard library plus LZ4 open to write (and vice versa). This is to compare with the lz4 CLI, which does the same operation.

Compression LZ4 python vs LZ4 CLI

Testing sizes: ['1.0KB', '10.0KB', '100.0KB', '1.0MB', '10.0MB', '100.0MB']
Runs per test: 100

Results Compression (times in seconds):

| Size | Python LZ4 (avg ± std) | OS LZ4 (avg ± std) |
|---|---|---|
| 1.0KB | 0.000222 ± 0.000135 | 0.002831 ± 0.000895 |
| 10.0KB | 0.000153 ± 0.000044 | 0.002343 ± 0.000167 |
| 100.0KB | 0.000140 ± 0.000072 | 0.002443 ± 0.000308 |
| 1.0MB | 0.000696 ± 0.000833 | 0.003184 ± 0.000342 |
| 10.0MB | 0.006929 ± 0.002354 | 0.008113 ± 0.000971 |
| 100.0MB | 0.084589 ± 0.057809 | 0.056928 ± 0.062384 |

Note: Lower times are better. Results show mean ± standard deviation.

Python is faster up to and including 10MB; the CLI is better for big files at 100MB.

Decompression LZ4 python vs LZ4 CLI

Testing sizes: ['1.0KB', '10.0KB', '100.0KB', '1.0MB', '10.0MB', '100.0MB']
Runs per test: 100

Results Decompression (times in seconds):

| Size | Python LZ4 (avg ± std) | OS LZ4 (avg ± std) |
|---|---|---|
| 1.0KB | 0.103372 ± 0.002786 | 0.051380 ± 0.045575 |
| 10.0KB | 0.000817 ± 0.000073 | 0.005209 ± 0.016507 |
| 100.0KB | 0.000167 ± 0.000027 | 0.002811 ± 0.000130 |
| 1.0MB | 0.000096 ± 0.000019 | 0.002655 ± 0.000044 |
| 10.0MB | 0.010892 ± 0.000622 | 0.008884 ± 0.003092 |
| 100.0MB | 0.000113 ± 0.000028 | 0.002750 ± 0.000164 |

Note: Lower times are better. Results show mean ± standard deviation.

Mixed results here: the CLI is faster for small files (1KB), then Python lz4 is faster. Surprisingly, 1MB and 100MB are really fast with Python lz4; we might be hitting some block-size sweet spot there. Overall, for decompression Python is better than the CLI.

xopen benchmark with and without io.BufferedWriter

Testing sizes: ['1.0KB', '10.0KB', '100.0KB', '1.0MB', '10.0MB', '100.0MB']
Runs per test: 100

Results Compression (times in seconds):

| Size | xopen (avg ± std) | xopen no buffer (avg ± std) |
|---|---|---|
| 1.0KB | 0.000191 ± 0.000079 | 0.000105 ± 0.000020 |
| 10.0KB | 0.000164 ± 0.000049 | 0.000168 ± 0.000130 |
| 100.0KB | 0.000216 ± 0.000092 | 0.000157 ± 0.000015 |
| 1.0MB | 0.000719 ± 0.000501 | 0.000596 ± 0.000468 |
| 10.0MB | 0.006572 ± 0.002621 | 0.005854 ± 0.001389 |
| 100.0MB | 0.074405 ± 0.018944 | 0.074006 ± 0.020324 |

Note: Lower times are better. Results show mean ± standard deviation.

There are benefits to not using io.BufferedWriter across different file sizes.

Conclusion

Let me know your comments.

@rhpvorderman (Collaborator)

Compression LZ4 python vs LZ4 CLI
Python is faster up to and including 10MB; the CLI is better for big files at 100MB.

The CLI solution consists of the python process pushing data into the pipe and the LZ4 CLI processing that, effectively using 2 cores. This is using 1 additional thread.
The python lz4 is not given this benefit. What happens when python lz4 is opened with an additional thread (threads=2 or whatever syntax they use)? That is the fair comparison.

Decompression LZ4 python vs LZ4 CLI
Mixed results here: the CLI is faster for small files (1KB), then Python lz4 is faster. Surprisingly, 1MB and 100MB are really fast with Python lz4; we might be hitting some block-size sweet spot there. Overall, for decompression Python is better than the CLI.

The way you are benchmarking with 100 iterations is really sensitive to temporary moments when the CPU is busy doing other things, I think that explains the lacking 10 MB result. Except for 1kb, I see that for decompression Python LZ4 is always better. Given the large stddev on the 1KB result, I think that is from an outlier too.

My conclusion is that python lz4 is always faster. The decompression speed for lz4 is so enormously high (4GB/s and higher if the README is to be believed) that all we are doing is measuring the overhead of getting the data into a file. Using a pipe to another process is much more inefficient than doing it directly.

There are benefits to not using io.BufferedWriter across different file sizes.

Great! Thanks for benchmarking that.

It looks like always going for python lz4 is the best option except for maybe compression. Can you compare the 2threads result with the piped result (using 1 thread on lz4) to have a more apples to apples comparison?


gnzsnz commented Mar 11, 2025

The CLI solution consists of the python process pushing data into the pipe and the LZ4 CLI processing that, effectively using 2 cores. This is using 1 additional thread.
The python lz4 is not given this benefit. What happens when python lz4 is opened with an additional thread (threads=2 or whatever syntax they use)? That is the fair comparison.

The way I set up the benchmark is to use the default settings for the LZ4 CLI, which means number of threads = auto. There is no way to set the number of threads in Python LZ4; it supports multithreading, but there is no parameter to set it (https://python-lz4.readthedocs.io/en/stable/lz4.frame.html). 🤷

So, I don't think this test is possible. At least, I don't know how to do it. But I might be missing something.

I agree with your conclusion, python LZ4 is the best for most of the scenarios. And when it's not the best, it's within the statistical error.


gnzsnz commented Mar 11, 2025

I just realized I can share a Jupyter notebook using a gist:

https://gist.github.com/gnzsnz/048f8c2749f0c73716c69138c157adea

@rhpvorderman (Collaborator)

So, I don't think this test is possible. At least, I don't know how to do it. But I might be missing something.

I agree with your conclusion, python LZ4 is the best for most of the scenarios. And when it's not the best, it's within the statistical error.

Ah, but if python-lz4 does not support a threaded mode, then it is better to use the pipedcompressionprogram class instead for scenarios where threading is requested.

Python LZ4 drops the GIL, but if the threading library is not used to launch another thread, all the computation still happens on the same thread. So there is no speedup there. Xopen bypasses this with CLI programs. Python-isal is the exception here, because that contains quite some work to be able to escape the GIL and use true threading in Python.

So I think the current PR is almost ready. Simply make sure python-lz4 is not optional but required and all the tests pass, and then I think it can be merged.


gnzsnz commented Mar 13, 2025

Regarding tests: all tests are passing on my PC; pytest and tox work fine. tox is failing with pypy, but what fails is coverage, not the tests themselves. mypy is failing on tox, but these are old outstanding issues, nothing new introduced by this PR.

I have no clue why CI is failing; I would need your support there. I have made minimal changes to ci.yml to install the LZ4 CLI on the OSX and Ubuntu runners.

Collaborator

@rhpvorderman rhpvorderman left a comment

I just checked the code to see if I could spot anything, but nothing yet. I will have to run this on my own PC to check why the tests are failing when I have the time.

@cedricdonie

Hi everyone, I came across this PR by chance and was interested in the discussion. If I understand it correctly, it would always be faster to use python-lz4 for decompression despite it being single-threaded. However, the current work in this PR seems to always call the piped compression program for decompression when threading is requested.

Python LZ4 drops the GIL, but if the threading library is not used to launch another thread, all the computation still happens on the same thread. So there is no speedup there. Xopen bypasses this with CLI programs. Python-isal is the exception here, because that contains quite some work to be able to escape the GIL and use true threading in Python.

The above comment suggests that in theory, launching a subprocess should be faster because it can use a different process.

So, I don't think this test is possible. At least, I don't know how to do it. But I might be missing something.

I agree with your conclusion, python LZ4 is the best for most of the scenarios. And when it's not the best, it's within the statistical error.

Ah, but if python-lz4 does not support a threaded mode, then it is better to use the pipedcompressionprogram class instead for scenarios where threading is requested.

Your interpretation seems to be that python is faster despite python using a single thread while the CLI used threads=auto. If we now use pipedcompressionprogram when threading is requested, won't we slow down the decompression?

PS: I believe that sns.stripplots might be useful to analyze benchmarks with potential outliers. Here is a little notebook with an example: https://gist.github.com/cedricdonie/6785314197cee9a539b1936e22b2b982.
PPS: Thanks for the awesome library and for adding LZ4 🙂!

@rhpvorderman (Collaborator)

The above comment suggests that in theory, launching a subprocess should be faster because it can use a different process.

There are two definitions of "faster"

  • Uses less wallclock time. This metric favours heavy threading.
  • Uses less cpu time overall. This metric favours the least possible amount of overhead.

By using a subprocess, all the decompression happens in one process. The Python process only has to read input data from the pipe; it can then use the rest of the time to actually run the program you are interested in. This will decrease wallclock time.

Your interpretation seems to be that python is faster despite python using a single thread while the CLI used threads=auto. If we now use pipedcompressionprogram when threading is requested, won't we slow down the decompression?

LZ4 is a bit of a special case because the decompression is extremely fast. It could be that simply decompressing a block of data in LZ4 is much faster than incurring the overhead cost of a pipe while letting another program decompress it. So yes, that is a valid concern.


gnzsnz commented Mar 21, 2025

@rhpvorderman did you manage to find why CI is failing?

Collaborator

@marcelm marcelm left a comment

Awesome, thanks! I finally got around to reviewing this PR. I just noted a couple of minor things that I think should be changed, but I will let @rhpvorderman make the final decision.


gnzsnz commented Mar 27, 2025

Now pypy is failing:

```
tests/test_xopen.py::test_roundtrip[.lz4-None-t] PASSED                  [ 34%]
Fatal Python error: Aborted

Stack (most recent call first, approximate line numbers):
  File "/home/runner/work/xopen/xopen/.tox/py/lib/pypy3.9/site-packages/coverage/pytracer.py", line 146 in _trace
tests/test_xopen.py::test_roundtrip[.lz4-0-b] py: exit -6 (17.65 seconds) /home/runner/work/xopen/xopen> coverage run --branch --source=xopen,tests -m pytest -v --doctest-modules tests pid=2861
  py: FAIL code -6 (45.58=setup[27.93]+cmd[17.65] seconds)
  evaluation failed :( (45.98 seconds)
Error: Process completed with exit code 250.
```

But the error is coming from coverage, not really from the lz4 changes. I would appreciate your input; I assume this is not new.

@rhpvorderman (Collaborator)

python-lz4 is not tested and built for PyPy. So this is a problem with the upstream library or PyPy.

I am wondering what the best way is to work around this problem. Make lz4 optional again? Just drop the low quality bindings and only use the external program?


gnzsnz commented Mar 31, 2025

python-lz4 is not tested and built for PyPy. So this is a problem with the upstream library or PyPy.

I am wondering what the best way is to work around this problem. Make lz4 optional again? Just drop the low quality bindings and only use the external program?

Please let me know how to move forward, I see the following options:

  • leave the code as it is. It will not work with pypy unless the lz4 CLI is installed and explicitly called: xopen("file.txt.lz4", mode="wb", threads=1). Not sure if this is acceptable; if yes, I would suggest making it clear in the README.md
  • make lz4 an optional dependency, with the same caveats as in the previous point.
  • if I understand your comment correctly, you suggest dropping the lz4 module and using the lz4 CLI.

Personally, I think that lz4 as an optional dependency is the most flexible approach, but it is your call. Please let me know.


marcelm commented Apr 1, 2025

Personally, I think that lz4 as an optional dependency is the most flexible approach, but it is your call. Please let me know.

Yes, but we could use an environment marker to require lz4 only if we’re not on PyPy. It would look like this in pyproject.toml:

  "lz4>4.3.1; platform_python_implementation != 'PyPy'",

The code is a bit more complicated because we need to handle the case that import lz4 fails, but that is already implemented AFAICS.

On PyPy, we then need to fall back to the lz4 CLI.
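The import guard referred to above typically looks like this (a sketch of the pattern, not the PR's exact code):

```python
try:
    import lz4.frame  # real bindings where wheels exist
except ImportError:   # e.g. PyPy, where python-lz4 is not built
    lz4 = None

def lz4_bindings_available() -> bool:
    """True when python-lz4 was importable; callers would fall back
    to the lz4 CLI otherwise."""
    return lz4 is not None

print("python-lz4 available:", lz4_bindings_available())
```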

A couple of observations regarding the CLI:

  • Multithreading in lz4 is only supported for compression, not for decompression.
  • Older versions of lz4 (such as v1.9.4 that comes with Ubuntu 24.04) do not support multithreading. They fail if one tries to pass the -T option.
  • The newer versions use a default of -T0, which chooses the number of threads automatically (presumably the number of available cores).

So assuming that we do not want to try to detect which version of lz4 is installed, we cannot use -T. We can therefore not actually control the number of threads used for compression; it will just depend on which version of lz4 is installed.

Hm, what about this behavior:

| Mode | Threads | What to use | Comment |
|---|---|---|---|
| Compress | 0 | python-lz4 | |
| Compress | != 0 | lz4 CLI | threads parameter is ignored, but if it is a newer binary, it will use all cores |
| Decompress | 0 | python-lz4 | |
| Decompress | != 0 | python-lz4 | threads parameter is ignored |

And also fall back to lz4 CLI if python-lz4 is not available.

So this is essentially: if `lz4 is not None and (mode == 'rb' or (mode in ('ab', 'wb') and threads == 0))`, use the Python bindings; else, use the lz4 CLI.

What do you think?
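The table above reduces to a single predicate; a sketch of it (function name hypothetical, `lz4` being the optionally imported module):

```python
def use_python_lz4(lz4, mode: str, threads: int) -> bool:
    """Decide between the python-lz4 bindings and the lz4 CLI.

    Reading always prefers the bindings; writing uses them only when
    threads == 0. Everything else, including a missing module, goes
    to the CLI.
    """
    if lz4 is None:
        return False
    return mode == "rb" or (mode in ("ab", "wb") and threads == 0)

# Exercise each row of the table; object() stands in for the module.
assert use_python_lz4(object(), "rb", 8) is True   # decompress, any threads
assert use_python_lz4(object(), "wb", 0) is True   # compress, threads == 0
assert use_python_lz4(object(), "wb", 4) is False  # compress, threads != 0
assert use_python_lz4(None, "rb", 0) is False      # bindings unavailable
```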

@rhpvorderman (Collaborator)

And also fall back to lz4 CLI if python-lz4 is not available.

Also if threads was explicitly set to 0? I am in doubt about that one. Crashing because a single-threaded option cannot be provided leads to the user not being able to open the file. Defaulting to the CLI will use more threads than expected, which can cause issues on cluster environments.


marcelm commented Apr 2, 2025

Also if threads was explicitly set to 0? I am in doubt about that one. Crashing because a single-threaded option cannot be provided leads to the user not being able to open the file. Defaulting to the CLI will use more threads than expected, which can cause issues on cluster environments.

Hm, you’re right.

In general, I would say that as an xopen user, my main interest is in being able to open files that may be compressed, not caring so much about the details. Sure I can specify compression level and threads, but this is more on a “best effort” level. For example, if we open a bz2 file with threads > 0 but pbzip2 is not available, we open it using bz2.open anyway (singlethreaded).

Applied to lz4, I would say we should not fail even when the bindings aren’t available and threads == 0.

How about this instead of version detection (which I’d like to avoid): We try to run the lz4 CLI with the appropriate -T option first, and if that fails, we just re-try without the option.

Pseudocode

```
if lz4 is not None and (mode == 'rb' or (mode in ('ab', 'wb') and threads == 0)):
    # use Python bindings
else:
    try:
        run lz4 with option -T max(threads, 1)
    except ...:
        run lz4 without option -T
```
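One way to implement the retry without version detection is to probe once whether the installed binary accepts `-T` and cache the answer (helper names and the `--version` probe are assumptions for illustration; the real lz4 may need different probe arguments):

```python
import shutil
import subprocess
from functools import lru_cache

@lru_cache(maxsize=None)
def accepts_flag(program: str, flag: str) -> bool:
    """Return True if `program` exists and exits cleanly when given
    `flag` alongside --version. Cached, so we probe at most once."""
    if shutil.which(program) is None:
        return False
    result = subprocess.run(
        [program, flag, "--version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def lz4_argv(threads: int) -> list:
    """Build an lz4 command line, adding -T only when supported."""
    argv = ["lz4"]
    flag = f"-T{max(threads, 1)}"
    if accepts_flag("lz4", flag):
        argv.append(flag)
    return argv

print(lz4_argv(4))
```

On an older lz4 the probe exits nonzero, so the flag is silently dropped and the binary chooses its own (single-threaded) behavior.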

@rhpvorderman (Collaborator)

Fully agree on all your points. That seems to be the best option.


gnzsnz commented Apr 7, 2025

Status

  • lz4 dependency, but not for pypy, in pyproject: "lz4>4.3.1; platform_python_implementation != 'PyPy'"
  • lz4 dependency management. The previous point makes tox fail for pypy. We need to consider a logic where lz4 is still treated as an optional dependency. This was part of the initial commit; I need to search the commit history and apply it.
  • logic to maximize chances of lz4 success Support for lz4 compression #163 #168 (comment)
  • there is a new error on test_xopen.py::test_pass_bytesio_for_reading_and_writing for params [lz4-1], error io.UnsupportedOperation: fileno. After my initial analysis, it is clear that BytesIO will not work for process pipes, as there is no functional fileno method. My understanding is that this test exercises threads on a Python module; for lz4 it works with threads == 0, but with threads == 1 it tries to use the LZ4 CLI, which fails. I can add a check in the code or maybe an xfail marker. Need to look into it.

Let me know your comments.
