file_util: improve rmtree performance #1666
nikitych wants to merge 2 commits into rpm-software-management:main
Optimize `rmtree` to significantly reduce cleanup time, especially for large buildroots. The previous Python-based implementation caused substantial delays (e.g., ~13 minutes for a ~2M-file buildroot). Introduce a faster backend, using either `shutil.rmtree` or native `rm -r`, which reduces cleanup time to roughly one minute.

Also add support for handling paths longer than PATH_MAX, but only for directories not listed in `exclude`. Extending this to excluded paths would overcomplicate the logic, and excluded-path handling was already nonfunctional and not in high demand.

Benchmark on a large buildroot:

$ sudo find /var/lib/mock/some-big-buildroot/ | wc -l
2056654
$ time mock -r some-big-buildroot --clean
INFO: mock.py version 6.3 starting (python version = 3.9.21, NVR = mock-6.3-1.el9)

previous rmtree implementation
real 13m21.176s
user 9m51.712s
sys 2m29.270s

shutil.rmtree as _fastRm
real 1m8.450s
user 0m14.331s
sys 0m31.525s

rm -r as _fastRm
real 0m52.990s
user 0m3.089s
sys 0m29.184s

Smaller buildroots also see noticeable improvements.

$ mock -r fedora-43-x86_64 --init
$ time mock -r fedora-43-x86_64 --clean
real 0m2.219s
user 0m1.726s
sys 0m0.363s

shutil.rmtree as _fastRm
real 0m0.952s
user 0m0.689s
sys 0m0.178s

rm -r as _fastRm
real 0m0.879s
user 0m0.627s
sys 0m0.194s

partially resolves: rpm-software-management#31
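For readers following along, the two backends named in the description can be sketched roughly as below. This is an illustration only, not the PR's code: `fast_rm`, its `use_native_rm` parameter, and the `make_tree` helper are hypothetical names, and the sketch assumes a POSIX system with `rm` on `PATH`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def fast_rm(path, use_native_rm=False):
    """Remove a directory tree via shutil.rmtree or an external `rm -r`.

    Hypothetical helper for illustration; not the PR's `_fastRm`.
    """
    if use_native_rm:
        # "--" ends option parsing, so a path beginning with "-" is not
        # mistaken for a flag. check=True raises if rm exits non-zero.
        subprocess.run(["rm", "-r", "--", str(path)], check=True)
    else:
        shutil.rmtree(path)


def make_tree(root, dirs=3, files_per_dir=5):
    """Create a small throwaway tree to delete; returns its root path."""
    root = Path(root)
    for d in range(dirs):
        sub = root / f"dir{d}"
        sub.mkdir(parents=True)
        for f in range(files_per_dir):
            (sub / f"file{f}").write_text("x")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for native in (False, True):
            tree = make_tree(Path(tmp) / f"tree-{native}")
            fast_rm(tree, use_native_rm=native)
            print(native, tree.exists())
```

Both branches hand the whole traversal to an existing implementation instead of walking the tree in hand-written Python, which is the general idea the description puts forward.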
vcs-diff-lint found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
This is just too big for a prompt review; give me some time, please.
Can you document why it was slow? Where are we saving the syscalls, etc.?
Is this an AI-generated change? This seems nonsensical, and the tests fail.
The only change that makes sense to me is the replacement of os.listdir by os.scandir (the second one is much faster). But the second
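To make the `os.listdir` versus `os.scandir` point concrete, here is a small sketch (not taken from the PR; both function names are hypothetical) that counts the entries of a tree both ways. With `os.listdir` every name needs its own `lstat()` call to tell files from directories, while `os.scandir` entries can usually answer `is_dir()` from the type information returned with the directory listing.

```python
import os
import stat
import tempfile


def count_with_listdir(root):
    """Return (files, dirs) under root using os.listdir."""
    files = dirs = 0
    for name in os.listdir(root):
        path = os.path.join(root, name)
        # listdir yields bare names, so classifying each entry costs
        # one extra lstat() system call.
        if stat.S_ISDIR(os.lstat(path).st_mode):
            sub_files, sub_dirs = count_with_listdir(path)
            files += sub_files
            dirs += sub_dirs + 1
        else:
            files += 1
    return files, dirs


def count_with_scandir(root):
    """Return (files, dirs) under root using os.scandir."""
    files = dirs = 0
    with os.scandir(root) as entries:
        for entry in entries:
            # is_dir() can usually be answered from the directory entry
            # itself, without a separate stat call per file.
            if entry.is_dir(follow_symlinks=False):
                sub_files, sub_dirs = count_with_scandir(entry.path)
                files += sub_files
                dirs += sub_dirs + 1
            else:
                files += 1
    return files, dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        open(os.path.join(tmp, "a", "f"), "w").close()
        print(count_with_listdir(tmp), count_with_scandir(tmp))
```

On a tree with millions of files, the saved per-entry calls are where a difference of this kind would be expected to show up.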
Well, the idea was that the interpreter was spending too much time constructing If you comment it out, the performance of
Support for paths longer than PATH_MAX came as a side effect of using reference implementations of the deletion logic. Both I agree, even though To summarize:
What do you think, at which point should we stop?
Thank you for the additional steps, very useful conclusion. If we start with a new issue first, and let people talk - I think we could get rid of the decorator once and forever (but +1 at least for removing it from the method that is being slowed down). Also, note there's an option to do
Optimize `rmtree` to significantly reduce cleanup time, especially for large
buildroots. The previous Python-based implementation caused substantial delays
(e.g., ~13 minutes for a ~2M-file buildroot). Introduce a faster backend, using
either `shutil.rmtree` or native `rm -r`, which reduces cleanup time to roughly
one minute.

Also add support for handling paths longer than PATH_MAX, but only for
directories not listed in `exclude`. Extending this to excluded paths would
overcomplicate the logic, and excluded-path handling was already nonfunctional
and not in high demand.
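One general way to delete trees whose full path exceeds PATH_MAX is to never build the full path at all and instead operate relative to open directory file descriptors. The sketch below illustrates that technique under the assumption of a Linux system with `dir_fd` support; `rmtree_at` and `make_deep_tree` are hypothetical names, and this is not the code from the commit.

```python
import os
import tempfile


def rmtree_at(name, dir_fd):
    """Remove directory `name` located inside the directory open at dir_fd.

    Only short, single-component names are passed to the kernel, so the
    total depth of the tree never runs into PATH_MAX.
    """
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    fd = os.open(name, flags, dir_fd=dir_fd)
    try:
        # Snapshot the listing first, then delete, so entries are not
        # removed while the directory is still being iterated.
        with os.scandir(fd) as it:
            entries = list(it)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                rmtree_at(entry.name, fd)
            else:
                os.unlink(entry.name, dir_fd=fd)
    finally:
        os.close(fd)
    os.rmdir(name, dir_fd=dir_fd)


def make_deep_tree(root, top, levels=25, name_len=200):
    """Create nested directories under root/top; return the path length."""
    fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    length = len(root)
    for name in [top] + ["d" * name_len] * levels:
        os.mkdir(name, dir_fd=fd)
        child = os.open(name, os.O_RDONLY | os.O_DIRECTORY, dir_fd=fd)
        os.close(fd)
        fd = child
        length += 1 + len(name)
    # Leave one regular file at the very bottom of the tree.
    os.close(os.open("leaf", os.O_WRONLY | os.O_CREAT, 0o644, dir_fd=fd))
    os.close(fd)
    return length


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print("deepest path length:", make_deep_tree(root, "deep"))
    root_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        rmtree_at("deep", root_fd)
    finally:
        os.close(root_fd)
    os.rmdir(root)
```

The recursion keeps one descriptor open per directory level, so a very deep tree may need an iterative variant or a raised descriptor limit.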
Benchmark:
$ sudo find /var/lib/mock/some-big-buildroot/ | wc -l
2056654
$ time mock -r some-big-buildroot --clean
INFO: mock.py version 6.3 starting (python version = 3.9.21, NVR = mock-6.3-1.el9)
previous rmtree implementation
real 13m21.176s
user 9m51.712s
sys 2m29.270s
shutil.rmtree as _fastRm
real 1m8.450s
user 0m14.331s
sys 0m31.525s
rm -r as _fastRm
real 0m52.990s
user 0m3.089s
sys 0m29.184s
partially fixes: #31