Unzip files in parallel #118
Merged
Fixes #116
I didn't have much success demonstrating this with the built-in benchmarks, but for a Google-Takeout-style 50GB zip archive, you can see the difference clearly. I'm not sure I should check in this benchmark, as it uses 150GB of disk space! Here's how I benchmarked:
First, create a test zip file resembling a Google Photos Takeout: 1MB photos of essentially random (incompressible) data, stored uncompressed, adding up to a 50GB archive. A sketch of how such an archive can be generated follows below.
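A minimal sketch of a generator, assuming the `zip` (0.6-style API) and `rand` crates; the file count and names are illustrative (~50,000 × 1MB stored entries gives roughly 50GB):

```rust
// Sketch only: generate a Takeout-like test archive of ~50GB.
// Assumes the `zip` (0.6-style API) and `rand` crates; names are arbitrary.
use std::fs::File;
use std::io::Write;

use rand::RngCore;
use zip::write::FileOptions;
use zip::{CompressionMethod, ZipWriter};

fn main() -> zip::result::ZipResult<()> {
    let mut writer = ZipWriter::new(File::create("testdata.zip")?);
    // Store entries uncompressed, as Takeout does for already-compressed photos.
    let options = FileOptions::default().compression_method(CompressionMethod::Stored);
    let mut rng = rand::thread_rng();
    let mut buf = vec![0u8; 1024 * 1024]; // 1MB per fake "photo"
    for i in 0..50_000u32 {
        rng.fill_bytes(&mut buf); // random bytes are incompressible, like JPEGs
        writer.start_file(format!("photo_{i:05}.jpg"), options)?;
        writer.write_all(&buf)?;
    }
    writer.finish()?;
    Ok(())
}
```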
Then I ran some benchmarks on my 128-core Linux machine. "Cores" below is total CPU time (user + sys) divided by wall-clock time:

- `unzip`: cores = (244.53 + 52.70) / 309.17 = 0.96
- Old ripunzip release (2.0.3): cores = (19.76 + 153.02) / 60.44 = 2.86
- ripunzip with this change: cores = (577.48 + 7.80) / 8.71 = 67.2
So wall-clock time has come down from roughly 60 seconds to under 9 seconds, and we're now achieving high parallelism.
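For reference, the "cores" figures above can be reproduced with a small harness along these lines; this is a sketch assuming the `libc` crate on Linux, and the `unzip` invocation is a stand-in for whichever tool is under test (the real ripunzip command line may differ):

```rust
// Sketch: compute effective cores = (user + sys CPU time) / wall-clock time
// for a child process, via getrusage(2). Assumes the `libc` crate on Linux.
use std::process::Command;
use std::time::Instant;

fn main() {
    let start = Instant::now();
    // Illustrative child command; swap in the ripunzip invocation under test.
    let status = Command::new("unzip")
        .arg("testdata.zip")
        .status()
        .expect("failed to spawn");
    assert!(status.success());
    let real = start.elapsed().as_secs_f64();

    // RUSAGE_CHILDREN aggregates CPU time over all waited-for children.
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    unsafe { libc::getrusage(libc::RUSAGE_CHILDREN, &mut usage) };
    let secs = |t: libc::timeval| t.tv_sec as f64 + t.tv_usec as f64 / 1e6;
    let cpu = secs(usage.ru_utime) + secs(usage.ru_stime);
    println!("user+sys = {cpu:.2}s, real = {real:.2}s, cores = {:.2}", cpu / real);
}
```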
Disclosure: I generated this code change with Gemini-CLI. It needed some guidance at the architecture level about how to solve the problem with parallelism, but when it came to writing the code, I think it's a better Rust programmer than I am.