@mhansen mhansen commented Dec 3, 2025

Fixes #116

I didn't have much success demonstrating this with the built-in benchmarks, but for a Google-Takeout-style 50GB zip archive, you can see the difference clearly. I'm not sure I should check in this benchmark, as it uses 150GB of disk space! Here's how I benchmarked:

Create a test zip file that resembles a Google Photos Takeout: 50,000 1MB files of random data (photos are basically random data), stored uncompressed in a 50GB archive:

$ for f in (seq -f "%05g" 50000)
    dd if=/dev/random of=$f.random bs=1M count=1 &
  end
$ wait  # let the backgrounded dd jobs finish before zipping
$ zip -0 takeout-50g-store.zip *.random

Then I ran some benchmarks on my 128 core Linux machine:

Unzip:

$ unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.


$ time unzip takeout-50g-store.zip -d foo-store-unzip/

________________________________________________________
Executed in  309.17 secs    fish           external
   usr time  244.53 secs  890.00 micros  244.53 secs
   sys time   52.70 secs  482.00 micros   52.70 secs

Cores=(244.53+52.70)/309.17=0.96 cores

Old ripunzip release (2.0.3):

$ ~/.cargo/bin/ripunzip --version
ripunzip 2.0.3

$ time ~/.cargo/bin/ripunzip unzip-file takeout-50g-store.zip -d foo-store-ripunzip-old/

________________________________________________________
Executed in   60.44 secs    fish           external
   usr time   19.76 secs    0.00 millis   19.76 secs
   sys time  153.02 secs    1.39 millis  153.01 secs

Cores = (19.76+153.02)/60.44=2.86 cores

ripunzip with this change:

$ time ~/projects/ripunzip/target/release/ripunzip unzip-file takeout-50g-store.zip -d foo-store-ripunzip-new/

________________________________________________________
Executed in    8.71 secs    fish           external
   usr time    7.80 secs    1.28 millis    7.80 secs
   sys time  577.48 secs    0.00 millis  577.48 secs

Cores=(577.48+7.80)/8.71=67.2 cores
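
The "cores" figures above are CPU time (usr + sys) divided by wall-clock time. As a quick check of that arithmetic, here is a small sketch that recomputes them from the timings reported in this comment; the helper name `cores_used` and the dictionary layout are my own, purely for illustration.

```python
def cores_used(user_s: float, sys_s: float, wall_s: float) -> float:
    """Average number of busy cores: total CPU time divided by wall-clock time."""
    return (user_s + sys_s) / wall_s


# (usr seconds, sys seconds, wall-clock seconds), copied from the runs above.
runs = {
    "unzip 6.00": (244.53, 52.70, 309.17),
    "ripunzip 2.0.3": (19.76, 153.02, 60.44),
    "ripunzip (this change)": (7.80, 577.48, 8.71),
}

baseline_wall = runs["ripunzip 2.0.3"][2]
for name, (user_s, sys_s, wall_s) in runs.items():
    cores = cores_used(user_s, sys_s, wall_s)
    speedup = baseline_wall / wall_s  # relative to the old ripunzip release
    print(f"{name}: {cores:.2f} cores, {speedup:.2f}x vs ripunzip 2.0.3")
```

Note that most of the CPU time in the new run is sys time, so the figure measures how many cores were kept busy, not how much useful user-space work each one did.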

Wall-clock time has come down from 60.4 seconds to 8.7 seconds (about 6.9x faster), and average CPU usage has gone from 2.86 to 67.2 cores, so we're now achieving high parallelism.
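
For readers unfamiliar with the problem in #116 (reads serializing behind a mutex around a shared reader), here is a rough Python sketch of the general pattern. It is not ripunzip's code and not necessarily how this change works: `SharedReader`, `read_at_own_handle` and `read_all` are hypothetical names, and the per-worker file handle is just one assumed way to avoid sharing a cursor. It shows the locking structure only; it makes no performance claim.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedReader:
    """One file handle shared by all workers; a lock protects its cursor."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._lock = threading.Lock()

    def read_at(self, offset, length):
        # seek + read must be atomic, so every worker queues on this lock.
        with self._lock:
            self._file.seek(offset)
            return self._file.read(length)

    def close(self):
        self._file.close()


def read_at_own_handle(path, offset, length):
    """Each call opens a private handle, so workers share no lock or cursor."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_all(read_chunk, chunk_size, chunk_count, workers=8):
    """Read chunk_count chunks concurrently and return them in file order."""
    offsets = [i * chunk_size for i in range(chunk_count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda off: read_chunk(off, chunk_size), offsets))


def demo():
    chunk_size, chunk_count = 4096, 64
    data = os.urandom(chunk_size * chunk_count)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        shared = SharedReader(path)
        via_lock = read_all(shared.read_at, chunk_size, chunk_count)
        shared.close()
        via_own = read_all(
            lambda off, n: read_at_own_handle(path, off, n), chunk_size, chunk_count
        )
    finally:
        os.remove(path)
    # Both strategies return the same bytes; only the contention differs.
    return b"".join(via_lock) == data and b"".join(via_own) == data


if __name__ == "__main__":
    print(demo())
```

With the shared reader, only one worker can be inside `read_at` at a time no matter how many threads exist, which matches the low core count seen with the old release.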

Disclosure: I generated this code change with Gemini-CLI. It needed some guiding at an architecture level about how to solve the problem with parallelism, but when it came to writing the code, I think it's a better Rust programmer than I am.

In my last commit, I ran clippy but not fmt, so the CI fmt job is now
failing.

This fixes the CI fmt job.
@mhansen mhansen changed the title from "Draft: unzip in parallel" to "Unzip files in parallel" Dec 4, 2025
@yangsharon-chromium yangsharon-chromium merged commit bb39a08 into GoogleChrome:main Dec 4, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

ripunzip unzip-file is mostly serializing behind CloneableSeekableReader.inner Mutex
