Unzip files in parallel #118
Merged
Fixes #116
I didn't have much success demonstrating this with the built-in benchmarks, but for a Google-Takeout-style 50GB zip archive, you can see the difference clearly. I'm not sure I should check in this benchmark, as it uses 150GB of disk space! Here's how I benchmarked:
First, create a test zip file resembling a Google Photos Takeout: 1MB photos of essentially random (incompressible) data, stored uncompressed, adding up to a 50GB archive. A sketch of how such an archive can be generated follows below.
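A minimal sketch of a generator, assuming the `zip` (0.6-style API) and `rand` crates; the file count and names are illustrative (~50,000 × 1MB stored entries gives roughly 50GB):

```rust
// Sketch only: generate a Takeout-like test archive of ~50GB.
// Assumes the `zip` (0.6-style API) and `rand` crates; names are arbitrary.
use std::fs::File;
use std::io::Write;

use rand::RngCore;
use zip::write::FileOptions;
use zip::{CompressionMethod, ZipWriter};

fn main() -> zip::result::ZipResult<()> {
    let mut writer = ZipWriter::new(File::create("testdata.zip")?);
    // Store entries uncompressed, as Takeout does for already-compressed photos.
    let options = FileOptions::default().compression_method(CompressionMethod::Stored);
    let mut rng = rand::thread_rng();
    let mut buf = vec![0u8; 1024 * 1024]; // 1MB per fake "photo"
    for i in 0..50_000u32 {
        rng.fill_bytes(&mut buf); // random bytes are incompressible, like JPEGs
        writer.start_file(format!("photo_{i:05}.jpg"), options)?;
        writer.write_all(&buf)?;
    }
    writer.finish()?;
    Ok(())
}
```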
Then I ran some benchmarks on my 128-core Linux machine. "Cores" below is total CPU time (user + sys) divided by wall-clock time:

- `unzip`: cores = (244.53 + 52.70) / 309.17 = 0.96
- Old ripunzip release (2.0.3): cores = (19.76 + 153.02) / 60.44 = 2.86
- ripunzip with this change: cores = (577.48 + 7.80) / 8.71 = 67.2
So wall-clock time has come down from roughly 60 seconds to under 9 seconds, and we're now achieving high parallelism.
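For reference, the "cores" figures above can be reproduced with a small harness along these lines; this is a sketch assuming the `libc` crate on Linux, and the `unzip` invocation is a stand-in for whichever tool is under test (the real ripunzip command line may differ):

```rust
// Sketch: compute effective cores = (user + sys CPU time) / wall-clock time
// for a child process, via getrusage(2). Assumes the `libc` crate on Linux.
use std::process::Command;
use std::time::Instant;

fn main() {
    let start = Instant::now();
    // Illustrative child command; swap in the ripunzip invocation under test.
    let status = Command::new("unzip")
        .arg("testdata.zip")
        .status()
        .expect("failed to spawn");
    assert!(status.success());
    let real = start.elapsed().as_secs_f64();

    // RUSAGE_CHILDREN aggregates CPU time over all waited-for children.
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    unsafe { libc::getrusage(libc::RUSAGE_CHILDREN, &mut usage) };
    let secs = |t: libc::timeval| t.tv_sec as f64 + t.tv_usec as f64 / 1e6;
    let cpu = secs(usage.ru_utime) + secs(usage.ru_stime);
    println!("user+sys = {cpu:.2}s, real = {real:.2}s, cores = {:.2}", cpu / real);
}
```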
Disclosure: I generated this code change with Gemini-CLI. It needed some guidance at the architecture level about how to solve the problem with parallelism, but when it came to writing the code, I think it's a better Rust programmer than I am.