Conversation
5 tasks
zkochan
added a commit
that referenced
this pull request
Apr 24, 2026
Two perf fixes from investigations/pacquet-macos-perf.md that both collapse redundant passes over the tarball bytes. 1. Stream the HTTP body through the gzip decoder (doc fix #4). Replace `response.bytes().await` + `decompress_gzip` (sync zune-inflate) + separate `Integrity::check(&buffer)` with a streaming pipeline: `bytes_stream()` teed through `ssri::IntegrityChecker` into `async_compression::tokio::write:: GzipDecoder`. This removes one full pass over the compressed body (no separate `.check()`), never holds compressed + decompressed buffers at the same time, and lets decompression overlap with network awaits. Integrity is still verified end-to-end before any CAFS blob is written, so a mismatch aborts the install with the same `TarballError::Checksum` behavior as before. 2. Interleave SHA-512 with the per-entry read (doc fix #2). Add `StoreDir::write_cas_file_prehashed`; the tar extraction loop reads + hashes 64 KiB chunks in one pass instead of `read_to_end` followed by a separate `Sha512::digest(buffer)` inside `write_cas_file`. The buffer still has to exist (CAFS filename is hash-derived), but two passes over the decompressed entry bytes collapse into one. Mirrors pnpm's `parseTarball` + `addBufferToCafs` in `store/cafs/src/addFilesFromTarball.ts`. Why we diverge from pnpm on (1): upstream `fetching/tarball-fetcher/src/remoteTarballFetcher.ts` + `worker/src/start.ts` + `store/cafs/src/addFilesFromTarball.ts` buffers the whole tarball into a Node `SharedArrayBuffer` and runs `crypto.hash` + `zlib.gunzipSync` sequentially. Observable behavior here is identical — same integrity errors on mismatch, same decompressed bytes, same CAFS layout, same `index.db` rows — so per the "Internal performance divergence is allowed" clause added in the preceding commit, the streaming topology is a pacquet-specific implementation-detail optimization rather than a shape port. Dep changes: swap `zune-inflate` for `async-compression` (gzip + tokio features) + `futures-util`; enable reqwest's `stream` feature; add `sha2` to the tarball crate. Tests: existing `packages_under_orgs_should_work` (real-network end-to-end) and `should_throw_error_on_checksum_mismatch` both pass unchanged, proving the streaming pipeline produces the same decompressed bytes and fails integrity checks the same way.
zkochan
added a commit
that referenced
this pull request
Apr 24, 2026
Two perf fixes from investigations/pacquet-macos-perf.md that both collapse redundant passes over the tarball bytes. 1. Stream the HTTP body through the gzip decoder (doc fix #4). Replace `response.bytes().await` + `decompress_gzip` (sync zune-inflate) + separate `Integrity::check(&buffer)` with a streaming pipeline: `bytes_stream()` teed through `ssri::IntegrityChecker` into `async_compression::tokio::write:: GzipDecoder`. This removes one full pass over the compressed body (no separate `.check()`), never holds compressed + decompressed buffers at the same time, and lets decompression overlap with network awaits. Integrity is still verified end-to-end before any CAFS blob is written, so a mismatch aborts the install with the same `TarballError::Checksum` behavior as before. 2. Interleave SHA-512 with the per-entry read (doc fix #2). Add `StoreDir::write_cas_file_prehashed`; the tar extraction loop reads + hashes 64 KiB chunks in one pass instead of `read_to_end` followed by a separate `Sha512::digest(buffer)` inside `write_cas_file`. The buffer still has to exist (CAFS filename is hash-derived), but two passes over the decompressed entry bytes collapse into one. Mirrors pnpm's `parseTarball` + `addBufferToCafs` in `store/cafs/src/addFilesFromTarball.ts`. Why we diverge from pnpm on (1): upstream `fetching/tarball-fetcher/src/remoteTarballFetcher.ts` + `worker/src/start.ts` + `store/cafs/src/addFilesFromTarball.ts` buffers the whole tarball into a Node `SharedArrayBuffer` and runs `crypto.hash` + `zlib.gunzipSync` sequentially. Observable behavior here is identical — same integrity errors on mismatch, same decompressed bytes, same CAFS layout, same `index.db` rows — so per the "Internal performance divergence is allowed" clause added in the preceding commit, the streaming topology is a pacquet-specific implementation-detail optimization rather than a shape port. Dep changes: swap `zune-inflate` for `async-compression` (gzip + tokio features) + `futures-util`; enable reqwest's `stream` feature; add `sha2` to the tarball crate. Tests: existing `packages_under_orgs_should_work` (real-network end-to-end) and `should_throw_error_on_checksum_mismatch` both pass unchanged, proving the streaming pipeline produces the same decompressed bytes and fails integrity checks the same way.
zkochan
added a commit
that referenced
this pull request
Apr 25, 2026
- Move `configure_rayon_pool` to after `CliArgs::parse()`. clap's `--help` / `--version` paths exit inside `parse`, so they never reach the rayon pool init now — `pacquet --help` no longer spawns worker threads it'll throw away. Other subcommand paths still pay the (small) init cost; we can move it into the install path specifically as a follow-up if non-install commands are ever added that don't need rayon. - Drop the local `CasPathsArc` / `WarmEntry` aliases in `create_virtual_store::run`. They duplicated the value type of `pacquet_tarball::PrefetchedCasPaths`; if that map's value type changes, the local aliases would silently drift. Letting `warm`'s element type be inferred from its `push` calls keeps the binding to the upstream type. - Add three unit tests for `prefetch_cas_paths`: * `prefetch_cas_paths_returns_hits_for_live_index_rows` — happy path: index row written, CAFS blob present, verify on; result contains the expected file map. * `prefetch_cas_paths_omits_failed_integrity_entries` — index row whose digest matches no on-disk file; verify on; row is dropped from the result rather than surfacing a half-populated map. * `prefetch_cas_paths_skips_filesystem_checks_when_verify_disabled` — same shape as the failing-integrity test but with `verify_store_integrity = false`; the row surfaces (no fs checks), matching pnpm's `verify-store-integrity: false` semantics.
Draft
KSXGitHub
pushed a commit
that referenced
this pull request
Apr 30, 2026
#4 (cmd-shim — chmod target swallowed all errors): the post-write \`ensure_executable_bits\` call used \`let _ = ...\` which dropped \`PermissionDenied\`, \`EROFS\`, AppArmor deny and other real failures alongside the intentionally-ignored \`NotFound\`. Match \`NotFound\` explicitly and propagate everything else as \`LinkBinsError::Chmod\`, mirroring pnpm's \`fixBin\` ENOENT guard. Adds two DI-driven regression tests: - \`link_bins_propagates_target_chmod_error_via_di\` — \`PermissionDenied\` must surface as \`Chmod\`. - \`link_bins_swallows_target_chmod_not_found_via_di\` — \`NotFound\` must remain swallowed. #5 (package-manager — peer-resolved slot name parsing): \`find_slot_ own_package_dir\` used \`slot_name.rfind('@')\` to strip the version tail. That works for \`parent@1.0.0\` and \`@scope+parent@1.0.0\` but breaks for peer-resolved slots: \`to_virtual_store_name\` produces shapes like \`ts-node@10.9.1_@types+node@18.7.19_typescript@5.1.6\` where the last \`@\` is inside a peer's version, not at the package-name boundary. \`rfind\` then split inside the peer spec, the slot's own package wasn't found, and the slot was silently skipped — bins of children of every peer-resolved slot were never linked. Parse from the left instead, skipping a single leading \`@\` that belongs to a scoped package. The package-name half can never contain \`@\` after \`to_virtual_store_name\`'s scope normalization (\`/\` → \`+\`), so the first non-leading \`@\` is the right boundary. Adds \`link_virtual_store_bins_handles_peer_resolved_slot_name\` using the exact \`ts-node@10.9.1_@types+node@18.7.19_typescript@5.1.6\` shape verified by \`pacquet_lockfile::pkg_name_ver_peer::tests::to_virtual_ store_name\`. Both fixes verified by re-breaking the relevant code path and confirming the regression tests fail, then reverting. Drive-by from #9 (style nit on the chain split in \`SymlinkDirectDependencies::run\`): the chain was broken in the earlier \`feat(cmd-shim): port directories.bin discovery and Windows shim formats\` commit (dfdd33a) because \`direct_deps\` is consumed twice — once by \`symlink_direct_deps_into_node_modules\`, once by \`link_direct_dep_bins\`'s \`dep_names\` derivation. That can't sit inside a single chain. Documenting it here in lieu of amending the earlier commit. Test count: cmd-shim 70 → 72, package-manager (lib) 38 → 39.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.