perf: Parallelize turbo run pre-execution hot path #11958
Merged
anthonyshew merged 5 commits into main, Feb 22, 2026
Conversation
perf: Optimize engine builder, task visitor, and untracked file discovery

Three targeted optimizations to the turbo run hot path:

1. Engine builder: Cache the turbo.json chain per package and move the visited check before the expensive `task_definition()` call. The chain depends only on the package name, so multiple tasks in the same package reuse the cached result.
2. Task visitor: Defer `env()` computation to the non-dry-run branches. The execution environment is unused during dry runs, so this avoids per-task `RwLock` acquisition and env var map cloning.
3. `find_untracked_files`: Replace `Mutex<Vec>` with per-thread local buffers flushed via an mpsc channel on drop, eliminating per-file mutex contention in the parallel walker.
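The third change (per-thread buffers flushed through a channel on drop) can be sketched with only the standard library. The types and names below are illustrative, not turborepo's actual code:

```rust
use std::sync::mpsc;
use std::thread;

// Each walker thread owns a local buffer; the whole buffer is flushed with a
// single channel send when the thread finishes, instead of taking a shared
// mutex once per discovered file.
struct LocalBuf {
    buf: Vec<String>,
    tx: mpsc::Sender<Vec<String>>,
}

impl Drop for LocalBuf {
    fn drop(&mut self) {
        // One send per thread, not one lock per file.
        let _ = self.tx.send(std::mem::take(&mut self.buf));
    }
}

fn collect_untracked(paths_per_thread: Vec<Vec<String>>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = paths_per_thread
        .into_iter()
        .map(|paths| {
            let tx = tx.clone();
            thread::spawn(move || {
                let mut local = LocalBuf { buf: Vec::new(), tx };
                for p in paths {
                    local.buf.push(p); // no contention: buffer is thread-local
                } // LocalBuf dropped here -> single flush
            })
        })
        .collect();
    drop(tx); // close the channel so the receiver iteration terminates
    for h in handles {
        h.join().unwrap();
    }
    let mut all: Vec<String> = rx.into_iter().flatten().collect();
    all.sort(); // channel arrival order is nondeterministic
    all
}
```

Sorting at the end restores a deterministic order, since flush order depends on thread scheduling.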
Parallelize several sequential phases of turbo run's pre-execution pipeline: dependency resolution, turbo.json loading, and task summary construction. Also reduce per-call allocation overhead in the task hash tracker and gix index classification.
…parallelize-hot-path (conflicts: crates/turborepo-scm/src/repo_index.rs)
github-actions bot added a commit that referenced this pull request on Feb 22, 2026:
## Release v2.8.11-canary.20

Versioned docs: https://v2-8-11-canary-20.turborepo.dev

### Changes
- perf: Optimize engine builder, task visitor, and untracked file discovery (#11956) (`e145bc6`)
- release(turborepo): 2.8.11-canary.19 (#11957) (`be8c782`)
- perf: Parallelize `turbo run` pre-execution hot path (#11958) (`b79b680`)

Co-authored-by: Turbobot <turbobot@vercel.com>
### Summary
Parallelizes several sequential phases of `turbo run`'s pre-execution pipeline and reduces per-call allocation overhead. Individual functions show significant improvement in `--profile` traces, though end-to-end wall-clock improvement is within noise on hyperfine benchmarks, because uninstrumented overhead (stdout serialization, daemon negotiation) dominates total runtime.

### Changes
#### Parallel `to_summary` task construction (`tracker.rs`)

The loop that builds `TaskSummary` structs was sequential; on a large repo with ~1700 tasks it took ~92ms. Moved to `rayon::par_iter`. Each `task_summary()` call is read-only on the engine, the hash tracker (`RwLock` read), and the package graph.

Profile: `to_summary` 92ms → 10ms
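The safety argument above is that each summary only takes shared read access to the tracker. The PR uses `rayon::par_iter`; this stdlib-only sketch (all names hypothetical) shows the same shape with scoped threads:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Stand-in for the task hash tracker: readers share an RwLock read guard.
struct HashTracker {
    hashes: RwLock<HashMap<String, String>>,
}

fn task_summary(tracker: &HashTracker, task: &str) -> String {
    // Read lock only: many threads hold it concurrently, and no writer
    // exists during summary construction.
    let hashes = tracker.hashes.read().unwrap();
    format!("{}:{}", task, hashes.get(task).map(String::as_str).unwrap_or("-"))
}

fn to_summary(tracker: &HashTracker, tasks: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    thread::scope(|s| {
        // Spawn one worker per task; joining in order preserves input order,
        // matching what a parallel iterator's collect would give.
        let handles: Vec<_> = tasks
            .iter()
            .map(|t| s.spawn(move || task_summary(tracker, t)))
            .collect();
        out = handles.into_iter().map(|h| h.join().unwrap()).collect();
    });
    out
}
```

A real implementation would use a bounded thread pool (as rayon does) rather than a thread per task.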
#### Parallel turbo.json preloading (`loader.rs`, `builder.rs`)

The engine builder loaded each package's `turbo.json` lazily, sequentially on first access. Added `preload_all()`, which reads and parses all package turbo.json files in parallel via rayon before the engine builder needs them. The `FixedMap` cache uses a `OnceLock` per key, so concurrent loads are non-blocking.

Profile: `build_engine` 74ms → 43ms
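A miniature version of the `OnceLock`-per-key idea (turborepo's `FixedMap` is internal; this `FixedCache` is hypothetical): the key set is fixed up front, each slot initializes at most once, and a parallel preload pass never blocks a later lazy reader:

```rust
use std::sync::OnceLock;
use std::thread;

// Fixed key set decided at construction; one OnceLock slot per key.
struct FixedCache {
    keys: Vec<String>,
    slots: Vec<OnceLock<String>>,
}

impl FixedCache {
    fn new(keys: Vec<String>) -> Self {
        let slots = keys.iter().map(|_| OnceLock::new()).collect();
        FixedCache { keys, slots }
    }

    // Loads at most once per key; concurrent callers race on the OnceLock
    // instead of serializing on a global lock.
    fn get_or_load(&self, key: &str, load: impl FnOnce() -> String) -> Option<&str> {
        let idx = self.keys.iter().position(|k| k == key)?;
        Some(self.slots[idx].get_or_init(load).as_str())
    }

    // Warm every slot in parallel before first use.
    fn preload_all(&self) {
        thread::scope(|s| {
            for key in &self.keys {
                s.spawn(move || {
                    self.get_or_load(key, || format!("config for {key}"));
                });
            }
        });
    }
}
```

Because `OnceLock::get_or_init` returns a shared reference, a reader that arrives after preload pays only the slot lookup, with no locking or re-parsing.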
#### Parallel `connect_internal_dependencies` (`builder.rs`, `dep_splitter.rs`)

The `Dependencies::new` calls that resolve internal vs. external deps were sequential. Each call is read-only on the workspaces map, so they moved to `rayon::par_iter`. Also hoisted `package_manager.link_workspace_packages()` (which reads a config file from disk for pnpm/Berry) above the parallel loop so it is computed once instead of N times.

Profile: `connect_internal_dependencies` 52ms → 24ms
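The hoisting half of this change is the classic "compute the loop-invariant once, share it read-only" move. A stdlib sketch with hypothetical names, using a call counter to make the once-instead-of-N claim checkable:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the stand-in for the config-file read runs.
static CONFIG_READS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for package_manager.link_workspace_packages(): in the real code
// this reads a config file from disk for pnpm/Berry.
fn link_workspace_packages() -> bool {
    CONFIG_READS.fetch_add(1, Ordering::SeqCst);
    true
}

fn resolve_deps(workspaces: &[&str]) -> Vec<String> {
    // Hoisted: one read before the parallel loop…
    let link = link_workspace_packages();
    let mut out = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = workspaces
            .iter()
            // …shared as a plain Copy value by every worker.
            .map(|w| s.spawn(move || format!("{w}:link={link}")))
            .collect();
        out = handles.into_iter().map(|h| h.join().unwrap()).collect();
    });
    out
}
```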
#### Faster hex encoding in gix index (`repo_index.rs`)

Replaced `e.id.to_hex().to_string()` (which goes through `HexDisplay` → `Display::fmt` → a heap-allocated `String`) with `hex::encode_to_slice` into a stack buffer, skipping the intermediate allocation.
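The PR uses the `hex` crate's `encode_to_slice`; a stdlib-only equivalent of the pattern shows why it avoids the per-entry allocation (a 20-byte SHA-1 id and the helper name are illustrative):

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

// Encode a 20-byte object id into a caller-provided stack buffer and return
// a &str view into it. No heap allocation per call, unlike building a String
// through Display::fmt.
fn encode_oid<'a>(oid: &[u8; 20], out: &'a mut [u8; 40]) -> &'a str {
    for (i, &b) in oid.iter().enumerate() {
        out[i * 2] = HEX[(b >> 4) as usize];
        out[i * 2 + 1] = HEX[(b & 0x0f) as usize];
    }
    // Safe to unwrap: every byte written is an ASCII hex digit.
    std::str::from_utf8(out).unwrap()
}
```

The caller keeps the `[u8; 40]` buffer on the stack and reuses it across entries.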
#### Reduce `find_untracked_files` allocations (`repo_index.rs`)

Eliminated the `Vec<String>` + `Arc` that cloned every `RepoStatusEntry.path` for binary search. `status_entries` is pre-sorted before the call, and walker threads binary search directly on the borrowed `&[RepoStatusEntry]` slice.
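Searching the borrowed slice directly looks like this (the `RepoStatusEntry` struct here is simplified to its relevant field):

```rust
// Simplified stand-in for the real entry type.
struct RepoStatusEntry {
    path: String,
}

// status_entries must already be sorted by path (done once, before the
// parallel walk). binary_search_by compares against the borrowed &str, so
// each lookup performs zero allocations.
fn is_tracked(status_entries: &[RepoStatusEntry], path: &str) -> bool {
    status_entries
        .binary_search_by(|e| e.path.as_str().cmp(path))
        .is_ok()
}
```

Since every walker thread only needs `&[RepoStatusEntry]`, the slice can be shared by plain borrow with no `Arc` and no per-path `String` clones.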
#### Reduce `TaskHashTracker` per-call overhead (`lib.rs`)

Changed `external_deps_hash_cache` from `HashMap<PackageName, String>` to `HashMap<String, String>`. Lookups use `task_id.package()` directly instead of allocating a `PackageName` via `to_workspace_name()` on every `calculate_task_hash` call.
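The key-type change works because `HashMap<String, _>::get` accepts any `Borrow<str>`, so the `&str` from `task_id.package()` can be used as-is. A sketch with simplified types (the struct field name matches the PR; the method is hypothetical):

```rust
use std::collections::HashMap;

struct TaskHashTracker {
    // Keyed by plain String rather than a PackageName wrapper, so lookups
    // can borrow an existing &str instead of allocating an owned key.
    external_deps_hash_cache: HashMap<String, String>,
}

impl TaskHashTracker {
    // `package` would come straight from task_id.package(); no owned key is
    // constructed on this hot path.
    fn external_deps_hash(&self, package: &str) -> Option<&str> {
        self.external_deps_hash_cache.get(package).map(String::as_str)
    }
}
```

With the old `HashMap<PackageName, String>`, every call had to build a `PackageName` just to satisfy the key type, which is exactly the per-call allocation the PR removes.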
### Measurement

Profile-based measurements (`turbo run build --dry=json --profile`, 5-run median on a large ~1000-package repo) show the instrumented portion of the run dropping from ~761ms to ~663ms. However, hyperfine benchmarks on `--dry` runs across three repos of varying size show no statistically significant end-to-end improvement; results are 1.00-1.03× with error bars of ±0.09 to ±0.22.

### Testing
connect_internal_dependenciesverifying graph edges and external dependency classification are correct after parallelizationgit_index_regression_tests(31 tests) validate the hex encoding andfind_untracked_fileschangesOnceLockCAS, or read-onlyRwLockacquisition with no concurrent writers