-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Split execute_job
into execute_job_incr
and execute_job_non_incr
#109046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rustbot has picked a reviewer for you, use r? to override) |
} | ||
let prof_timer = qcx.dep_context().profiler().query_provider(); | ||
let result = qcx.start_query(job_id, query.depth_limit(), None, || query.compute(qcx, key)); | ||
let dep_node_index = qcx.dep_context().dep_graph().next_virtual_depnode_index(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you encapsulate next_virtual_depnode_index
? The goal is to make sure that it is only called when DepGraph::data
is None
.
For instance, making DepGraph::Data
return some marker type DisabledDepGraph
on which we can call that method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to make a marker type struct DisabledDepGraph<'a, K: DepKind>(&'a DepGraph<K>)
, but it doesn't seem like LLVM was able to optimize it away:
Benchmark | Before | After | |
Time | Time | % | |
🟣 clap:check | 1.7102s | 1.7175s | 0.43% |
🟣 hyper:check | 0.2533s | 0.2538s | 0.23% |
🟣 regex:check | 0.9534s | 0.9567s | 0.35% |
🟣 syn:check | 1.5869s | 1.5971s | 0.64% |
🟣 syntex_syntax:check | 6.1097s | 6.1317s | 0.36% |
Total | 10.6134s | 10.6568s | 0.41% |
Summary | 1.0000s | 1.0040s | 0.40% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about DisabledDepGraph(Lrc<AtomicU32>)
, and have the dep-graph hand out references to that struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds very performance equivalent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding assert!(self.data.is_none());
to DepGraph::next_virtual_depnode_index()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to be cheaper:
Benchmark | Before | After | |
Time | Time | % | |
🟣 clap:check | 1.7151s | 1.7189s | 0.22% |
🟣 hyper:check | 0.2516s | 0.2525s | 0.34% |
🟣 regex:check | 0.9532s | 0.9538s | 0.06% |
🟣 syn:check | 1.5406s | 1.5400s | -0.04% |
🟣 syntex_syntax:check | 5.9116s | 5.9193s | 0.13% |
Total | 10.3722s | 10.3845s | 0.12% |
Summary | 1.0000s | 1.0014s | 0.14% |
There's a 0.01% code size increase so it probably doesn't get optimized away.
Could I get a perf run? I want to see if this affects bootstrap times before growing the PR larger. |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 5bdc711b0c18afd0867e296085eafd4437bfcb9c with merge 8202e830070034d6387c18efbcdec4084be63938... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (8202e830070034d6387c18efbcdec4084be63938): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
The changes look good to me. |
Could not assign reviewer from: |
r=me with the assertion #109046 (comment) added. |
@bors r=cjgillot,michaelwoerister |
☀️ Test successful - checks-actions |
Finished benchmarking commit (822c10f): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
Refactor `try_execute_query` This merges `JobOwner::try_start` into `try_execute_query`, removing `TryGetJob` in the processes. 3 new functions are extracted from `try_execute_query`: `execute_job`, `cycle_error` and `wait_for_query`. This makes the control flow a bit clearer and improves performance. Based on rust-lang#109046. <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7134s</td><td align="right">1.7061s</td><td align="right"> -0.43%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2519s</td><td align="right">0.2510s</td><td align="right"> -0.35%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9517s</td><td align="right">0.9481s</td><td align="right"> -0.38%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5389s</td><td align="right">1.5338s</td><td align="right"> -0.33%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9488s</td><td align="right">5.9258s</td><td align="right"> -0.39%</td></tr><tr><td>Total</td><td align="right">10.4048s</td><td align="right">10.3647s</td><td align="right"> -0.38%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9962s</td><td align="right"> -0.38%</td></tr></table> r? `@cjgillot`
Refactor `try_execute_query` This merges `JobOwner::try_start` into `try_execute_query`, removing `TryGetJob` in the processes. 3 new functions are extracted from `try_execute_query`: `execute_job`, `cycle_error` and `wait_for_query`. This makes the control flow a bit clearer and improves performance. Based on rust-lang/rust#109046. <table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7134s</td><td align="right">1.7061s</td><td align="right"> -0.43%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2519s</td><td align="right">0.2510s</td><td align="right"> -0.35%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9517s</td><td align="right">0.9481s</td><td align="right"> -0.38%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5389s</td><td align="right">1.5338s</td><td align="right"> -0.33%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9488s</td><td align="right">5.9258s</td><td align="right"> -0.39%</td></tr><tr><td>Total</td><td align="right">10.4048s</td><td align="right">10.3647s</td><td align="right"> -0.38%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9962s</td><td align="right"> -0.38%</td></tr></table> r? `@cjgillot`
execute_job
was a bit large, so this splits it in 2. Performance was neutral locally, but this may affect bootstrap times.