Commit 0f19b6d
Fix pathological performance in trait solver cycles with errors
Fuchsia's Starnix system has had a multi-year bug where occasionally a
typo could cause the Rust compiler to take 10+ hours to report an error.
This was particularly hard to track down since Starnix's codebase is
massive, over 384 thousand lines as of this writing. With the help of
treereduce, cargo-minimize, and rustmerge, after about a month of
running we reduced it down to a couple [lines of code], which
only takes about 35 seconds to report an error on my machine. The bug
also appears to happen with `-Z next-solver=no` and `-Z
next-solver=coherence`, but does not occur with `-Z next-solver` or `-Z
next-solver=globally`.
I used Gemini to help diagnose the problem and propose a solution (which
is the one implemented in this patch):
1. The trait solver gets stuck in an exponential loop evaluating
auto-trait bounds (like Send and Sync) on cyclic types that contain
compilation errors (TyKind::Error).
2. Normally, if the solver detects a cycle, it prevents the result from
being stored in the Global Cache because the result depends on the
current evaluation stack. However, when an error is involved, the
depth tracking gets pinned to a low value, forcing the solver to rely
on the short-lived Provisional Cache. Since the provisional cache is
cleared between high-level iterations of the fulfillment loop, the
solver ends up re-discovering and re-evaluating the same large cycle
thousands of times.
3. The fix: allow global caching of results even if they appear
stack-dependent, provided that the inference context is already
"tainted by errors" (`self.infcx.tainted_by_errors().is_some()`). This
violates the strict invariant that global cache entries shouldn't
depend on the stack, but it is safe because the compilation is already
guaranteed to fail due to the presence of errors. Prioritizing compiler
responsiveness and termination over perfect correctness in error
states is the correct trade-off here.
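
To make the three points above concrete, here is a toy model of the caching rule. It is only a sketch and is not the rustc implementation: `ToySolver`, `cache_when_tainted`, `complete_graph`, and `count_evaluations` are hypothetical names invented for illustration, and the provisional cache is left out entirely to mimic the case where it gives no lasting benefit. Goals form a fully connected cycle, every non-head goal is stack-dependent, and the only thing that changes between the two runs is whether stack-dependent results may enter the global cache once errors are present.

```rust
use std::collections::HashMap;

/// Toy stand-in for the trait solver. None of these names come from rustc.
struct ToySolver {
    /// deps[i] lists the goals that goal `i` depends on.
    deps: Vec<Vec<usize>>,
    /// Results that survive for the rest of the session.
    global_cache: HashMap<usize, bool>,
    /// Plays the role of `infcx.tainted_by_errors().is_some()`.
    tainted_by_errors: bool,
    /// Hypothetical switch: when true, apply the relaxed caching rule.
    cache_when_tainted: bool,
    /// Number of goals evaluated without a cache hit.
    evaluations: u64,
}

impl ToySolver {
    /// Returns the result plus the lowest stack position it depends on
    /// (`usize::MAX` when it does not depend on the stack at all).
    fn evaluate(&mut self, goal: usize, stack: &mut Vec<usize>) -> (bool, usize) {
        if let Some(&cached) = self.global_cache.get(&goal) {
            return (cached, usize::MAX);
        }
        if let Some(pos) = stack.iter().position(|&g| g == goal) {
            // Cycle: treated as success here, as for coinductive auto
            // traits, but the answer depends on the stack entry at `pos`.
            return (true, pos);
        }
        self.evaluations += 1;
        let depth = stack.len();
        stack.push(goal);
        let mut result = true;
        let mut reached = usize::MAX;
        for dep in self.deps[goal].clone() {
            let (dep_result, dep_reached) = self.evaluate(dep, stack);
            result &= dep_result;
            reached = reached.min(dep_reached);
        }
        stack.pop();
        // The result is stack-dependent if a cycle reached below this goal.
        let stack_dependent = reached < depth;
        if !stack_dependent || (self.cache_when_tainted && self.tainted_by_errors) {
            self.global_cache.insert(goal, result);
        }
        (result, if stack_dependent { reached } else { usize::MAX })
    }
}

/// Every goal depends on every other goal: one large cycle.
fn complete_graph(n: usize) -> Vec<Vec<usize>> {
    (0..n).map(|i| (0..n).filter(|&j| j != i).collect()).collect()
}

fn count_evaluations(n: usize, cache_when_tainted: bool) -> u64 {
    let mut solver = ToySolver {
        deps: complete_graph(n),
        global_cache: HashMap::new(),
        tainted_by_errors: true,
        cache_when_tainted,
        evaluations: 0,
    };
    solver.evaluate(0, &mut Vec::new());
    solver.evaluations
}

fn main() {
    println!("strict rule:  {} evaluations", count_evaluations(8, false));
    println!("relaxed rule: {} evaluations", count_evaluations(8, true));
}
```

Under the strict rule the toy re-evaluates each goal once per path that reaches it, so the count grows combinatorially with the size of the cycle. Under the relaxed rule each goal is evaluated once. The toy always answers "true", so it says nothing about whether the relaxed rule can change which error is reported in the real solver.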
I added the reduction as the test case for this. However, I don't see an
easy way to catch a regression if this bug comes back. Should we add
some way to time out the test if it takes longer than 10 seconds to
compile? That could be a source of flakes though.
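
As a rough illustration of the timeout idea (not a proposal for how the test harness should do it), a wall-clock limit can be sketched with a helper thread and a channel. `run_with_limit` is a hypothetical helper, and the closure stands in for "compile the test case". A real harness would more likely spawn the compiler as a child process and kill it on timeout, since a thread cannot be cancelled.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a helper thread and waits at most `limit` for its result.
/// Returns `None` on timeout; the helper thread is then left detached.
fn run_with_limit<T, F>(limit: Duration, work: F) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}

fn main() {
    // Finishes well inside the limit.
    let fast = run_with_limit(Duration::from_secs(10), || 2 + 2);
    assert_eq!(fast, Some(4));

    // Exceeds the limit, so the caller sees a timeout.
    let slow = run_with_limit(Duration::from_millis(50), || {
        thread::sleep(Duration::from_millis(500));
        0
    });
    assert_eq!(slow, None);
}
```

The flakiness concern shows up directly in the choice of `limit`: a loaded CI machine can push a healthy compile past a tight limit, while a generous limit makes the failure slow to detect.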
I don't have any experience with the trait solver code, but I did try to
review the code to the best of my ability. This approach seems like a
bit of a band-aid, but I don't see a better solution. We could
try to teach the solver to not clear the provisional cache in this
circumstance, but I suspect that'd be a pretty invasive change.
I'm guessing if this does cause problems, it might report an incorrect
error, but I (and Gemini) were unable to come up with an example that
reported a different error with and without this fix.
[lines of code]: https://gist.github.com/erickt/255bc4006292cac88de906bd6bd9220a
1 parent a5c825c commit 0f19b6d
3 files changed
Lines changed: 147 additions & 0 deletions
File tree
- compiler/rustc_trait_selection/src/traits/select
- tests/ui/traits
Diff 1: existing file, 8 lines added (new lines 1116-1123)
Diff 2: new file, 125 lines added
Diff 3: new file, 14 lines added