Relink don't rebuild: add a baseline, sound implementation that can be incrementally improved #155871
susitsm wants to merge 15 commits into rust-lang:main from
|
Some changes occurred in compiler/rustc_attr_parsing
cc @jdonszelmann, @JonathanBrouwer
Some changes occurred in compiler/rustc_hir/src/attrs
cc @jdonszelmann, @JonathanBrouwer
Some changes occurred in compiler/rustc_passes/src/check_attr.rs
|
rustbot has assigned @jdonszelmann.
|
0e63248 to 4b676bd
|
This is a large change, might take me multiple review passes. I'll see if I can do the first today or tomorrow :) |
|
@rustbot author |
|
Reminder, once the PR becomes ready for a review, use |
let mut public_api_hasher = PublicApiHasher::default();
let tcx = self.tcx;
let mut stats: Vec<(&'static str, usize)> = Vec::with_capacity(32);
The encode_crate_deps below should record the public api hash rather than crate hash when doing rdr, right?
Short answer: you are right, that will improve it while keeping it sound. I will add that to this PR.
Long answer: one of the main goals of RDR is to enable early cutoff, and including the public hash of all dependencies goes against that. We should only include the public hashes of dependencies we reexport in some way, or better, only the hash of the part we reexport. Reexport here can mean pub use, inlinable/generic/const eval MIR reachable through local inlinable/generic/const eval MIR, and some more things we are not aware of yet. Doing this correctly and maintainably is likely the single most technically challenging part of getting RDR right.
changed it to use public_api_hash, added some comments. 044529d
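To make the point of this thread concrete, here is a minimal sketch of why recording a dependency's public API hash instead of its crate hash enables early cutoff. Everything in it is an assumption for illustration: `DepRecord` and `hash_deps` are hypothetical names, not the types used by `encode_crate_deps`, and `DefaultHasher` stands in for rustc's stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the per-dependency record written to metadata.
struct DepRecord {
    name: &'static str,
    /// Changes on any edit to the dependency, private or public.
    crate_hash: u64,
    /// Changes only when the dependency's public API changes.
    public_api_hash: u64,
}

/// Hash the dependency list. With `rdr` enabled only the public API hash of
/// each dependency is mixed in, otherwise the full crate hash is used.
fn hash_deps(deps: &[DepRecord], rdr: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    for dep in deps {
        dep.name.hash(&mut hasher);
        let h = if rdr { dep.public_api_hash } else { dep.crate_hash };
        h.hash(&mut hasher);
    }
    hasher.finish()
}
```

With this shape, a private-only change in `dep1` (new `crate_hash`, same `public_api_hash`) leaves the result of `hash_deps(.., true)` untouched, which is the early cutoff the comment asks for. As noted in the long answer, mixing in the public hash of every dependency is still coarser than hashing only what is reexported.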
4b676bd to 1682f38
… rmeta without parents)
…end on public_api_hash instead of crate_hash
1682f38 to ab0ab17
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
@rustbot review |
|
Just realized that there is already a soundness hole: the |
// should be added here as
// ```
// // FIXME do we need this // or a comment about why we need this
// let _ = my_new_field;
this part is outdated now, about the let _ =
// `hash_crate_root_public_api`" into `encode_my_new_field`
// 3. Only remove/change what is hashed in a separate PR. Removing items just from the hash
// should be done with extreme scrutiny. A better way might be to sort the query result
// in its provider, or filter which values we encode. That also helps with rmeta size.
this should maybe be a fixme at the end
}
}

// When changed, make sure to update the hashing in `hash_crate_root_public_api`
Ideally, of course, we would make the hashing of that and the encoding here share some of the same source. Maybe by implementing a trait, or calling a method here that adds to the RDR hash, or returning a closure, idk. That way you don't have to think about all these comments and nonlocal subtleties.
That, or even a rustc lint at some point that automatically prompts you if you've made a change here and didn't update the other file. I'm not 100% sure yet what the logic there would be, but this, though better, still seems brittle.
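One way the "share the same source" suggestion could look is a single sink that both encodes a field and feeds it to the hash, so the two cannot drift apart. This is only a sketch under assumptions: `EncodeAndHash`, `public_field`, and `private_field` are hypothetical names, not part of the rmeta encoder, and `DefaultHasher` stands in for rustc's stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical sink that writes a field to the metadata buffer and
/// feeds the same bytes into the public API hash in one step.
struct EncodeAndHash {
    encoded: Vec<u8>,
    hasher: DefaultHasher,
}

impl EncodeAndHash {
    fn new() -> Self {
        Self { encoded: Vec::new(), hasher: DefaultHasher::new() }
    }

    /// Encode a field that is part of the public API: written and hashed.
    fn public_field(&mut self, bytes: &[u8]) {
        self.encoded.extend_from_slice(bytes);
        bytes.hash(&mut self.hasher);
    }

    /// Encode a field that is deliberately excluded from the hash.
    /// The separate method name makes the exclusion visible at the call site.
    fn private_field(&mut self, bytes: &[u8]) {
        self.encoded.extend_from_slice(bytes);
    }

    /// Return the encoded bytes together with the resulting hash.
    fn finish(self) -> (Vec<u8>, u64) {
        let hash = self.hasher.finish();
        (self.encoded, hash)
    }
}
```

With a design like this, adding a new encoded field forces a choice between `public_field` and `private_field` at the point where it is written, instead of relying on a comment that points to a second function elsewhere. In the baseline described by this PR, every field would go through `public_field`.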
|
@rustbot author |
template!(List: &[r#"cfg = "...", crate_name = "...""#]),
|this, cx, args| {
    this.items.extend(parse_rdr_fields(cx, args).map(|(cfg, crate_name)| {
        (cx.attr_span, RDRFields { cfg, crate_name, changed: false })
Am I misunderstanding something? This is rustc_public_hash_changed, but changed: false? Then the code that checks let green = !fields.changed; seems to be reversing this.
In other words, although a change has occurred, green is still set to true?
Good catch! I likely messed this up when rebasing to main and converting to attribute parsers. There is also something wrong with running the tests. cpass revisions don't run them, they need at least bpass. Likely because I based it on the CGU reuse attributes, and inserted the runner code there.
That's also exactly what I wanted to say. It is possible that none of the assertions in your test triggered a fatal error, yet the test passed silently. It's necessary to double check your test:>
Fixed this. I also moved the attribute checking code near #[rustc_clean] checking and added a bfail revision to the test to make sure it fires.
template!(List: &[r#"cfg = "...", crate_name = "...""#]),
|this, cx, args| {
    this.items.extend(parse_rdr_fields(cx, args).map(|(cfg, crate_name)| {
        (cx.attr_span, RDRFields { cfg, crate_name, changed: true })

return;
}

let green = !fields.changed;
The reversal happened here.
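The relationship between the attribute, the `changed` flag, and `green` that this thread untangles can be written down as a small sketch. `RdrFields`, `expected_green`, and `check` are hypothetical mirrors made up for illustration; they are not the actual attribute parser or checking code.

```rust
/// Hypothetical mirror of the test attribute payload discussed above.
struct RdrFields {
    /// `true` for `rustc_public_hash_changed`,
    /// `false` for `rustc_public_hash_unchanged`.
    changed: bool,
}

/// "Green" means the public hash is expected to be unchanged.
fn expected_green(fields: &RdrFields) -> bool {
    !fields.changed
}

/// Compare the expectation with what was observed; an `Err` models a test failure.
fn check(fields: &RdrFields, actually_green: bool) -> Result<(), String> {
    if expected_green(fields) == actually_green {
        Ok(())
    } else {
        Err(format!(
            "expected public hash to be {}, but it was {}",
            if fields.changed { "changed" } else { "unchanged" },
            if actually_green { "unchanged" } else { "changed" },
        ))
    }
}
```

Under this reading, `let green = !fields.changed;` is only correct once the parser for `rustc_public_hash_changed` sets `changed: true`, which matches the fixed hunk above. A failing revision, like the bfail one mentioned in the thread, is what shows that the `Err` branch can actually fire.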
…tributes for testing
…stc_public_hash_unchanged attributes
…when changing some rmeta encoder functions
ab0ab17 to 981dee4
|
Did an analysis on the usage of |
…nkov Clean up `TyCtxt::needs_crate_hash` usage and rename it to `needs_hir_hash`. While reviewing `crate_hash` query usage for rust-lang#155871, the `needs_crate_hash` function turned out to be the cause of unnecessary calls to the query. The `needs_crate_hash` name is easy to mistake for the functionality of "needs the crate_hash query". This PR removes the usage of `needs_crate_hash` where it was not appropriate and renames the function to `needs_hir_hash` which better reflects its functionality.
|
After diving into how the crate hash is used during dependency loading, my conclusion is, with the current resolver, it is not possible to fully skip rustc invocations for libraries where only private parts of upstream dependencies changed. The resolver uses the crate hash of the dependencies saved inside the metadata to load transitive dependencies. Leaving something out of this hash is unsound. I recommend reading the module level documentation of locator, but here is a short example showing the problem:

The direct dependencies are located by name, but the transitive dependencies are located by name + crate hash of the transitive dependencies loaded from the metadata of dep1/dep2. This allows using different versions of crates by different dependencies. Here the hash is used to uniquely identify the crate, which needs private parts.

The solution(s)

Public api hash includes the stable crate id, it should be fine (for now)

so it should be fine while this is not stabilized, the amount of miscompilations should be zero. Ofc, this is only true with battle tested build systems like cargo, which use

Save paths + public hash instead of full hashes

We save the path of where to look for a dependency and use that with the public hash to look up dependencies. This could improve resolver speed, while sacrificing some portability of rmetas (not sure of the real world use case where this would cause a problem.) This would also allow users to replace a dependency with a different one that has the same public api hash, then just relink. Which kind of clicks with the spirit of relink-dont-rebuild.

Add some kind of fast path that only updates the crate and dependency hashes when public hash did not change

The upside to this is that we don't have to touch cargo to make the RDR feature good (would still be an improvement, but a lot less important.) All other tries for this feature included changes to cargo, to skip rustc invocations. This would need to be handled in all alternate build systems as well. If we add this fast path, there is no need to do that. Downside is that I have no idea how hard this is, but seems complicated.

I'm leaning towards save paths + public hash instead of full hashes. Any input is appreciated here.
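The two lookup strategies compared above (name + crate hash, versus saved path + public hash) can be contrasted in a small sketch. This is a toy model under assumptions: `Candidate`, `locate_by_crate_hash`, and `locate_by_path` are hypothetical names and do not reflect how the real locator is structured.

```rust
use std::collections::HashMap;

/// Hypothetical description of a crate candidate found on disk.
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    path: &'static str,
    crate_hash: u64,
    public_api_hash: u64,
}

/// Transitive lookup as described above: name plus full crate hash.
fn locate_by_crate_hash<'a>(
    candidates: &'a HashMap<&'static str, Vec<Candidate>>,
    name: &str,
    crate_hash: u64,
) -> Option<&'a Candidate> {
    candidates.get(name)?.iter().find(|c| c.crate_hash == crate_hash)
}

/// The "save paths + public hash" alternative: recorded path plus public API hash.
fn locate_by_path<'a>(
    candidates: &'a HashMap<&'static str, Vec<Candidate>>,
    name: &str,
    path: &str,
    public_api_hash: u64,
) -> Option<&'a Candidate> {
    candidates
        .get(name)?
        .iter()
        .find(|c| c.path == path && c.public_api_hash == public_api_hash)
}
```

In this model, rebuilding a transitive dependency after a private-only change gives it a new `crate_hash`, so a downstream crate that recorded the old hash no longer finds it with `locate_by_crate_hash` and has to be recompiled. `locate_by_path` still succeeds as long as the path and the public API hash are unchanged, which is the behavior the second proposal is after.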
This PR implements a sound but not too useful implementation of relink don't rebuild with the unstable -Z public-api-hash flag. It currently uses the stable hash of all items in the metadata.
The goal is to give a base implementation that can be used for experimentation. It can be incrementally improved over time by removing or stable sorting items. The PR also adds new test attributes rustc_public_hash_changed and rustc_public_hash_unchanged and an example test using them.
What are non-goals for this PR: a useful, optimized implementation of RDR and public api hashing. That has many more challenges which will each require careful review.
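As a rough illustration of the "stable sorting items" improvement mentioned in the description, the sketch below shows how sorting before hashing removes a dependence on item order. It is an assumption-laden toy: items are plain strings, `hash_in_order` and `hash_sorted` are hypothetical helpers, and `DefaultHasher` stands in for rustc's stable hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash items in the order given; reordering the items changes the result.
fn hash_in_order(items: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    items.hash(&mut hasher);
    hasher.finish()
}

/// Sort a copy first so that the hash does not depend on item order.
fn hash_sorted(items: &[&str]) -> u64 {
    let mut sorted = items.to_vec();
    sorted.sort_unstable();
    hash_in_order(&sorted)
}
```

A baseline that hashes everything in encounter order stays sound but changes more often than necessary, for example when two items swap places in the source. Sorting is one of the incremental refinements that can reduce such spurious changes without dropping anything from the hash.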
A non-exhaustive list of the challenges for the feature I ran into while trying to make the hash useful (this should probably go in a tracking issue, but I'm not aware of one):
my_crate::func prints private function as the error
cargo check), could use a much simpler hash than codegen, it does not need private types from mir (or any mir at all?)