-
-
Notifications
You must be signed in to change notification settings - Fork 14.4k
checksum-freshness: Fix invalid checksum calculation for binary files #151137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
checksum-freshness: Fix invalid checksum calculation for binary files #151137
Conversation
This comment has been minimized.
This comment has been minimized.
abd65b1 to
1d5c70e
Compare
|
@rustbot reroll |
| // Build a list of files used to compile the output and | ||
| // write Makefile-compatible dependency rules | ||
| let mut files: Vec<(String, u64, Option<SourceFileHash>)> = sess | ||
| let mut files: IndexMap<String, (u64, Option<SourceFileHash>)> = sess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this IndexMap<String, Option<(u64, Option<SourceFileHash>)>> where None indicates that the length and hash haven't been filled in yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that get us anything? This PR makes it so that files with binary contents are inserted into both source_map.files() (with null hash unnormalized_source_len equal to 0) and file_depinfo; with the use of IndexMap, we can override those coming from source_map.files() with hashes coming from file_depinfo. Thus, the value in that IndexMap would never be None.
Now is that the cleanest way? I'm sure not.. but I honestly don't understand implications of removing these "null" binary entries from the sourcemap well enough to make that cut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes some amount of sense to keep them like this, as null entries? Especially if the has can never really be None as Piotr says. @bjorn3 does Piotr's comment convince you? Or d'ya have a good reason why making it an Option makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are first inserting into files with a dummy value of hash 0 + len 0 and later overwriting it with the real value, right? My suggestion is to first insert None and later overwrite it with the real value. This way you can unwrap the entry at use and be guaranteed that you didn't accidentally get a dummy value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. What's tricky about it is that we have no way of knowing whether the items in files are "bugged" or not.
That is to say, in the context of that function, we have no way to tell if an entry with hash 0 + len 0 originated in a include_bytes!("my_1024mb_file.bin"); call (bugged one, that this PR tries to address) or include_str!("empty_file.txt") (legit one). Both of them have a hash 0 + len 0, and it is the correct value for the include_str one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change line 639 to return None instead of (0, None), right? And adapt the return type of hash_iter_files accordingly. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that branch (on line 639) is not relevant at all and that's not where we were getting our fake-nulls ((hash 0, len 0)) from.
Before this PR, the file path of a binary file (with non-UTF8 contents) was injected into depfile through source_map().files(). For non-utf8 files the values were not correct, since SourceMap only accepts Strings as the contents of a source file. See the following snippet, as that's where we "fake out" contents of a binary file when inserting it into a SourceMap:
rust/compiler/rustc_span/src/source_map.rs
Line 279 in 2850ca8
| let text = std::str::from_utf8(&bytes).unwrap_or("").to_string(); |
Crucially, we did not/do not call
hash_iter_files on these, as we assume that SourceMap calculated the length and a hash correctly.. Which it did not, in some cases (non-utf8 binary files). This then led to binary files having a hash of empty string and a length of 0.
With this PR, that's still the case - binary files are still present in source_map().files() result, but they're also injected into file_depinfo and that lets us call hash_iter_files on them and get the correct hash/file length. The change to files type let's us override the "faulty" value with a good one. However, as outlined in https://github.com/rust-lang/rust/pull/151137/files#r2726164490 , there's no straightforward way (IMHO) to use an option here, because if we knew which entries from the SourceMap were "faulty", we could just call hash_iter_files on them directly without jumping through the hoops.. And line 639 is not really at fault for what we're struggling with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In that case I'm fine with leaving the code as is.
|
@rustbot reroll |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, if @bjorn3 's concern is lifted, r=me
| // Build a list of files used to compile the output and | ||
| // write Makefile-compatible dependency rules | ||
| let mut files: Vec<(String, u64, Option<SourceFileHash>)> = sess | ||
| let mut files: IndexMap<String, (u64, Option<SourceFileHash>)> = sess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes some amount of sense to keep them like this, as null entries? Especially if the has can never really be None as Piotr says. @bjorn3 does Piotr's comment convince you? Or d'ya have a good reason why making it an Option makes sense
1d5c70e to
0df94dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed the outstanding review comments and am happy with the code as is; unless anybody objects, I think this PR is good to go (from my side).
|
@bors r+ |
…-binary-files, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: rust-lang#151090
…uwer Rollup of 4 pull requests Successful merges: - #151137 (checksum-freshness: Fix invalid checksum calculation for binary files) - #148187 (Remove uses of `&mut CmResolver`) - #151626 (Remove `Deref<Target = TyCtxt>` from `QueryCtxt`) - #151661 (Suggest changing `iter`/`into_iter` when the other was meant)
This comment has been minimized.
This comment has been minimized.
…es, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: #151090
|
💔 Test for 14ee0d1 failed: CI. Failed job:
|
|
@bors r- |
|
Commit 0df94dd has been unapproved. |
|
@bors r=jdonszelmann |
|
A job failed! Check out the build log: (web) (plain enhanced) (plain) Click to see the possible cause of the failure (guessed by this bot) |
…-binary-files, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: rust-lang#151090
…-binary-files, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: rust-lang#151090
…-binary-files, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: rust-lang#151090
Rollup of 6 pull requests Successful merges: - #148718 (Do not mention `-Zmacro-backtrace` for std macros that are a wrapper around a compiler intrinsic) - #151137 (checksum-freshness: Fix invalid checksum calculation for binary files) - #150863 (Adds two new Tier 3 targets - `aarch64v8r-unknown-none{,-softfloat}`) - #151040 (Don't expose redundant information in `rustc_public`'s `LayoutShape`) - #151699 (Update books) - #151700 (os allow missing_docs)
…-binary-files, r=jdonszelmann checksum-freshness: Fix invalid checksum calculation for binary files Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings. Tracking issue: rust-lang/cargo#14136 Closes: rust-lang#151090
Rollup of 12 pull requests Successful merges: - #147996 (Stabilize ppc inline assembly) - #148718 (Do not mention `-Zmacro-backtrace` for std macros that are a wrapper around a compiler intrinsic) - #151137 (checksum-freshness: Fix invalid checksum calculation for binary files) - #151680 (Update backtrace and windows-bindgen) - #150863 (Adds two new Tier 3 targets - `aarch64v8r-unknown-none{,-softfloat}`) - #151040 (Don't expose redundant information in `rustc_public`'s `LayoutShape`) - #151383 (remove `#[deprecated]` from unstable & internal `SipHasher13` and `24` types) - #151529 (lint: Use rustc_apfloat for `overflowing_literals`, add f16 and f128) - #151669 (rename uN::{gather,scatter}_bits to uN::{extract,deposit}_bits) - #151689 (Fix broken Xtensa installation link) - #151699 (Update books) - #151700 (os allow missing_docs)
Admittedly this is not the cleanest way to achieve this, but SourceMap is quite intertwined with source files being represented as Strings.
Tracking issue: rust-lang/cargo#14136
Closes: #151090