Description
Problem
It's not clear if this is a bug, a feature request, a bit of both, or not really either, but I felt like it'd be good to have an issue to discuss this in.
Cargo currently has a built-in assumption that the checksum of a given crate version will never change. This is, on the whole a good and helpful assumption. However, the fact that it is assumed and not checked leads to some peculiar behavior and workarounds if that assumption is ever violated (I'll give some reasons for why that may happen in the "Notes" section). I think it'd be valuable for Cargo to yell and scream if it detects a checksum mismatch, but then if explicitly told to adapt to the changed checksum, for Cargo to "do the right thing". Essentially this would be adding some "robustness in depth". That is not currently the case, as this sequence of operations highlights (I left it as a script so it's easier to replicate):
#!/bin/bash
# Fail if anything unexpected fails
set -euo pipefail
rm -rf cargo-src-replace
mkdir cargo-src-replace
cd cargo-src-replace
# Don't pollute our actual CARGO_HOME
export CARGO_HOME=$PWD/home/.cargo
# Create our own registry.
mkdir registry
git -C registry init
cat > registry/config.json <<EOF
{"dl":"file://$PWD/dl","api":"file://$PWD/api"}
EOF
git -C registry add .
git -C registry commit -m "Add config.json"
# Use that registry over crates.io
mkdir -p home/.cargo
cat > home/.cargo/config <<EOF
[source.crates-io]
replace-with = 'dummy-registry'
[source.dummy-registry]
registry = 'file://$PWD/registry'
EOF
# "Publish" bar-0.1.0
cargo new --lib bar
pushd bar
echo 'pub const VALUE: &str = "hello";' > src/lib.rs
cargo package --allow-dirty
popd
mkdir -p dl/bar/0.1.0
crate=bar/target/package/bar-0.1.0.crate
sha=$(sha256sum "$crate" | awk '{print $1}')
mv $crate dl/bar/0.1.0/download
mkdir -p registry/3/b/
cat > registry/3/b/bar <<EOF
{"cksum":"$sha","deps":[],"features":{},"links":null,"name":"bar","vers":"0.1.0","yanked":false}
EOF
git -C registry add .
git -C registry commit -m "Add bar for the first time"
# Make a crate "foo" that depends on bar-0.1.0
cargo new foo
pushd foo
echo 'bar = "0.1.0"' >> Cargo.toml
echo 'fn main() { println!("{}", bar::VALUE); }' > src/main.rs
# Sanity check that it runs as expected
cargo run | grep hello
popd
# Now make the registry hold a _different_ artifact for bar-0.1.0
pushd bar
# Make VALUE different so we can tell which version foo is actually using
echo 'pub const VALUE: &str = "world";' > src/lib.rs
cargo package --allow-dirty
popd
crate=bar/target/package/bar-0.1.0.crate
sha1="$sha"
sha=$(sha256sum "$crate" | awk '{print $1}')
mv $crate dl/bar/0.1.0/download
cat > registry/3/b/bar <<EOF
{"cksum":"$sha","deps":[],"features":{},"links":null,"name":"bar","vers":"0.1.0","yanked":false}
EOF
git -C registry add .
git -C registry commit -m "Add bar for the second time"
# Back to foo to see how it handles this registry change
pushd foo
# Running foo is still fine -- it hasn't pulled the registry update
cargo run | grep hello
# But a cargo update will reveal that something is off
if cargo update --dry-run; then
# NOTE: This construct (and the others like it) is so the script checks expected failures
echo "expected cargo update to fail with lock checksum mismatch"
exit 1
fi
# And now a run will fail too with the same checksum error
if cargo build; then
echo "expected foo build to fail with lock checksum mismatch"
exit 1
fi
# Time to venture into the land of dragons.
# What if we `cargo update -p bar` to tell Cargo to go ahead and update bar anyway?
# That (currently) doesn't work -- Cargo will not let you get past this without blowing away your Cargo.lock.
if cargo update -p bar; then
echo "cargo update -p bar now lets you override a checksum mismatch!"
exit 1
fi
# Okay, but what if we _really_ want this to work and manually fix Cargo.lock?
sed -i.bkp "s/$sha1/$sha/" Cargo.lock
# That "works" (in that Cargo builds again), but it's still using the old bar.
# This is... surprising, because the lockfile now indicates the new hash, but
# the _old_ source is clearly still being used!
if ! cargo run | grep hello; then
echo "cargo run now picks up new source after checksum override!"
exit 1
fi
# Let's try harder to make Cargo do the right thing.
# Clean the build artifacts! That'll do it.
cargo clean -p bar
# Except, no, because whil this will cause Cargo to recompile, it'll recompile
# based on the _cached_ source in $CARGO_HOME/src/, which is stale.
if ! cargo run | grep hello; then
echo "cargo run now picks up new source after checksum override + clean!"
exit 1
fi
# What if we try _even_ harder and clean the source cache too?
cargo clean -p bar
rm -r ../home/.cargo/registry/src
# Sadly, no, that's not enough either, since Cargo _also_ caches the .crate!
if ! cargo run | grep hello; then
echo "cargo run now picks up new source after checksum override + clean + rm src/!"
exit 1
fi
# Okay, so let's delete that too
# (https://github.com/rust-lang/cargo/pull/10070 would make this nicer)
cargo clean -p bar
rm -r ../home/.cargo/registry/src
rm -r ../home/.cargo/registry/cache
# Yay, finally!
cargo run | grep world
echo "SUCCESS?!"
Proposed Solution
I think the way to solve this is for Cargo to use the checksum along with the version anywhere the version is currently used to uniquely identify a crate version. That is, at least:
- As an input into fingerprints.
- In the filenames for
.crate
files in$CARGO_HOME/cache/
- In the filenames for extracted source directories in
$CARGO_HOME/src/
There may also be other places where the assumption that version implies checksum that I don't know of and that'll also have to be updated. If it's only the three above, this may be a relatively simple fix (🤞).
With these changes, if the lockfile is fixed, Cargo should correctly re-fetch and re-compile anything that needs to be, and thus produce the expected result without further fiddling.
It may also be helpful to allow cargo update -p
(maybe only with --precise
?) to fix a checksum mismatch rather than requiring an impacted user to blow away or manually edit their Cargo.lock
.
Notes
Okay, so, why would this come up in the first place? Aside from the corrupted lockfile case and situations like #7180 (comment), my use-case for this is a somewhat convoluted build setup in which a larger build system invokes Cargo. This larger build system may re-build a crate without human intervention if, say, a non-Cargo dependency changes, but such a re-build could cause that crate's checksum to change if it, say, runs bindgen
in a build script based on a C header file vended by those non-Cargo dependencies. The larger build system attempts to auto-generate Cargo versions so that such re-builds end up with separate version numbers to avoid exactly this problem, but it has no reliable and scalable way to ensure that the generated version number is distinct from all other generated version numbers. It could, in theory, pick random version numbers (in the patch position), or extract a u64 from the checksum, but that would lose out on the attractive "higher is newer" property of version numbers, and may lead to picking an older version of a dependency, which is obviously not good.
Now, this is not to say that my use-case is common, a supported/correct use of Cargo, or even at all sane, but I do think that it highlights a way in which Cargo can be made more robust to more use-cases than it currently is.