Skip to content

Add deleted_crates database table #9912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 18, 2024
Merged

Conversation

Turbo87
Copy link
Member

@Turbo87 Turbo87 commented Nov 12, 2024

This table will be used to record when a crate was deleted, (optionally) why it was deleted, who deleted it, and when it will be available for re-creation again. I've kept the database dump visibility as private for now. We can decide later whether to make it public or not, and what fields to include.

The delete-crate admin tool is adjusted to fill the new table, optionally recording --deleted-by and --message parameters. Note that the admin tool defaults to available_at = deleted_at for now, to match the current behavior. This can be adjusted in a dedicated pull request if we decide to do so.

The publish endpoint is adjusted to check the new table for availability of the crate name before accepting the publish.

This PR should address #9904 (comment) by making it possible for the crate deletion endpoint to set available_at = deleted_at + 24 hours or whatever other value we choose (including 0 hours). It also addresses a couple of requests/comments over the years to track deletions in a more structured way than adding a comment to a Zulip thread.

@Turbo87 Turbo87 added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works A-backend ⚙️ labels Nov 12, 2024
@Turbo87 Turbo87 requested a review from a team November 12, 2024 14:57
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 58.49057% with 44 lines in your changes missing coverage. Please review.

Project coverage is 89.06%. Comparing base (d092b69) to head (31f08b3).
Report is 69 commits behind head on main.

Files with missing lines Patch % Lines
src/bin/crates-admin/delete_crate.rs 0.00% 33 Missing ⚠️
src/models/user.rs 0.00% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9912      +/-   ##
==========================================
- Coverage   89.16%   89.06%   -0.10%     
==========================================
  Files         292      294       +2     
  Lines       30316    30419     +103     
==========================================
+ Hits        27031    27094      +63     
- Misses       3285     3325      +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Turbo87 Turbo87 force-pushed the deleted-crates-table branch from a11076b to eb45233 Compare November 12, 2024 15:10
@LawnGnome
Copy link
Contributor

@walterhpearce and I had a discussion this morning, and he's going to write it up in #9904, but we're probably going to need to track the versions in deleted crates as well. So let's hold this for now pending that write up, please.

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 13, 2024

I don't think #9904 is the right place for this though. We already perform deletions through the admin tool and that PR is just enabling users to do it by themselves.

he's going to write it up

when? the RFC has been merged 2.5 months ago. would've been good to get input back then when it was still open...

@walterhpearce
Copy link

The main concern I discussed with @LawnGnome is around immediate takeover of a crate for malicious purposes.

The threat scenario is the event of a author deleting a crate, and an attacker publishing the same name and version, or the same name and a semver-compatible version.

In both of these cases, what mechanisms exist to notify the user that this is occuring? Does Cargo.lock checksums prevent the immediate update scenario? What about updates occuring for a semver-compat update?

I think it is an open question on where the responsibility for this lies - the open question was whether crates should enforce such a version-prevention mechanisms (force a major version bump), or should a flag be present for cargo to notify the user an author change occured?

My gut says option #2, the user notification, is the ideal scenario here.

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 13, 2024

Does Cargo.lock checksums prevent the immediate update scenario?

yes, unless there is a bug in cargo that skips the checksum checks that should be the case

What about updates occuring for a semver-compat update?

the crate deletion RFC gives 4 different conditions that need to be met for users to be able to self-delete crates:

  • the crate is less than 3 days old (makes it unlikely for a large number of users to already rely on it, and, if so, makes it unlikely for the maintainer to not realize the impact and still delete it)
  • or the crate has less than X downloads, only one maintainer, and no (public) reverse dependencies

the combination of these IMHO makes this a somewhat unlikely scenario.

the open question was whether crates should enforce such a version-prevention mechanisms (force a major version bump)

the same could be said about crate transfers, which we already allow. but for the common use case of "I named my crate actix_web, but realized it should have been actix-web" it would be a bit unfortunate for the author to not be able to republish their existing versions. I guess it could be a limit only if a different author uses the name, but I'm not sure it is really worth the complexity.

should a flag be present for cargo to notify the user an author change occured?

I'm open to something like that, but that needs design work and a) I don't know enough about cargo to be able to do that and b) I don't think we should block the deletion tracking on an implementation on the cargo side, since that would likely delay the RFC implementation by months.

@Turbo87 Turbo87 requested a review from LawnGnome November 15, 2024 12:33
@eth3lbert
Copy link
Contributor

eth3lbert commented Nov 15, 2024

I noticed that the deleted_by field could be set to any value. Could this be an issue if someone were to fill it with an unrelated user?

Should we consider the delete action as a VersionOwnerAction?

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 15, 2024

I noticed that the deleted_by field could be set to any value. Could this be an issue if someone were to fill it with an unrelated user?

considering that currently only people with Heroku admin access can use the tool and that those have database access too, I think it's safe enough 😉

Should we consider the delete action as a VersionOwnerAction?

VersionOwnerAction is something that happens per version, but after a delete operation there is no crate anymore, which also deletes the versions, so there is nothing that could be filled in the version_id column of that table.

@Turbo87 Turbo87 force-pushed the deleted-crates-table branch from eb45233 to 31f08b3 Compare November 15, 2024 16:08
@LawnGnome
Copy link
Contributor

(This is kind of a combo comment for this and #9904, but since the most recent conversation is here, I'm going to put it here.)

I think we're agreed that the answer to the open RFC question around deleted crate name reuse is "allow reuse, with some minimal restrictions", which this PR implements. I'm still grappling with the question around versions, though.

Given that, I set up a couple of test cases to see what would happen, assuming normal use of lockfiles by a user.1 Basic set up:

  1. Publish2 a crate (say, totally-safe-crate) at 0.1.0.
  2. Create a crate (say test-case) that depends on totally-safe-crate 0.1.0.3
  3. Delete totally-safe-crate using crates-admin delete-crate totally-safe-crate.
  4. Publish a new totally-safe-crate with a different, but semver compatible version (0.1.1, in this case).
  5. Remove the target directory and try to build test-case again.

The good news: without any changes, a lockfile does the expected:

$ cargo run
    Updating `local` index
error: failed to select a version for the requirement `totally-safe-crate = "^0.1.0"` (locked to 0.1.0)
candidate versions found which didn't match: 0.1.1
location searched: `local` index
required by package `test-case v0.1.0 (/home/adam/trees/rust-foundation/scratch/deleted-crate-test/test-case)`

The less good news: a simple cargo update works without any warnings:


$ cargo update
    Updating `local` index
     Locking 1 package to latest compatible version
    Updating totally-safe-crate v0.1.0 (registry `local`) -> v0.1.1
$ cargo run
   Compiling totally-safe-crate v0.1.1 (registry `local`)
   Compiling test-case v0.1.0 (/home/adam/trees/rust-foundation/scratch/deleted-crate-test/test-case)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.26s
     Running `target/debug/test-case`
l33t h4x0r stuffs
For completeness, publishing a crate with the same version

Publishing a crate with the exact same version and trying to rebuild test-case fails with a checksum error, as you'd expect:

$ cargo run
error: checksum for `totally-safe-crate v0.1.0 (registry `local`)` changed between lock files

this could be indicative of a few possible errors:

    * the lock file is corrupt
    * a replacement source in use (e.g., a mirror) returned a different checksum
    * the source itself may be corrupt in one way or another

unable to verify that `totally-safe-crate v0.1.0 (registry `local`)` is the same as when the lockfile was generated

To be clear: I don't think cargo is doing anything wrong here. (Yes, it might be nice if some sort of change-of-crate-ownership/identity detection was added to cargo — and doing that might allow for a version restriction to eventually be relaxed in the far future — but exactly what that would look like would need design work.)

But I think the cargo update case is the kind of thing that users are going to do innocently enough that I think we should also adapt the NPM rule of not allowing reused crate versions by prohibiting versions that are semver-compatible once we open deletions up to regular users.

So, in summary: I'm happy to merge this as-is, (pending a proper review momentarily, but I had a quick look and I think it's fine at first glance), but I think we also need a deleted_versions table and an appropriate check in the publish endpoint before we merge #9904. (I'm happy to take that on if you're over this, @Turbo87.)4

Footnotes

  1. Put another way, I didn't touch test-case/Cargo.lock outside of running cargo run and cargo update as described.

  2. I did this with a local crates.io instance, for completeness.

  3. The details of this crate shouldn't really matter, but basically I made it a lib crate with one function that returns a string, and then the replacement crate returns a different string; test-case just prints the string.

  4. And, yes, this is the sort of thing I should have thought more about during the RFC period. I'm sorry; that's on me.

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 18, 2024

I set up a couple of test cases to see what would happen

yeah, that matches what I expected

I think we also need a deleted_versions table and an appropriate check in the publish endpoint

given the constraints/requirements that we have set for being able to delete a crate I'm not sure whether we actually need to require this. (see #9912 (comment))

I'm happy to take that on

if we were to implement this we might want to have a quick chat about the design and what exactly we are trying to solve here. if this is about "don't let the new user publish something semver compatible" then something like a min_version column on the deleted_crates table might be sufficient. if this is about "don't let the user publish a version that already existed in the past" then a deleted_versions table might indeed be the solution, though then we might not be able to use a crate_id column and would have to use crate_name instead.

I'm happy to merge this as-is

thanks for the review :)

@Turbo87 Turbo87 merged commit 9c7b5e7 into rust-lang:main Nov 18, 2024
10 checks passed
@Turbo87 Turbo87 deleted the deleted-crates-table branch November 18, 2024 08:39
@LawnGnome
Copy link
Contributor

given the constraints/requirements that we have set for being able to delete a crate I'm not sure whether we actually need to require this. (see #9912 (comment))

That's true — the reverse dependency and download checks do (greatly) reduce the risk here. While they're good heuristics to avoid general ecosystem disruption, I can imagine corner cases where this might still be a problem, though (large company uses a crate that literally nobody else uses, it gets deleted, someone is sloppy on the internal code review when pulling a new version in).

if we were to implement this we might want to have a quick chat about the design and what exactly we are trying to solve here. if this is about "don't let the new user publish something semver compatible" then something like a min_version column on the deleted_crates table might be sufficient. if this is about "don't let the user publish a version that already existed in the past" then a deleted_versions table might indeed be the solution, though then we might not be able to use a crate_id column and would have to use crate_name instead.

I'm more concerned about the semver-compatible case, since an actual version match should trigger the checksum failure noted above.

A min_version would be sufficient, but the reason I'm wondering if we actually want to track all versions is the case where:

  1. Original author publishes versions 0.1.0, 0.2.0, and 1.0.0 of a crate.
  2. Crate is deleted.
  3. New author wants to do some pre-release stuff, which would imply 0.3.0 with a full semver check, but would otherwise limit them to 2.0.0 with a basic min_version set up.

I'm open to feedback on whether that's enough of a problem in practice to be worth the extra complexity.

@Turbo87
Copy link
Member Author

Turbo87 commented Nov 18, 2024

large company uses a crate that literally nobody else uses, it gets deleted, someone is sloppy on the internal code review when pulling a new version in

if large company uses the crate then shouldn't we see this reflected in the download numbers?

answering my own question: if they use an internal crates.io proxy, we might not see the full download numbers. so yeah, it's a potential edge case. but how likely that is... 🤷‍♂️

3. New author wants to do some pre-release stuff, which would imply 0.3.0 with a full semver check

if they want to do pre-release stuff then 2.0.0-alpha.1 etc. would also be available (if we implement the semver check correctly...)

I'm open to feedback on whether that's enough of a problem in practice to be worth the extra complexity.

I'd invoke my regular "if in doubt, ask/check what the other registries are doing" 😅

@LawnGnome
Copy link
Contributor

if they want to do pre-release stuff then 2.0.0-alpha.1 etc. would also be available (if we implement the semver check correctly...)

I guess my concern is that the hypothetical new crate author might want to be able to do a 0.x version to clearly communicate that they're still developing their crate, but realistically, let's wait until that's a real concern and go from there. We can transition from "min_version only" to "allow all semver-incompatible versions" later based on feedback anyway, since the latter is a strict superset of the former.

I'll put a PR together for the min_version behaviour and we can discuss further in there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants