Databased Backed Blacklists #2396
<p>
Blacklisting {{ blacklist.project }} will irreversibly delete
the {{ existing.project.name }} project along with
{{ existing.releases|length() }} releases and
worth linking to the releases view?
{% if existing.project %}
<p>
Blacklisting {{ blacklist.project }} will irreversibly delete
the {{ existing.project.name }} project along with
worth linking to the project detail view?
below:

<ul>
{% for user in existing.users %}
i don't think it'd be trivial but exposing the role_name here would be nice.
warehouse/admin/views/blacklist.py
Outdated
)
return HTTPMovedPermanently(request.current_route_path())
elif canonicalize_name(confirm) != canonicalize_name(project_name):
request.session.flash(
request.db.add(
JournalEntry(
name=project.name,
action="remove",
should the action here be "blacklisted"?
I'm going to leave this alone, because other things might be reading our journal entries and inferring things from them, and we want them to still go ahead and remove these projects if they are. It's not really super important to keep in the log that a project was blacklisted either, since it's a binary state and admins can see when it was and who did it (as well as any relevant comments).
warehouse/admin/views/blacklist.py
Outdated
queue="success",
)

return HTTPMovedPermanently(request.route_path("admin.blacklist.list"))
maybe just a redirect?
No blockers. I would like for us to work on a method of bulk blacklisting at some point.
In recent reports, there have been dozens of affected package names, which would be somewhat cumbersome to blacklist one by one.
Feedback handled. I'm a little nervous about the idea of bulk blacklisting, since blacklisting means deleting packages and is generally irreversible. Perhaps once we get some more experience with the single entrypoint we can explore that?
As far as a bulk feature goes, that may be more appropriate to tackle as a "bulk quarantine" (i.e. shuffle all the affected artifacts off to a different S3 bucket, without deleting them outright) rather than as bulk automation of the existing removal behaviour (quarantine also has the benefit of preserving the artifacts for security analysis, similar to the way anti-virus quarantining works).
We don't actually delete the files from S3, even with bulk delete. The cost of storing the files is minuscule; it's serving the files that costs the bulk of our S3 "bill". This also makes it easier for us to deal with transactionally deleting these projects, since it's all contained inside the database. So deleting here is really just deleting the data from the database (although maybe at some point we'll have some process to garbage collect data from S3 that is no longer referenced in the DB).
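To make the garbage-collection idea concrete: no such process exists in this PR, so the following is only a sketch of the core comparison such a job might perform. The function name and the example keys are hypothetical, and a real job would have to page through the bucket listing and the database rather than hold both in memory.

```python
def find_orphaned_keys(stored_keys, referenced_keys):
    """Return object keys present in storage that no database row references.

    Both arguments are plain iterables of key strings; how they are
    gathered (bucket listing, database query) is left out of this sketch.
    """
    return sorted(set(stored_keys) - set(referenced_keys))
```

Anything returned here would only be a candidate for collection. Since the retained files are also what would allow deleted rows to be restored, such a job would presumably want a grace period before removing anything.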
OK, so even today the database entries for "deleted" releases could potentially be backfilled from the artifacts retained in S3. I think that would already mitigate a lot of the risk associated with removal errors, even if the scripts for state reconstruction don't exist yet.
func.normalize_pep426_name(form.name.data))).scalar():
raise _exc_with_message(
HTTPBadRequest,
"The name {!r} is not allowed.".format(form.name.data),
Hi @dstufft - glad you started something!
Would suggest the message to be a little more clear?
For concerns of security (namesquatting and typosquatting), '{!r}' is not an allowed name for a project. Please use a different name.
I don't want to get into specifics like that because there are a lot of reasons a name might be prohibited, not just for security purposes. If we want to provide details about why a name is blocked, we'll want to either expose the comment field or (more likely) provide categories to the blacklist.
I added that question to #2401 (comment) (as I think it's related to the general UX of how we communicate invalid name declarations to end users)
This removes the blacklist that was hardcoded, and instead replaces it with one that is stored in the database. It also provides an admin UI to view and add new blacklists.
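As a rough illustration of the kind of lookup this moves into the database, here is a minimal sketch that uses the standard-library `sqlite3` module instead of Warehouse's actual SQLAlchemy models. The `blacklist` table, its columns, and the helper names are assumptions made for the example, and the PEP 503-style `normalize` function stands in for the database-side normalization used in the real code.

```python
import re
import sqlite3


def normalize(name):
    # PEP 503-style normalization: runs of "-", "_" and "." become a single "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def make_db():
    # Hypothetical schema for the example, not Warehouse's real one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blacklist ("
        "name TEXT PRIMARY KEY, comment TEXT NOT NULL DEFAULT '')"
    )
    return conn


def add_blacklist(conn, name, comment=""):
    # Store the normalized form so every spelling variant maps to one row.
    conn.execute(
        "INSERT INTO blacklist (name, comment) VALUES (?, ?)",
        (normalize(name), comment),
    )


def is_blacklisted(conn, name):
    # Compare normalized forms so "Foo.Bar" and "foo_bar" hit the same entry.
    row = conn.execute(
        "SELECT 1 FROM blacklist WHERE name = ?", (normalize(name),)
    ).fetchone()
    return row is not None
```

Storing the normalized form means a single entry covers every spelling variant of a name, which is the behaviour a registration-time check needs.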
This is somewhat of a dangerous workflow, because blacklisting a name also involves removing all files, releases, etc. that exist for that name (essentially the same thing as hitting remove package on legacy PyPI). Due to the danger, adding a blacklist requires confirmation.
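The confirmation step can be sketched roughly as follows, mirroring the comparison of normalized names in the view. This is a simplified stand-in rather than the view code itself: `confirmation_matches` is a hypothetical helper, and `normalize` is a PEP 503-style substitute for `canonicalize_name`.

```python
import re


def normalize(name):
    # PEP 503-style normalization, used here in place of canonicalize_name.
    return re.sub(r"[-_.]+", "-", name).lower()


def confirmation_matches(project_name, confirm):
    """Return True only if the admin re-typed the name being blacklisted."""
    if not confirm:
        # A missing or empty confirmation never counts as a match.
        return False
    return normalize(confirm) == normalize(project_name)
```

Comparing normalized forms means the admin does not have to reproduce the exact capitalization or separators, while an empty or mismatched confirmation still blocks the destructive action.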
This still requires a few things:
Throwing this up here so people can take a look at it before I finish it.