Skip to content

Databased Backed Blacklists #2396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Sep 17, 2017
Merged

Databased Backed Blacklists #2396

merged 16 commits into from
Sep 17, 2017

Conversation

dstufft
Copy link
Member

@dstufft dstufft commented Sep 16, 2017

This removes the blacklist that was hardcoded, and instead replaces it with one that is stored in the database. It also provides an admin UI to view and add new blacklists.

This is a somewhat of a dangerous workflow, because blacklisting a name also involves removing all files, releases, etc that exist for that (essentially the same thing as hitting remove package on legacy PyPI). Due to the danger, adding a blacklist requires confirmation.

This still requires a few things:

  • Add the ability to remove existing blacklists to allow users to claim the name again.
  • Make the confirmation page harder to blindly click though.
    • This might be requiring typing the name of the project that would be deleted into an input box?
    • This might require typing in a random string?
  • Write tests

Throwing this up here so people can take a look at it before I finish it.

@dstufft dstufft requested a review from ewdurbin September 16, 2017 06:40
@dstufft dstufft changed the title [WIP] Databased Backed Blacklists Databased Backed Blacklists Sep 16, 2017
<p>
Blacklisting {{ blacklist.project }} will irreversibly delete
the {{ existing.project.name }} project along with
{{ existing.releases|length() }} releases and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth linking to the releases view?

{% if existing.project %}
<p>
Blacklisting {{ blacklist.project }} will irreversibly delete
the {{ existing.project.name }} project along with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth linking to the project detail view?

below:

<ul>
{% for user in existing.users %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think it'd be trivial but exposing the role_name here would be nice.

)
return HTTPMovedPermanently(request.current_route_path())
elif canonicalize_name(confirm) != canonicalize_name(project_name):
request.session.flash(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the flash produced is pretty much identical in color to the header in admin, not a huge deal but threw me off when reviewing.
screen shot 2017-09-16 at 7 34 29 pm

request.db.add(
JournalEntry(
name=project.name,
action="remove",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the action here be blacklisted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this alone, because other things might be reading our journal entries and inferring things from them, and we want them to still go ahead and remove these projects if they are. It's not really super important to keep in the log that a project was blacklisted either, since it's a binary state and admins can see when it was and who did it (as well as any relevant comments).

queue="success",
)

return HTTPMovedPermanently(request.route_path("admin.blacklist.list"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just a redirect?

Copy link
Member

@ewdurbin ewdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers. I would like for us to work on a method of bulk blacklisting at some point.

In recent reports, there have been dozens of package names affected that would be somewhat cumbersome to blacklist one-by-one

@dstufft
Copy link
Member Author

dstufft commented Sep 17, 2017

Feedback handled. I'm a little nervous about the idea of a bulk blacklisting, since blacklisting means deleting packages and is generally irreversible, perhaps once we get some more experience with the single entrypoint we can explore that?

@dstufft dstufft merged commit 52b2471 into pypi:master Sep 17, 2017
@dstufft dstufft deleted the db-backed-blacklist branch September 17, 2017 00:25
@ncoghlan
Copy link
Contributor

As far as a bulk feature goes, that may be more appropriate to tackle as a "bulk quarantine" (i.e. shuffle all the affected artifacts off to a different S3 bucket, without deleting them outright) rather than as bulk automation of the existing removal behaviour (quarantine also has the benefit of preserving the artifacts for security analysis, similar to the way anti-virus quarantining works).

@dstufft
Copy link
Member Author

dstufft commented Sep 17, 2017

We don't actually delete the files from S3 even with bulk delete. The cost of storing the files is minuscule it's actually serving the files that actually costs the bulk of our S3 "bill". This also makes it easier for us to deal with transactionally deleting these projects since it's all contained inside of the database.

So deleting here is really just deleting the data from the database (although maybe at some point we'll have some process to garbage collect data from S3 that is no longer referenced in the DB).

@ncoghlan
Copy link
Contributor

OK, so even today the database entries for "deleted" releases could potentially be backfilled from the artifacts retained in S3. I think that would already mitigate a lot of the risk associated with removal errors, even if the scripts for state reconstruction don't exist yet.

func.normalize_pep426_name(form.name.data))).scalar():
raise _exc_with_message(
HTTPBadRequest,
"The name {!r} is not allowed.".format(form.name.data),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dstufft - glad you started something!

Would suggest the message to be a little more clear?

For concerns of security (namesquatting and typosquatting), '{!r}' is not an allowed name for a project. Please use a different name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to get into specifics like that because there are a lot of reasons a name might be prohibited, not just for security purposes. If we want to provide details about why a name is blocked, we'll want to either expose the comment field or (more likely) provide categories to the blacklist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that question to #2401 (comment) (as I think it's related to the general UX of how we communicate invalid name declarations to end users)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants