-
Notifications
You must be signed in to change notification settings - Fork 1k
Block stdlib names #2409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block stdlib names #2409
Conversation
Prohibits new projects with names which collide on canonicalized name with standard library modules from being created. Uses stdlib_list from https://github.com/jackmaney for programatic access to a hopefully up to date list of standard library names.
There are several packages in the stdlib that have backports on PyPI, or were originally on PyPI. For example:
What does this do with them? |
@alex, no impact on existing projects, this just stops any new projects with such a name from being created without admin intervention. existing projects can upload new releases without any intervention. |
In the discussion in #2151 the requirement to have an escape hatch came up, for example for packages that backport stdlib modules. How would those be handled? |
my opinion is that any new backports should be explicitly named as such. in the rare case that a package name included in the standard library module requires registration with the precise name, an administrative tool in this project should allow that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me, a few comments about some things that I think could make this better.
warehouse/forklift/legacy.py
Outdated
) | ||
) | ||
|
||
STDLIB_PROHIBITTED = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpick, but this would probably be better as a set, it will save some duplication for like ConfigParser
vs configparser
and will also make the in
check later on a tiny bit faster.
warehouse/forklift/legacy.py
Outdated
@@ -46,6 +50,17 @@ | |||
|
|||
PATH_HASHER = "blake2_256" | |||
|
|||
ALL_STDLIB = set( | |||
chain.from_iterable( | |||
[stdlib_list.stdlib_list(version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be slightly better as (...)
instead of [...]
to make this is a generator expression. That will prevent us from creating an intermediate list.
warehouse/forklift/legacy.py
Outdated
@@ -46,6 +50,17 @@ | |||
|
|||
PATH_HASHER = "blake2_256" | |||
|
|||
ALL_STDLIB = set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for having ALL_STDLIB
exist instead of just nesting this logic inside the comprehension for STDLIB_PROHIBITED
? We could ditch the "inner" set()
then and just have STDLIB_PROHIBITED
be a set comprehension.
warehouse/forklift/legacy.py
Outdated
STDLIB_PROHIBITTED): | ||
raise _exc_with_message( | ||
HTTPBadRequest, | ||
"The name {!r} is not allowed.".format(form.name.data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a distinct message here? Even if it's just including "because it is the name of a Python standard library module" in the existing message?
For future backporting-without-admin-approval, note that the backports namespace package already exists as a self-documenting target for that purpose: https://pypi.python.org/pypi/backports/ (While that backport namespace itself could also be targeted, it isn't vulnerable to the naive "pip install urllib" problem that afflicts stdlib module names, so it's inherently less interesting as an attack vector) |
We should probably add some common "typosquatting" names to the blocklist as well - urrlib, etc. |
Well urllib is part of the stdlib so it will get included automatically with this PR. There is another PR that let us add a custom blacklist in the DB that we've pre-filled with previous typosquatting packages that we've taken ownership of. |
Prohibits new projects with names which collide on canonicalized name with standard library modules from being created.
Uses stdlib_list from @jackmaney for programatic access to a hopefully up to date list of standard library names.
RE: #2151