Ultranormalization encourages name squatting #11139
Comments
Here I use "name squatting" to indicate two slightly different attacks:
I don't know of a term that distinguishes the two. The difference, however, is small, and the goals pursued may overlap. Both are a concern for a package registry, and both should be approached carefully without sacrificing one for the other.
Hey @orsinium, thanks for the issue. I think we're unlikely to reverse this policy: it may not be apparent to PyPI users, but it has significantly cut down on the creation of malicious packages that try to squat names similar to legitimate projects. It has generally made PyPI safer to use, and it also means we (the PyPI maintainers) spend less time dealing with these packages. I think there are a few things we can do to make this policy easier to deal with, though:
I like the last two points. Even if the existence of the feature isn't up for discussion, there are still ways to improve it:
When I was working on dephell, I had the idea of warning users who try to install a package that looks like a mistyped name of a more popular project (dephell/dephell#133). To this day, I still think that allowing packages with similar names but warning users about them could be a good idea, if only because it allows an even more aggressive similarity search than the currently implemented ultranormalization.
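A minimal sketch of the warn-instead-of-block idea, using the standard library's `difflib.get_close_matches` as a stand-in similarity measure (this is not what dephell or PyPI actually use). The `POPULAR` list, the `0.8` cutoff and the `typo_warning` helper are hypothetical choices for illustration only.

```python
import difflib

# Hypothetical list of popular project names; a real tool would need some
# popularity source (download stats, etc.), which is an assumption here.
POPULAR = ["requests", "django", "numpy", "flask", "pandas"]


def typo_warning(requested, popular=POPULAR, cutoff=0.8):
    """Return a warning string if `requested` looks like a mistyped popular name."""
    name = requested.lower()
    if name in popular:
        return None  # exact match: the user asked for the popular project itself
    # Best single match whose similarity ratio is at least `cutoff`.
    matches = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    if matches:
        return f"'{requested}' looks similar to the popular project '{matches[0]}'; did you mean it?"
    return None


if __name__ == "__main__":
    print(typo_warning("reqeusts"))
```

Because the check only warns at install time, a looser similarity threshold does not stop anyone from registering a name; the cost shifts to how noisy the warnings are.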
Having hit the error message myself, I am at a loss as to which name my chosen name is too similar to, despite searching the list of all project names. I could spend all day playing "guess a valid name", but I'd rather not.
What about using the Levenshtein distance? EDIT: Oh, there are a few issues about "Levenshtein distance": https://github.com/pypi/warehouse/issues?q=Levenshtein+distance ;)
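For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A textbook dynamic-programming sketch (not the code that was tried in #5001):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("six", "sip"))   # 1
    print(levenshtein("l10n", "lion"))  # 2
```

With short names a fixed threshold catches a lot: "six" and "sip" are a single edit apart, which hints at why a purely distance-based check can be noisy.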
Yes, we tried that in #5001; unfortunately, it was far too noisy to be useful.
I agree. I tried to register "checkreqs" or something like it, and it was considered too similar to some unknown existing project. I don't see anyone objecting to showing the name of the conflicting project, either here or in #11872, so should I just send a PR?
My opinion carries no organizational weight, but I think it would be a nice improvement if PyPI could issue a more specific error message than the current one, and a PR would give the maintainers a very actionable decision to make. +1 from me. This may be easier to track if the other issue is reopened, or if a new issue with a suitably narrow scope is opened, since this issue has other things going on. (For context: I ended up on this issue after helping a user in #python on Libera.chat navigate the existing error message, which left them perplexed about what they had collided with and what to do about it.)
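A sketch of what a more specific error could look like. It assumes an ultranormalization rule inferred from the examples in the issue below (drop '.', '_' and '-', lowercase, map 'l'/'i' to '1' and 'o' to '0'), not warehouse's actual implementation. `check_new_name` and the in-memory `existing_names` list are hypothetical; a registry would presumably query an indexed column rather than scan every name.

```python
import re

# Assumed mapping, inferred from the collisions described in the issue.
_CONFUSABLES = str.maketrans({"l": "1", "i": "1", "o": "0"})


def ultranormalize(name: str) -> str:
    """Drop separators, lowercase, then fold confusable characters."""
    return re.sub(r"[._-]", "", name).lower().translate(_CONFUSABLES)


def check_new_name(new_name, existing_names):
    """Return None if new_name is free, else an error naming the colliding project(s)."""
    key = ultranormalize(new_name)
    clashes = sorted(n for n in existing_names if ultranormalize(n) == key)
    if not clashes:
        return None
    return f"The name {new_name!r} is too similar to existing project(s): " + ", ".join(clashes)


if __name__ == "__main__":
    print(check_new_name("l10n", ["i10n", "lion", "requests"]))
```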
Describe the bug
#10498 introduced "ultranormalization" to prevent name squatting of package names similar to ones already registered:
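The SQL from #10498 isn't reproduced in this copy of the issue. As a rough illustration only: every collision listed below is consistent with a rule that drops '.', '_' and '-', lowercases the name, then maps 'l' and 'i' to '1' and 'o' to '0'. The sketch assumes exactly that rule; `ultranormalize` is a hypothetical helper, not warehouse's code.

```python
import re

# Assumed confusable mapping, inferred from the examples in this issue.
_CONFUSABLES = str.maketrans({"l": "1", "i": "1", "o": "0"})


def ultranormalize(name: str) -> str:
    """Drop '.', '_' and '-', lowercase, then fold l/i -> 1 and o -> 0."""
    return re.sub(r"[._-]", "", name).lower().translate(_CONFUSABLES)


if __name__ == "__main__":
    for n in ["lili", "1111", "i-11-l", "iiii", "l-------ll--------l"]:
        print(n, "->", ultranormalize(n))  # every one maps to "1111"
    for n in ["l10n", "i10n", "lion"]:
        print(n, "->", ultranormalize(n))  # every one maps to "110n"
```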
While the goal in general is of major concern for PyPI (and any other large package registry), the implementation has a few painful drawbacks:
lili (a French name) also lets its owner squat many other similar names, such as 1111, i111, i11l (which could be a good name for an internationalization package), i-11-l, iiii (4 in Roman numerals), and so on. In total, that is a huge number of combinations; the exact number depends on the maximum name length and on whether you count names such as l-------ll--------l.

l10n is rejected by PyPI because there is a package i10n, claiming the i10n name would reveal that there is a package lion, which the user would need to claim again. How many times can one claim names to register a single package? And if PyPI showed all similar names, would it be reasonable to allow mass name claiming? Then again, that's not much different from mass squatting. And if I could claim any name on its own without claiming all its collisions, wouldn't that defeat the point of the change altogether?

Expected behavior
"What I see is what I get". If there is a package with this name, the name is already taken: you can claim it under PEP 541 or pick another one. If there is no package with that name (and it isn't in the stdlib), you can use it.
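The expected behavior boils down to an exact-match check after the normalization PyPI already applies under PEP 503 (runs of '-', '_' and '.' collapse to a single '-', and the name is lowercased). The sketch assumes that rule plus a stdlib check via `sys.stdlib_module_names` (Python 3.10+); `is_name_available` is a hypothetical helper.

```python
import re
import sys

# sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set.
STDLIB = getattr(sys, "stdlib_module_names", frozenset())


def pep503_normalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_name_available(name, existing_names, stdlib_names=STDLIB):
    """A name is taken only if it matches an existing project or a stdlib
    module after ordinary PEP 503 normalization; no confusable folding."""
    key = pep503_normalize(name)
    taken = {pep503_normalize(n) for n in existing_names}
    reserved = {pep503_normalize(m) for m in stdlib_names}
    return key not in taken and key not in reserved
```

Under this rule l10n, i10n and lion are three distinct names, while My.Package and my-package remain the same project.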
To Reproduce
Try to register the l10n package, or run test_fails_with_ultranormalized_names from the PyPI test suite.
My Platform
Irrelevant.
Additional context
Irrelevant.
Possible solutions
I understand the motivation behind the change, but I find that it does more harm than good. So as not to be that person who only complains, here are some solutions to the problem that I see:
djang0, but at the same time there is no harm in letting some unpopular or nearly abandoned packages collide. However, PyPI doesn't have a reliable popularity metric yet. Download counts are stored separately in BigQuery (and querying them on every name registration could be costly), and even then the metric is fairly unreliable. GitHub star counts are an even worse indicator of popularity and aren't available for all packages.

Sorry for the wall of text. I don't want to fight your vision of how the project should look, but I find this particular change harmful to both security and user experience.