Geolocate user IP addresses when presenting them in UI #8158
Comments
@di I can help with this. A couple questions:
|
I'm surprised it's that large -- we might want to see if there are more lightweight options, or whether we can slim it down at all (IIUC it probably contains a lot of data we don't need). That said, we check in the development database which clocks in at more than 60MB, so this might be OK for a one-time thing.
We use a datastore to store PyPI's files, but it wouldn't really be appropriate to put this there. In the best possible case, this would be a package on PyPI we could just add as a dependency, but I couldn't find anything that included its own database, just libraries that talked to external APIs.
I think the easiest thing to do would be to add it into the repo and pull it in from there. Given the size though, I'm a little hesitant to say that's the best option. |
I'm looking into lighter-weight options, taking inspiration from other libraries. There are also some recent license changes to the GeoLite2 DB that we'll need to review, but I'll first look at other options. |
I looked at db-ip's City DB. It has a more permissive license, but it's even bigger - 85M. Both providers offer a CSV format but in both cases, CSV is bigger than the corresponding MMDB file. I don't know how we would slim down the MMDB file, and curating the CSV file seems like a lot of work, especially since they release regular updates and we may not want to be locked in to the version that we curate. So it sounds like we can check one of the MMDB files into the git repo, or make an external API call - what was the reason that you didn't like the API call? |
Potential added expense / external dependency, probably not worth it for this very small feature. Unless we could do this entirely on the frontend, in JS, for free... is that an option? Agreed we probably don't want to curate CSV files. |
Hmm. We could do it in JS if the user grants access to their location, but then we'd need to store that in a DB to look back at it for future logins. To get the location just from IP, I think we'd still need an API call from the JS code. There are also country DBs that are much smaller (the Geolite country DB is less than 4M). But I don't think it helps us much to display the country of the user? |
Ah, I meant call some API from JS, not correlate the user's location from their browser w/ their IP. Another consideration for not using an API is maintaining privacy, i.e. keeping all the IPs w/in Warehouse. I think just displaying the country is probably too vague to be useful. |
If you are talking about a REST API (and not a library), wouldn't that also route all the IPs to an external location? |
@di Is there some reason (perhaps legal) that we can't have the GeoLite2 or db-ip databases themselves in a Python package that we make a dependency of Warehouse, rather than adding them to this repository directly? If we can do that, I feel like we should, since it could be updated at some appropriate cadence and, more importantly, we'd avoid making the git repository for this project bigger. |
From #8158 (comment):
|
And yes, I'm assuming we wouldn't be allowed to redistribute it. |
Assuming that Warehouse can't store and redistribute the db, there is a public BigQuery table under |
https://db-ip.com/db/download/ip-to-country-lite is under https://creativecommons.org/licenses/by/4.0/, which does allow redistribution. That's not the case for GeoLite's DB though: they changed licensing last year for California Consumer Privacy Act (CCPA) compliance: https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/ |
Some ideas here on implementing this with more privacy-protecting features around IP addresses as well:
|
GeoIP and salting at edge are done in pypi/infra#123 |
Logging salted IPs is done in #13389 |
We now display GeoIP information if available: #13745 |
Ope, missed that this was a meta issue. |
What's the problem this feature will solve?
Currently in the PyPI logged-in UI, we show the IP address that performed certain actions to the user:
I don't know my own IP offhand. Especially if there are multiple different IPs listed here, I would need to manually look up the approximate location where these came from to get an idea of whether they were actually me or not.
Describe the solution you'd like
It would be nice if PyPI also showed me an (approximate) location for any given IP address, so I could easily visually filter ones that seem incorrect, e.g.:
Additional context
This shouldn't require external API calls. Using something like https://pypi.org/project/geoip2/ with an embedded database like https://dev.maxmind.com/geoip/geoip2/geolite2/ would probably work.
Ideally this would be determined on the fly and not stored anywhere (e.g. along with the IP address), so if we someday replaced the mechanism with something more precise (or just updated the embedded DB) the updates would be immediately reflected.
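For illustration, here is a rough sketch of what an on-the-fly lookup against an embedded database could look like. It assumes the third-party `geoip2` package and a locally available City-style MMDB file. The helper names `open_reader` and `approximate_location` are hypothetical, and nothing here is taken from Warehouse's actual code. Nothing is stored: the location string is computed at render time from the IP address alone.

```python
def open_reader(mmdb_path):
    """Open a local MMDB file (requires the third-party geoip2 package)."""
    # Imported lazily so the formatting helper below has no hard dependency.
    import geoip2.database

    return geoip2.database.Reader(mmdb_path)


def approximate_location(reader, ip):
    """Return a string like "City, Country" for ip, or None if unknown.

    reader is anything with a geoip2-style city(ip) method.
    """
    try:
        response = reader.city(ip)
    except Exception:
        # geoip2 raises geoip2.errors.AddressNotFoundError when the IP is not
        # in the database. The catch is deliberately broad in this sketch so
        # a failed lookup never breaks the page.
        return None

    # Either field may be missing for a given address; keep what is present.
    parts = [response.city.name, response.country.name]
    label = ", ".join(part for part in parts if part)
    return label or None
```

A view could then call something like `approximate_location(reader, event_ip)` when rendering each row and fall back to showing only the IP when it returns `None`.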
Todo list
Replace IP addresses in journals with corresponding hashed IP
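As a minimal sketch of what "hashed IP" could mean for this item: the snippet below assumes a salted SHA-256 digest over the normalized address. The function name `hash_ip`, the choice of SHA-256, and the way the salt is combined are all assumptions made for illustration; the scheme actually used in the linked PRs is not described in this thread.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes) -> str:
    """Return a hex digest that stands in for ip without revealing it.

    Assumption: salted SHA-256. The salt must be kept secret, since the
    IPv4 space is small enough to brute-force an unsalted hash.
    """
    # Normalize first so equivalent spellings of one address hash identically.
    normalized = ipaddress.ip_address(ip).compressed
    return hashlib.sha256(salt + normalized.encode("ascii")).hexdigest()
```

With a stable salt, the same address always maps to the same digest, so entries can still be grouped by origin without storing the raw IP.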