
proxy.golang.org: unexpected go module pointing at non-go git repository #51284


Closed
antarus12345 opened this issue Feb 20, 2022 · 21 comments
Labels: FrozenDueToAge, NeedsInvestigation, proxy.golang.org

@antarus12345

{ "Path": "gitweb.gentoo.org/repo/gentoo.git", "Version": "v0.0.0-20220214235306-7a973fdc5ef1", "Timestamp": "2022-02-15T00:05:10.927349Z" }
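For readers unfamiliar with the version string in that record: it has the shape of a Go pseudo-version, which encodes a commit timestamp (UTC) and a 12-character commit hash prefix. The sketch below decodes it. The function name `parsePseudoVersion` is hypothetical, and the parsing is deliberately simplified (it only inspects the last two dash-separated fields and ignores suffixes such as `+incompatible`), so treat it as an illustration rather than the go command's own logic.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parsePseudoVersion (hypothetical helper) splits a pseudo-version such as
// v0.0.0-yyyymmddhhmmss-abcdefabcdef into its commit timestamp and the
// 12-character revision prefix. It does not validate the base version.
func parsePseudoVersion(v string) (time.Time, string, error) {
	parts := strings.Split(v, "-")
	if len(parts) < 3 {
		return time.Time{}, "", fmt.Errorf("not a pseudo-version: %q", v)
	}
	stamp := parts[len(parts)-2]
	rev := parts[len(parts)-1]

	// Other pseudo-version forms put "0." or "pre.0." before the timestamp;
	// keep only what follows the last dot.
	if i := strings.LastIndex(stamp, "."); i >= 0 {
		stamp = stamp[i+1:]
	}

	ts, err := time.Parse("20060102150405", stamp)
	if err != nil {
		return time.Time{}, "", fmt.Errorf("bad timestamp in %q: %w", v, err)
	}
	if len(rev) != 12 {
		return time.Time{}, "", fmt.Errorf("bad revision in %q", v)
	}
	return ts, rev, nil
}

func main() {
	ts, rev, err := parsePseudoVersion("v0.0.0-20220214235306-7a973fdc5ef1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("commit time:", ts.Format(time.RFC3339))
	fmt.Println("commit prefix:", rev)
}
```

Read this way, the record refers to a commit made at 23:53:06 UTC on 2022-02-14, about twelve minutes before the "Timestamp" field in the same record.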

gitweb.gentoo.org/repo/gentoo.git is not a Go module. It's the entire source repo for Gentoo Linux, and it's 900MB in size (with full history).

Recently the Gentoo Infrastructure team received traffic alerts on our origin servers because the go-proxy system was downloading this repository (read: 800 times per 24h period.) The origin repo receives a commit about every 10 minutes, so it changes often.

Some questions then from our side:

  • Can we learn who published this module? We would like to understand why this repo is in the origin. Our repo is not a go module.
  • Was there a behavior change recently on the go-proxy side? Our logs from before Feb 18 2022 indicate either a shallow clone (and most of our repo is history, fetching shallow should be quicker / smaller) or previously the behavior was to do 2 fetches (perhaps 1 fetch to find the most recent git sha, and then a second fetch to fetch the diff between what gomoduleproxy had for our repo, and the most recent sha.)
  • Is there some reason why goproxy is not caching the returned content in a smarter way? My thought here was that some Go module references some origin repo, and the origin reference has some git SHA checksum; so ideally, even if you fetched the full history, you could do it once and then cache it. Perhaps the source module is being bumped often? Either way, 800 fetches a day seems extreme unless it's being bumped very often.

latest:
74.125.191.67 - - [20/Feb/2022:04:30:43 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [20/Feb/2022:05:01:02 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [20/Feb/2022:05:01:02 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [20/Feb/2022:05:02:44 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 925833752 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 101.409
74.125.191.67 - - [20/Feb/2022:17:39:15 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003

older:
74.125.191.67 - - [01/Feb/2022:00:58:40 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:00:58:41 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:00:58:41 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 507 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:00:58:49 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 54601442 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 8.134
74.125.191.67 - - [01/Feb/2022:03:02:19 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:03:02:19 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:03:02:20 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:03:02:20 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 507 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:03:02:29 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 54605086 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 8.238
74.125.191.67 - - [01/Feb/2022:03:51:39 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:03:51:41 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:03:51:41 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 507 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:03:51:49 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 54611015 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 8.124
74.125.191.67 - - [01/Feb/2022:10:27:50 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:10:27:50 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:10:27:51 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.002
74.125.191.67 - - [01/Feb/2022:10:27:51 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 507 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:10:28:00 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 54676069 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 8.143
74.125.191.67 - - [01/Feb/2022:11:17:19 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:11:17:20 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
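One way to quantify the difference between the two log excerpts is to total the response bytes served to the mirror's user agent. The following is a minimal sketch, assuming the log format shown above; the regular expression, the `summary` type, and the `summarize` function are illustrative names, not part of any existing tool. The sample input is four lines copied from the excerpts.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// requestRE captures method, path, status, and response size from one
// access-log line in the format shown in the excerpts above.
var requestRE = regexp.MustCompile(`"(GET|POST) (\S+) HTTP/[0-9.]+" (\d{3}) (\d+) `)

// summary accumulates request count and bytes sent for one HTTP method.
type summary struct {
	Requests int
	Bytes    int64
}

// summarize totals GoModuleMirror traffic per HTTP method. GET lines are
// ref advertisements; POST lines are the git-upload-pack transfers.
func summarize(log string) map[string]summary {
	out := make(map[string]summary)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "GoModuleMirror/") {
			continue // some other client
		}
		m := requestRE.FindStringSubmatch(line)
		if m == nil {
			continue // line does not match the assumed format
		}
		n, err := strconv.ParseInt(m[4], 10, 64)
		if err != nil {
			continue
		}
		s := out[m[1]]
		s.Requests++
		s.Bytes += n
		out[m[1]] = s
	}
	return out
}

// sampleLog holds four lines taken verbatim from the excerpts above.
const sampleLog = `74.125.191.67 - - [01/Feb/2022:00:58:40 +0000] "GET /git/repo/gentoo/info/refs?service=git-upload-pack HTTP/1.1" 200 890 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:00:58:41 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 507 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 0.003
74.125.191.67 - - [01/Feb/2022:00:58:49 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 54601442 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 8.134
74.125.191.67 - - [20/Feb/2022:05:02:44 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 925833752 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 101.409
`

func main() {
	totals := summarize(sampleLog)
	for _, method := range []string{"GET", "POST"} {
		s := totals[method]
		fmt.Printf("%-4s requests=%d bytes=%d\n", method, s.Requests, s.Bytes)
	}
}
```

In the excerpts themselves, the large POST responses on Feb 1 are each roughly 54.6 MB, while the one on Feb 20 is roughly 925.8 MB, which is consistent with the shallow-versus-full-fetch reading above.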

Thanks,

-Alec Warner

@seankhliao added the NeedsInvestigation label Feb 20, 2022
@seankhliao
Member

cc @katiehockman @heschi @hyangah

@antichris

antichris commented Feb 20, 2022

Related to #44577 (or even a duplicate?)

@heschi
Contributor

heschi commented Feb 20, 2022

Sorry for the trouble. I've been doing some regression testing preparatory to Go 1.18, and that involves downloading more things more frequently than would be done in normal operation. I don't expect to do the testing again for this release cycle, but if you don't want this to happen again in 6 months I can exclude gentoo.org from the regression test set.

#44577 is about behavior during normal operation. We can also add gentoo.org to the list mentioned there if you'd rather get less traffic across the board.

@thesamesam

thesamesam commented Feb 20, 2022

That address, and in fact gentoo.org as a whole, doesn't host a Go module anyway. We didn't publish any Go module referencing gentoo.org, and nobody else should have either.

It's not so much about less traffic (which is important to us, and we should address it, but it's not the main issue), but we don't get why there was any traffic at all. We're not expecting anything from goproxy as we're not hosting anything it should fetch.

@seankhliao
Member

Note the repo does contain a .go file, so it is recognized as a valid module.

@thesamesam

thesamesam commented Feb 20, 2022

> Note the repo does contain a .go file, so it is recognized as a valid module.

There's one file, not in the repository root (so I'm not sure how it gets to the point where it discovers it), but right:

$ find . | grep -i "\.go"
./sys-auth/docker_auth/files/version.go

But not a go.mod or go.sum. We can see about making the ebuild download this file instead of hosting it in the repository but we didn't ask for this traffic (in any volume) and we're not publishing a module either.

Sometimes we package software which requires an auxiliary file (our per-package files/ directory). From our perspective, that's not asking for goproxy traffic, nor is it intended to work standalone, or anything remotely like that. It's just a file which the ebuild will use in some way.

I don't think having a single .go file several directories deep should be sufficient. So, even if it is recognised as a valid Go module, we don't think it should be.

We don't understand who or why someone has published a module referencing our repository, but it's definitely not intended to ever be used as a Go module, nor is it a valid one as far as we're concerned. It's not a repository of Go code or anything.


There are a few concerns here, so iterating on Alec's original questions (the "what event" question has already been addressed):

  1. gentoo.git is not a valid Go module (or shouldn't be from our perspective) and we don't want this traffic at all given that. How can we opt out or refine the goproxy logic?
  2. Somebody published(?) a Go module referencing our repository, which is how we ended up in this situation. We don't see why anybody would do this in good faith given what the repository is. Can we prevent that? Can we remove the module doing this?
  3. Can goproxy better cache this, supposing we were a legitimate Go module?

@heschi
Contributor

heschi commented Feb 22, 2022

For better or worse, a repository needs neither a go.mod nor a .go file to be a valid Go module. So moving files around won't change anything. That's a decision made long ago in the go command and it's not something we can change just in proxy.golang.org. I don't know offhand if people depend on that or not; it may be possible to make a change in Go 1.19 but that would be a separate discussion.

Nobody publishes to proxy.golang.org; it is better understood as a caching proxy than a publishing site like NPM or PyPI. I can't say whether the user that's triggering downloads of your repository is malicious, confused, or somehow legitimate, but they're requesting it and proxy.golang.org is serving it to them.

Again, I'm happy to help reduce load on your servers if you want, either the recent spike due to regression testing or blocking out your domain entirely. (However, that will probably result in the user in question downloading the module directly from you.) But we're not in a position to make major design changes to the service right now, so that's about the best I can offer.

@antarus12345
Author

> For better or worse, a repository needs neither a go.mod nor a .go file to be a valid Go module. So moving files around won't change anything. That's a decision made long ago in the go command and it's not something we can change just in proxy.golang.org. I don't know offhand if people depend on that or not; it may be possible to make a change in Go 1.19 but that would be a separate discussion.

This is (as I mention below) more of an abuse concern for you than any specific concern of mine, so I defer.

> Nobody publishes to proxy.golang.org; it is better understood as a caching proxy than a publishing site like NPM or PyPI. I can't say whether the user that's triggering downloads of your repository is malicious, confused, or somehow legitimate, but they're requesting it and proxy.golang.org is serving it to them.

My team is a bit upset about someone requesting a Go module that references our repo, and while it is disconcerting, we cannot really control who links to us, so I think that battle is mostly futile (i.e. I agree with you). It's an abuse problem for you (e.g. people forcing you to cache illegal content), but it's not necessarily my problem.

> Again, I'm happy to help reduce load on your servers if you want, either the recent spike due to regression testing or blocking out your domain entirely. (However, that will probably result in the user in question downloading the module directly from you.) But we're not in a position to make major design changes to the service right now, so that's about the best I can offer.

Great, so a couple of specific questions then.

(1) You said you did some work for regression testing golang-1.18. Is our repo in your test set?
How was the test set constructed (e.g. how did our repo end up in your test set?)
Our repos are:
https://anongit.gentoo.org/git/repo/gentoo.git
https://anongit.gentoo.org/git/repo/gentoo/historical.git

You might also see them at 'gitweb.gentoo.org' names which canonicalize to anongit.gentoo.org.

(2) I'm still not grasping why the gomodule proxy failed to cache our content. You said you ran regression testing (which necessitated more downloads), but I'm not quite understanding how or why this occurred. Was it DIRECT traffic, or did you flush the proxy cache? What prevents other users from this sort of activity? Here I don't necessarily mean fetching our code often (which anyone can do; it's public), but making the gomodule proxy do the fetches.

(3) Our repo is 900MB (with history), and prior to Feb 18 2022, fetches from the gomodule proxy routinely fetched 50MB or so (which seems like a shallow fetch). After Feb 18, the number of fetches ramped up, and they were no longer shallow. After Feb 20 (even before reporting this issue) the traffic seems to have gone away. Similar to my question in (2), how do users control the behavior of the proxy here? Was there a proxy rollout during that time?

In general I don't care if you mirror our content (it's GPL-2, mirror away); my goal is to understand how the gomodule proxy interacts with our origin so we don't overspend or overuse our computing resources. Appreciate any engagement on that topic ;)

-A

@heschi
Contributor

heschi commented Feb 22, 2022

(1) The test set is a random subset of the things the proxy has been asked for in the past.
(2) It didn't, or at least I haven't seen evidence of it yet -- we need to compare the new version of Go (1.18) to the version currently in prod (1.17), which involves re-downloading things using both versions. In general it is possible for users to funnel an arbitrary amount of traffic through the proxy, but we try to reduce it as much as we can, and we do have our own rate-limiting and abuse prevention in place.
(3) Those days are the days I was running the regression tests. It's possible that some of that traffic required deeper fetches than usual, but I just verified that 1.17 and 1.18 both do shallow fetches in the common case, so I don't think you need to be concerned about the new version behaving worse once it's rolled out. It will probably recur with the next regression test run unless we prevent it.

In general we hope that the proxy is a net win for origins, since we can serve the same module many times over for one upstream request. For less popular modules, which I imagine Gentoo's are, it's unfortunately possible it's a loss.

@antarus12345
Author

> (1) The test set is a random subset of the things the proxy has been asked for in the past.
> (2) It didn't, or at least I haven't seen evidence of it yet -- we need to compare the new version of Go (1.18) to the version currently in prod (1.17), which involves re-downloading things using both versions. In general it is possible for users to funnel an arbitrary amount of traffic through the proxy, but we try to reduce it as much as we can, and we do have our own rate-limiting and abuse prevention in place.
> (3) Those days are the days I was running the regression tests. It's possible that some of that traffic required deeper fetches than usual, but I just verified that 1.17 and 1.18 both do shallow fetches in the common case, so I don't think you need to be concerned about the new version behaving worse once it's rolled out. It will probably recur with the next regression test run unless we prevent it.
>
> In general we hope that the proxy is a net win for origins, since we can serve the same module many times over for one upstream request. For less popular modules, which I imagine Gentoo's are, it's unfortunately possible it's a loss.

Thanks. Can I submit a PR somewhere to remove our repos' URIs from your test set, or will this issue suffice?

If we have to block the traffic (let's assume some nebulous future state where it comes back), is there some recommended thing we should do on our end, in terms of status codes, that the proxy will be happiest with?

-A

@heschi
Contributor

heschi commented Feb 23, 2022

I've made the change that will remove your repositories from the test set, and also will reduce background traffic. You should see a reduction in traffic in the next day or two. I'll close this issue now but feel free to comment if something goes wrong.

If you want to block traffic you can do so any way that will break Git, it doesn't matter to us.
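As an illustration of blocking by user agent, here is a minimal Go sketch of an HTTP middleware that refuses requests from the mirror. Several things here are assumptions, not details from this thread: that the Git HTTP endpoint sits behind a Go `net/http` handler at all, the choice of 403 as the status code (the comment above says any response that breaks Git will do), and the names `blockGoModuleMirror` and `statusFor`. The only detail taken from the thread is the `GoModuleMirror` token in the User-Agent header shown in the logs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blockGoModuleMirror (hypothetical middleware) wraps next and answers
// 403 Forbidden to any request whose User-Agent contains "GoModuleMirror",
// the token that appears in the access logs earlier in this thread.
func blockGoModuleMirror(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.UserAgent(), "GoModuleMirror") {
			http.Error(w, "Go module mirror traffic is not accepted here", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request with the given User-Agent through
// the middleware and returns the resulting status code. The backend is a
// stand-in that always answers 200.
func statusFor(userAgent string) int {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/git/repo/gentoo/info/refs?service=git-upload-pack", nil)
	req.Header.Set("User-Agent", userAgent)
	rec := httptest.NewRecorder()
	blockGoModuleMirror(backend).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("mirror:", statusFor("GoModuleMirror/1.0 (+https://proxy.golang.org)"))
	fmt.Println("other: ", statusFor("git/2.35.1"))
}
```

A User-Agent match is easy for a client to evade, so this only addresses well-behaved traffic that identifies itself, which the mirror does in the logs above.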

@antarus12345
Author

antarus12345 commented Dec 1, 2022

Hello. This has begun again:

20221130.log:172.217.36.247 - - [30/Nov/2022:22:33:57 +0000] "POST /git/repo/gentoo/git-upload-pack HTTP/1.1" 200 1055734149 "-" "GoModuleMirror/1.0 (+https://proxy.golang.org)" "-" 54.133

Per your comment from earlier this year, we have blocked the gomodule proxy, as we don't think it has a legitimate reason to be fetching from our origin (we host no Go repos).

-A

@heschi
Contributor

heschi commented Dec 1, 2022

I suspect this was due to a burst of direct user requests for the module -- automatic traffic should still be disabled. But blocking it is fine.

@robbat2

robbat2 commented Dec 1, 2022

@heschi our repo is NOT a go module. Can you please explain how users can still request it?

I'm wondering if there is a potential here to use GoModuleMirror to DoS arbitrary git HTTP services? Prior to blocking you, the service was generating >400GB/hour of traffic for this repo, which is small for Google, but could lead to a big bill for smaller organizations (e.g. it's $36/hour at EC2 egress rates).

Edit: And the single .go file mentioned previously has been removed for many months.

@heschi
Contributor

heschi commented Dec 1, 2022

I addressed the first question above.

I don't work on proxy.golang.org much any more, so I will defer to @golang/tools-team for the rest of the discussion.

@thepudds
Contributor

thepudds commented Jan 9, 2023

> For better or worse, a repository needs neither a go.mod nor a .go file to be a valid Go module. So moving files around won't change anything.

Hi @robbat2, FWIW, a related proposal is currently marked "likely accept" and in the "final comment" period:

#31866 cmd/go: do not download “modules” that contain no go.mod or *.go

Also, it looks like this issue you opened here was closed as "completed" as of February 2022, with you asking a follow-up question last month. If you are still interested in this issue, I would recommend that you re-open this issue (or file a new one if that's better for some reason).

I think the core Go team can miss comments on closed issues given the sheer volume of comments overall on the various Go github repos.

(Finally, I'm basically a random gopher from the broader community, so don't trust what I say too much ;-)

@robbat2

robbat2 commented Jan 10, 2023

@thepudds it's not showing the permission to re-open this issue.

@thepudds
Contributor

Hi @robbat2, sorry, I missed that it was @antarus12345 who opened this. They should have permission to re-open, but given you expressed interest in it being re-opened, I will do so. (I'm a community gardener.)

@hyangah
Contributor

hyangah commented Jan 11, 2023

Hi @robbat2 @antarus12345
Sorry for the trouble. We are currently working on adding gitweb.gentoo.org/repo/gentoo.git to our exclusion list.
We will close this issue once the deployment is complete.

@rsc
Contributor

rsc commented Jan 11, 2023

@robbat2

> our repo is NOT a go module. Can you please explain how users can still request it?

To expand on @heschi's response above, if you have a Go distribution and users for whatever reason run

go get gitweb.gentoo.org/repo/gentoo.git/dir

then the go command will interpret that as an import path requesting the Go package in dir of that repository. Of course there is no Go package there, but to find that out, the go get command still does a git clone.

If users have GOPROXY=direct set, then you'd be seeing clones from user systems. Because the default is to use proxy.golang.org, you are seeing the clones from proxy.golang.org.

Clearly someone is running a command like the above, because that's the only way the proxy ever learns about any git repo. (Users being confused happens.) What we did back in Feb 2022 was remove these paths from being refreshed preemptively to prepare for future fetches. But the proxy would still connect if a direct request came in. Making the proxy reject those requests is what we're going to do next. Note that when the proxy starts rejecting those requests, the go command is going to fall back to trying a direct connection instead, but at least then you'll have better attribution of where it is coming from, and maybe users will get tired of waiting and interrupt the download.
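The fallback described above follows from how the go command reads the GOPROXY list: entries are tried in order, a comma before the next entry means "fall back only on a 404 or 410 response", and a pipe means "fall back on any error". The sketch below parses a GOPROXY value into that structure. The `proxyEntry` type and `parseGOPROXY` function are hypothetical names for illustration and are not the go command's internal API; the status code the proxy returns for excluded paths is not stated in this thread, so nothing here assumes it.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyEntry is one element of a GOPROXY list (hypothetical type).
type proxyEntry struct {
	// Target is a proxy URL or one of the keywords "direct" or "off".
	Target string
	// FallbackOnAnyError is true when the entry is followed by '|',
	// meaning the next entry is tried after any error. When false
	// (a ',' or end of list), only 404/410 responses trigger fallback.
	FallbackOnAnyError bool
}

// parseGOPROXY splits a GOPROXY value on ',' and '|' while remembering
// which separator followed each entry. Empty entries are skipped.
func parseGOPROXY(value string) []proxyEntry {
	var entries []proxyEntry
	for value != "" {
		target, sep := value, byte(0)
		if i := strings.IndexAny(value, ",|"); i >= 0 {
			target, sep, value = value[:i], value[i], value[i+1:]
		} else {
			value = ""
		}
		target = strings.TrimSpace(target)
		if target == "" {
			continue
		}
		entries = append(entries, proxyEntry{
			Target:             target,
			FallbackOnAnyError: sep == '|',
		})
	}
	return entries
}

func main() {
	// The default GOPROXY value: the public proxy first, then direct.
	for i, e := range parseGOPROXY("https://proxy.golang.org,direct") {
		fmt.Printf("%d: %s (fallback on any error: %v)\n", i+1, e.Target, e.FallbackOnAnyError)
	}
}
```

With the default value, a request the proxy declines in a way that permits fallback reaches the origin from the user's own machine via `direct`, which is the attribution benefit mentioned above.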

@seankhliao added this to the Unreleased milestone Jan 20, 2023
@hyangah
Contributor

hyangah commented Jan 23, 2023

@robbat2 @antarus12345 Thanks for being patient.
The change to reject requests with domain names anongit.gentoo.org and gitweb.gentoo.org was deployed last week.
If you still observe traffic from us, or there are extra domain names we should consider to filter out, please let us know.

@hyangah closed this as completed Jan 23, 2023
@golang locked and limited conversation to collaborators Jan 23, 2024