net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Transport #20617
To pinpoint the origin of the bug more precisely and confirm the theory a second time, I instead created a clone of the httputil.ReverseProxy library, commenting out line 146 of reverseproxy.go:
The race condition disappeared and the site now works normally with Tor Browser. So the race condition is indeed in the ordering between, on one side, the completion of the external request (from the external client) and, on the other side, the completion of the internal request (from the reverse proxy + transport). That race condition is invisible to most browsers but becomes visible with Tor Browser, perhaps because its delay patterns are very unusual (round trips can take up to several seconds) across the overall request/response cycle.
Well the documentation for CloseNotify says:
And from googling, it looks like Tor Browser enables pipelining by default, and ReverseProxy doesn't check the request method before using CloseNotify. So I think ReverseProxy is incompatible with HTTP/1.1 pipelining, and therefore with Tor Browser (unless you disable pipelining in Tor Browser or make it use HTTP/2). But you could try to work around this in the Director function, as you have done.
Tor Browser enables pipelining by default? Ouch. Everybody else who's tried that has ended up disabling it; I thought we'd all concluded it was a failed experiment. Go does support HTTP/1.1 pipelining in general, with the exception of the context cancellation caveat noted above. We considered that acceptable at the time because almost nobody uses pipelining in the wild. I'm super curious to read the thread where Tor decided to do that, if anybody has a link.
I'll keep this bug open, but I can't think of a good fix. Ideas welcome.
Maybe a starting point:
https://blog.torproject.org/blog/experimental-defense-website-traffic-fingerprinting
What are the negative side effects of resolving the CloseNotify caveat?
Raphael
The drug hasn't even been invented yet, so it's a bit early to be talking about negative side effects. If you have a proposed fix, I'm all ears.
Oh, are you referring to your httputil.ReverseProxy change? That's the wrong place to make the change. That only affects the httputil package, but not anything else that uses
Oh, sorry, I didn't mean that; of course the httputil change was only a hack to confirm my pipelining-issue theory. I was thinking more about adding to CloseNotifier (or to a separate HalfCloseNotifier interface) this method: An alternative is to address only the response side with a ResponseCloseNotifier: But that looks just as complex and maybe less powerful for later. Or maybe there are other ways to provide this distinction between request streams and response streams. Then httputil.ReverseProxy, and maybe other http API clients, could use HalfCloseNotify or one of its alternatives, together with request idempotency, to properly distinguish between request and response closings. But I am still new to this pipelining issue and, more importantly, to the exact set of API contracts satisfied today by the http library, so I may have missed something.
My answer above was poor, as I hadn't checked where exactly the CloseNotifier interface is implemented. In that light my first alternative (adding a HalfCloseNotify method) looks very leaky: it crosses the conceptual boundary of what the response writer should know and tell about, by introducing a concept of "half-close" which wasn't triggered by the response stream itself but by specific request-stream events. CloseNotifier already crossed that boundary (by including request closings in its notification scope), but this would only reinforce it and make it more explicit. The second alternative has a very confusing name: ResponseCloseNotifier for a ResponseWriter method. But maybe we could call it FullCloseNotifier. A third alternative is to extend the specified behavior of the CloseNotify() channel in the following backward-compatible way: As the doc says nothing about the closing of the channel, only about at most one value being received, I guess this alternative satisfies the Go 1 compatibility constraint. That way reverseproxy can distinguish between request- and response-stream closings.
If this is fixed, the solution should not involve new API. It'd be nice to just close the Request.Context at the right time. It would need to distinguish the background-reader byte read to see whether it's a real byte (e.g.
Ah, I didn't see the "may" in the CloseNotifier doc about the pipelining special case. I thought it was a "does", and thus that we were tied by the Go 1 backward-compatibility contract.
Ah, I just saw the connReader.closeNotifyFromPipelinedRequest method. But couldn't we have that method, instead of calling connReader.closeNotify, wait on some kind of signal that the current request has finished processing, and then just return? I don't know the inner workings of your signaling infrastructure there; do you have something, e.g. a sync.Cond under mu mutex supervision, that could be waited on? A bit like Close vs Shutdown, where one waits for others to complete while the other aggressively shuts things down.
Sorry, I'm working on getting the Go 1.9beta1 release out. I won't have time to work on this bug for a number of months. (This definitely won't be fixed in Go 1.9) |
For my use case that's OK. Good release!
Probably worth noting that Firefox (which Tor Browser is based on) has removed support for HTTP/1.1 pipelining in Firefox 54 (released today). |
Can't reproduce the bug with the latest version of Tor Browser (7.0.6). Might be due to random network fluctuations, or to them dropping pipelining, but I couldn't find any announcement about that. Closing the ticket as far as what I could reproduce.
Do you think it's possible to reopen this issue? I see the same problem while using
@bradfitz I would be really happy to help with this, but I'm not sure I understand what you mean here.
@juliens please open a new issue and include all the information requested by the issue template. A runnable code sample to demonstrate the issue would be extremely beneficial. |
Fixed by 00ebfcd |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.8.3 linux/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/rgeronimi/sabrina"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build401940531=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?
I deployed on port 80 a standard http.Server S, plugged into a standard httputil.ReverseProxy P, reverse-proxying traffic through a standard http.Transport T to another standard Go web server S2 in a separate Unix process. P is configured to use keep-alive.
What did you expect to see?
When used with any browser, S should behave normally, displaying the full website of S2 correctly.
What did you see instead?
It worked with Safari, Chrome and Firefox. But with Tor Browser (vanilla macOS version 7.0, nothing customized), some page resources were randomly broken, with a "502 Bad Gateway" message and empty content (and a Content-Length header of 0). Headers and Tor Browser logs show nothing special beyond the 502 error.
I also tried all other browsers (outside of Tor Browser) over the Tor network, and everything works. So the issue is triggered by Tor Browser itself and not by the Tor network.
Only the first (and sometimes second) resource is correctly downloaded, and then all the following ones fail with a 502. This is inconsistent, suggesting a race condition.
Suspecting a keep-alive connection-management issue somewhere in the chain, I checked the chronology, and it perfectly matches a scenario where a connection C is always deemed "broken" by http.Transport after the first resource request Req1 is answered, compromising all the following resource requests Req2, Req3, ... that were pipelined on that same connection C.
The rare cases where not just one but two resources were correctly received by Tor Browser are explained by there being two parallel connections to pipeline the requests on.
I checked the httputil.ReverseProxy source code and saw that it sends back 502 errors when the request context is canceled, which matches the status code Tor Browser received.
So I made the hypothesis that, for some reason:
(1) Tor Browser systematically exhibits a specific behavior that pushes the http.Server S to declare the request Req1 "canceled" slightly before T considers its own clone TReq1 of Req1 completed;
(2) consequently, T propagates that cancellation to all the following requests TReq2, TReq3, ... that it had pipelined on the same connection C (established between T and S2);
(3) which forces P to generate the 502 errors for each of their original requests Req2, Req3, ...
To test this hypothesis, I implemented a workaround in the httputil.ReverseProxy.Director to remove the influence of contexts on the reverse proxy P and its transport T:
This workaround worked perfectly: the bug symptoms disappeared, and Tor Browser shows the website perfectly.
The missing part is the exact explanation for (1). Tests show that the breakages are inconsistent: sometimes all requests go through correctly. So it looks like a subtle race condition between http.Server, httputil.ReverseProxy and http.Transport.