net/http: context cancellation race condition between http.Server, httputil.ReverseProxy and http.Transport #20617
To pinpoint the origin of the bug more precisely and confirm the theory a second time, I instead created a clone of the httputil.ReverseProxy library, commenting out line 146 of reverseproxy.go:
The race condition disappeared and the site now works normally with Tor Browser. So the race condition is indeed in the ordering between, on one side, the completion of the external request (from the external client) and, on the other side, the completion of the internal request (from the reverse proxy + transport). That race condition is invisible to most browsers but becomes visible with Tor Browser, perhaps because its delay patterns are very unusual (round trips can take up to several seconds) across the overall request/response cycle.
Well the documentation for CloseNotify says:
And from googling, it looks like Tor Browser enables pipelining by default, and ReverseProxy doesn't check the request method before using CloseNotify. So I think ReverseProxy is incompatible with HTTP/1.1 pipelining, and therefore with Tor Browser (unless you disable pipelining in Tor Browser or make it use HTTP/2). But you could try to work around this in the Director function, as you have done.
Tor Browser enables pipelining by default? Ouch. Everybody else who's tried that has ended up disabling it; I thought we'd all concluded it was a failed experiment. Go does support HTTP/1.1 pipelining in general, with the exception of the context cancellation caveat noted above. We considered that acceptable at the time because almost nobody uses pipelining in the wild. I'm super curious to read the thread where Tor decided to do that, if anybody has a link.
I'll keep this bug open, but I can't think of a good fix. Ideas welcome.
Maybe a starting point:
https://blog.torproject.org/blog/experimental-defense-website-traffic-fingerprinting
What are the negative side effects of resolving the CloseNotify caveat?
Raphael
The drug hasn't even been invented yet, so it's a bit early to be talking about negative side effects. If you have a proposed fix, I'm all ears.
Oh, are you referring to your httputil.ReverseProxy change? That's the wrong place to make the change. That only affects the httputil package, but not anything else that uses
Oh, sorry, I didn't mean that; of course the httputil change was only a hack to confirm my pipelining-issue theory. I was thinking more about adding to CloseNotifier (or to a separate HalfCloseNotifier interface) this method: An alternative is to address only the response side with a ResponseCloseNotifier: But that looks just as complex and maybe less powerful for later. Or maybe there are other ways to provide this distinction between request streams and response streams. Then httputil.ReverseProxy, and maybe other http API clients, could use HalfCloseNotify or one of its alternatives, together with request idempotency, to properly distinguish between request and response closings. But I am still new to this pipelining issue and, more importantly, to the exact set of API contracts satisfied today by the http library, so I may have missed something.
My answer above was poor, as I hadn't checked where exactly the CloseNotifier interface is implemented. In that light my first alternative (adding a HalfCloseNotify method) looks very leaky: it crosses the conceptual boundary of what the response writer should know and tell about, by introducing a concept of "half-close" which wasn't triggered by the response stream itself but by specific request-stream events. CloseNotifier already crossed that boundary (by including request closings in its notification scope), but this would only reinforce it and make it more explicit. The second alternative has a very confusing name: ResponseCloseNotifier for a ResponseWriter method. But maybe we could call it FullCloseNotifier. A third alternative is to extend the specified behavior of the CloseNotify() channel in the following backward-compatible way: As the doc says nothing about the closing of the channel, only about at most one value being received, I guess this alternative satisfies the Go 1 compatibility constraint. That way reverseproxy can distinguish between request- and response-stream closings.
If this is fixed, the solution should not involve new API. It'd be nice to just close the Request.Context at the right time. It would need to distinguish the background-reader byte read to see whether it's a real byte (e.g.
Ah, I didn't see the "may" in the CloseNotifier doc about the pipelining special case. I thought it was a "does", and thus that we were tied by the Go 1 backward-compatibility contract.
Ah, I just saw the connReader.closeNotifyFromPipelinedRequest method. But couldn't we have that method, instead of calling connReader.closeNotify, wait on some kind of signal that the current request has finished processing, and then just return? I don't know the inner workings of your signaling infrastructure there; do you have something, e.g. a sync.Cond under mu mutex supervision, that could be waited on? A bit like Close vs Shutdown, where one waits for others to complete while the other aggressively shuts things down.
Sorry, I'm working on getting the Go 1.9beta1 release out. I won't have time to work on this bug for a number of months. (This definitely won't be fixed in Go 1.9) |
For my use case that's OK. Good release!
Probably worth noting that Firefox (which Tor Browser is based on) has removed support for HTTP/1.1 pipelining in Firefox 54 (released today). |
Can't reproduce the bug with the latest version of Tor Browser (7.0.6). Might be due to random network fluctuations, or to them dropping pipelining, but I couldn't find any announcement about that. Closing the ticket as far as what I could reproduce.
Do you think it's possible to reopen this issue? I see the same problem while using
@bradfitz I would be really happy to help with this, but I'm not sure I understand what you mean here.
@juliens please open a new issue and include all the information requested by the issue template. A runnable code sample to demonstrate the issue would be extremely beneficial. |
Fixed by 00ebfcd |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go version go1.8.3 linux/amd64
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/rgeronimi/sabrina"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build401940531=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?
I deployed on port 80 a standard http.Server S, plugged into a standard httputil.ReverseProxy P, reverse-proxying traffic through a standard http.Transport T to another standard Go web server S2 in a separate Unix process. P is configured to use keep-alive.
What did you expect to see?
When used with any browser, S should behave normally, displaying the full website of S2 correctly.
What did you see instead?
It worked with Safari, Chrome and Firefox. But with Tor Browser (vanilla macOS version 7.0, nothing customized), some page resources were randomly broken, with a "502 Bad Gateway" message and empty content (and a Content-Length header of 0). Headers and Tor Browser logs show nothing special beyond the 502 error.
I also tried all other browsers (outside of Tor Browser) over the Tor network, and everything works. So the issue is triggered by Tor Browser itself and not by the Tor network.
Only the first (and sometimes second) resource is correctly downloaded, and then all the following ones fail with a 502. This is inconsistent, suggesting a race condition.
Suspecting a keep-alive connection-management issue somewhere in the chain, I checked the chronology, and it perfectly matches a scenario where a connection C is always deemed "broken" by http.Transport after the first resource request Req1 is answered, compromising all the following resource requests Req2, Req3, ... that were pipelined on that same connection C.
The rare cases where not just one but two resources were correctly received by Tor Browser are explained by there being two parallel connections to pipeline the requests on.
I checked the httputil.ReverseProxy source code and saw that it sends back 502 errors when the request context is canceled, which matches the status code Tor Browser received.
So I made the hypothesis that, for some reason:
(1) Tor Browser systematically exhibits a specific behavior that pushes the http.Server S to declare the request Req1 "canceled" slightly before T considers its own clone TReq1 of Req1 completed;
(2) consequently, T propagates that cancellation to all the following requests TReq2, TReq3, ... that it had pipelined on the same connection C (established between T and S2);
(3) which forces P to generate the 502 errors for each of their original requests Req2, Req3, ...
To test this hypothesis, I implemented a workaround in the httputil.ReverseProxy.Director to remove the influence of contexts on the reverse proxy P and its transport T:
This workaround worked perfectly: the bug symptoms disappeared, and Tor Browser shows the website perfectly.
The missing part is the exact explanation for (1). Tests show that the breakages are inconsistent: sometimes all requests go through correctly. So it looks like a subtle race condition between http.Server, httputil.ReverseProxy and http.Transport.