Some proxied POST requests fail or hang when upstream does not consume body #782

Closed
uhthomas opened this issue Apr 22, 2016 · 39 comments · Fixed by #784

@uhthomas

uhthomas commented Apr 22, 2016

1. What version of Caddy are you running (caddy -version)?

Caddy (untracked dev build)

2. What are you trying to do?

Upload files through a reverse proxy

3. What is your entire Caddyfile?

site.com {
  proxy / 127.0.0.1:1333 {
    max_fails 0
  }
}

4. How did you run Caddy (give the full command and describe the execution environment)?

$ sudo caddy -email="[email protected]" -agree

5. What did you expect to see?

Proxied requests and responses

6. What did you see instead (give full error messages and/or log)?

Requests were either not sent to the upstream server, or responses were only partially sent (just the headers). Usually, requests either hang on the client side and never reach the upstream, or the responses return only headers along with a protocol error (net::ERR_SPDY_PROTOCOL_ERROR).

This occurs when the upstream returns a status code outside the range 200-399, despite max_fails 0 being set to prevent the backend from being marked as down. Even if the backend were marked as down, this would not be expected behavior.

Caddy reports no error messages, whether with -log stdout added to the command or anywhere else.

[screenshot: ss 2016-04-22 at 08 43 13]

Code to reproduce (use the network inspector to view details)
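
A minimal upstream along the lines of the linked repro (a sketch pieced together from the handler diff later in this thread; serving index.html on GET is an assumption) would be:

package main

import (
	"log"
	"net/http"
)

// handler replies 413 to every POST without ever reading the request
// body -- the upstream behavior this issue revolves around.
func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method == "POST" {
		log.Println("Received post request")
		http.Error(w, http.StatusText(http.StatusRequestEntityTooLarge),
			http.StatusRequestEntityTooLarge)
		return
	}
	// Assumption: GETs serve the test page containing the upload JS.
	http.ServeFile(w, r, "index.html")
}

func main() {
	http.HandleFunc("/", handler)
	// 127.0.0.1:1333 matches the proxy target in the Caddyfile above.
	log.Fatal(http.ListenAndServe("127.0.0.1:1333", nil))
}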
@mholt
Member

mholt commented Apr 22, 2016

Hi, thanks for the report.

Can you please build Caddy with the build.bash script and then re-run the -version flag? We need to know precise build info. Please ensure it is the same one with which you're encountering the problem.

Also, there aren't enough details here to reproduce the error. Can you please provide the upstream server's source code so that we can run it and see the same results? As simple as possible would be good.

(Nvm, I see the link now.)

@uhthomas
Author

uhthomas commented Apr 22, 2016

Sure, the output for $ caddy -version is Caddy 0.8.2 (+3c578df Fri Apr 22 21:58:13 UTC 2016)

@mholt
Member

mholt commented Apr 22, 2016

Thanks for the build information!

Okay, so the JS has an error on line 20: Uncaught ReferenceError: res is not defined -- when I changed it to req the error went away.

Your upstream server is sending a 413 to all POST requests, which is why you're getting the 413 error.

@uhthomas
Author

Sorry, my mistake! I guess I didn't catch it because requests never finished. The issue isn't that it returns 413s, but that nothing is returned at all, or even sent to the upstream server for that matter. I have no idea what's causing it.

@mholt
Member

mholt commented Apr 22, 2016

Hmm, just a note for anyone who looks to work on this: it seems this can only be reproduced over an HTTPS connection; I have tried HTTP and it works fine.

@mholt
Member

mholt commented Apr 23, 2016

This seems to have to do with the fact that it's a multipart-form request, for some reason. POSTing an empty request works fine. @nemothekid have you experienced this before? I can't quite figure out what's going on.

@nemosupremo

nemosupremo commented Apr 23, 2016

I was able to reproduce this pretty reliably without going HTTPS (in Firefox).

(Stream of thought here...)
Adding this to the backend server "fixes" the issue.

diff --git a/c.go b/b.go
index 417aea3..bd1d613 100644
--- a/c.go
+++ b/b.go
@@ -3,4 +3,6 @@
 import (
+       "io"
+       "io/ioutil"
        "log"
        "net/http"
 )
@@ -10,6 +12,7 @@
 func handler(w http.ResponseWriter, r *http.Request) {
        if r.Method == "POST" {
                log.Println("Received post request")
+               io.Copy(ioutil.Discard, r.Body)
                http.Error(w, http.StatusText(http.StatusRequestEntityTooLarge), http.StatusRequestEntityTooLarge)
                return
        }

Reading the entire body before closing the request causes everything to go smoothly. This isn't a real "fix", because the upstream server should be able to close the connection any time it likes, regardless of whether Caddy has finished sending the request to it. It looks like Caddy doesn't like the fact that the connection is being closed before all the data is sent, and hangs.

My guess (I didn't debug to find out) is that https://github.com/mholt/caddy/blob/master/middleware/proxy/reverseproxy.go#L205 returns an ErrConnectionClosed or something like it, and we see that error and decide to mark the backend as down.

Tangentially, I was checking out the reverseproxy.go in golang (from which Caddy's reverseproxy.go is copied) and saw they are using a CloseNotifier. I was wondering why, and it seems this issue is exactly the case it handles.

Looking into requestCanceler, the docs say Transport.CancelRequest is deprecated because it can't be used on http/2 connections, so I'm not sure how useful copying the code as-is would be. Secondly, I'm not sure whether CancelRequest will give us an error from RoundTrip at https://github.com/mholt/caddy/blob/master/middleware/proxy/reverseproxy.go#L205, which would leave us back at square one.

tl;dr reverseproxy.go has to detect that the backend server has decided to close the connection and decide how to communicate that back to the client.
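
For context, the CloseNotifier pattern in Go's httputil reverse proxy mentioned above looks roughly like this (paraphrased from the stdlib source of that era, not Caddy's code; rw, transport, and outreq are the surrounding variables in that function):

// requestCanceler is the unexported interface the stdlib proxy checks for.
type requestCanceler interface {
	CancelRequest(req *http.Request)
}

// Inside ServeHTTP, before calling transport.RoundTrip(outreq):
if closeNotifier, ok := rw.(http.CloseNotifier); ok {
	if canceler, ok := transport.(requestCanceler); ok {
		reqDone := make(chan struct{})
		defer close(reqDone)
		clientGone := closeNotifier.CloseNotify()
		go func() {
			select {
			case <-clientGone:
				// The client went away: abort the upstream request.
				canceler.CancelRequest(outreq)
			case <-reqDone:
			}
		}()
	}
}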

@nemosupremo

My hypothesis didn't pan out: copying the requestCanceler code didn't fix it.

@uhthomas
Author

uhthomas commented Apr 23, 2016

Changing the round tripper to an http.Client outright worked fine for me. Not sure if there are any disadvantages to this, though.

Scratch that. I need to test my solutions more thoroughly before committing.

@uhthomas
Author

uhthomas commented Apr 23, 2016

Looks as if golang/go#14061 might be related.

Derived from https://groups.google.com/forum/#!topic/golang-nuts/ZVfwwWttVYI

@mholt
Member

mholt commented Apr 23, 2016

@nemothekid Very helpful -- thanks for looking into this! I also was wondering if the upstream was misbehaving but as you said, that shouldn't matter. Hmm, this is tricky!

Update: Ignore pretty much this whole post, I made a mistake in my testing (below).

Ah, and it's interesting that Firefox has a slightly different behavior. Still sporadic, but definitely different from Chrome, as at least one connection seems to succeed regularly:

[screenshot: screen shot 2016-04-22 at 10 20 48 pm]

Indeed some connections still hang.

Update: While writing this post, I am no longer able to reproduce the behavior locally. I'm serving a test domain over HTTPS with a real certificate, hosted on my local computer. I'm accessing the endpoint exposed by Caddy, have added some extra log.Print lines to see more of what is going on, and have even tried messing with the payload: first removing the body entirely, then adding the multipart stuff, then re-enabling the content (the 5 MB of zeroes).

All I did was step away for a few hours, came back (both the servers were running the whole time, but I closed the computer), and it was working. I then restarted the servers and things are still working. Firefox and Chrome both show the % values going up until the upload is complete, after which they show a 413 error as expected.

[screenshot: screen shot 2016-04-22 at 10 42 30 pm]

The only changes I made were adding a few log lines. Strange, no?

@mholt
Member

mholt commented Apr 23, 2016

Oh wait.

@nemothekid It's working for me now because, in adding logging to the upstream instance to verify how much of the request is being read, I'm consuming the request body with ioutil.ReadAll, which is the same as the workaround you suggested. * facepalm *

I will disable that logging for future debugging.

So far, your conclusion still seems correct: the upstream needs to be allowed to close the connection intentionally without Caddy marking the upstream as down.

@mholt
Member

mholt commented Apr 23, 2016

I was running my tests without setting max_fails 0 as in @6f7262's instructions. When I did use it, I discovered by dumping the outreq that the value of the X-Forwarded-For header very quickly becomes:

"X-Forwarded-For":[]string{"66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>..", 66.<my IP>.."

times like a bajillion.

It essentially retries the request continuously for a minute, as Caddy is programmed to do when it's told to never assume a backend is down. And each time, it appends my IP address to the header again.

That's gotta increase memory usage. Whoops.

The errors being reported from reverseproxy.go look like: Failed: stream error: stream ID 5; CANCEL

^ This behavior of multiplying that header value only happens sometimes; I may have to refresh the page to trigger it; other times, it just runs the requests once.

@nemosupremo

(I don't know why it fails under Firefox, http/1.1)

AFAICT, it looks like a data race. I'm checking out a simpler case with var reqs = 1; in the JS code, as that doesn't work properly either and is less complex than trying to debug 3 requests at once.

When var reqs = 1, the status code (413) is retrieved, but the body (which should be "Request Entity Too Large") never seems to be sent to the browser.

@uhthomas
Author

Speaking of things happening a bajillion times, I noticed that when logging the error returned from RoundTrip, it would be stream error: stream ID 5; CANCEL over and over; not sure if this is related.

@uhthomas
Author

uhthomas commented Apr 23, 2016

@nemothekid That is exactly what I was experiencing too. I found that it would do that for the first request (just the headers and no body) and from that point onward all requests would be void and just hang.

This can also be seen in the sample, where the very first upload gets further than any of the subsequent uploads.

Sending 3 requests...
1.2510978767231062%
0.05439555985752635%
0.05439555985752635%

@mholt
Member

mholt commented Apr 23, 2016

^ Now we're getting somewhere. That is exactly consistent with my tests too.

@mholt mholt added the bug 🐞 Something isn't working label Apr 23, 2016
@mholt mholt changed the title POST requests via reverse proxy fail Some proxied POST requests fail or hang when upstream does not consume body Apr 23, 2016
@mholt
Member

mholt commented Apr 23, 2016

@nemothekid With regards to a possible data race:

==================
WARNING: DATA RACE
Write by goroutine 71:
  sync/atomic.AddInt32()
      /usr/local/go/src/runtime/race_amd64.s:255 +0xb
  github.com/mholt/caddy/middleware/proxy.Proxy.ServeHTTP.func1()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/proxy.go:139 +0x50

Previous read by goroutine 108:
  github.com/mholt/caddy/middleware/proxy.(*staticUpstream).NewHost.func1.1()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/upstream.go:111 +0x6b
  github.com/mholt/caddy/middleware/proxy.(*UpstreamHost).Down()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/proxy.go:59 +0xb2
  github.com/mholt/caddy/middleware/proxy.(*UpstreamHost).Available()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/proxy.go:69 +0x32
  github.com/mholt/caddy/middleware/proxy.(*staticUpstream).Select()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/upstream.go:244 +0xa8
  github.com/mholt/caddy/middleware/proxy.Proxy.ServeHTTP()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/proxy.go:88 +0x460
  github.com/mholt/caddy/middleware/proxy.(*Proxy).ServeHTTP()
      <autogenerated>:5 +0xf2
  github.com/mholt/caddy/server.(*Server).ServeHTTP()
      /Users/matt/Dev/src/github.com/mholt/caddy/server/server.go:359 +0x596
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2081 +0x206
  net/http.initNPNRequest.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2489 +0x350
  net/http.(*initNPNRequest).ServeHTTP()
      <autogenerated>:253 +0xda
  net/http.(Handler).ServeHTTP-fm()
      /usr/local/go/src/net/http/h2_bundle.go:3847 +0x5d
  net/http.(*http2serverConn).runHandler()
      /usr/local/go/src/net/http/h2_bundle.go:4060 +0xac

Goroutine 71 (running) created at:
  github.com/mholt/caddy/middleware/proxy.Proxy.ServeHTTP()
      /Users/matt/Dev/src/github.com/mholt/caddy/middleware/proxy/proxy.go:140 +0xdf6
  github.com/mholt/caddy/middleware/proxy.(*Proxy).ServeHTTP()
      <autogenerated>:5 +0xf2
  github.com/mholt/caddy/server.(*Server).ServeHTTP()
      /Users/matt/Dev/src/github.com/mholt/caddy/server/server.go:359 +0x596
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2081 +0x206
  net/http.initNPNRequest.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2489 +0x350
  net/http.(*initNPNRequest).ServeHTTP()
      <autogenerated>:253 +0xda
  net/http.(Handler).ServeHTTP-fm()
      /usr/local/go/src/net/http/h2_bundle.go:3847 +0x5d
  net/http.(*http2serverConn).runHandler()
      /usr/local/go/src/net/http/h2_bundle.go:4060 +0xac

Goroutine 108 (running) created at:
  net/http.(*http2serverConn).processHeaderBlockFragment()
      /usr/local/go/src/net/http/h2_bundle.go:3853 +0x7c2
  net/http.(*http2serverConn).processHeaders()
      /usr/local/go/src/net/http/h2_bundle.go:3788 +0xb75
  net/http.(*http2serverConn).processFrame()
      /usr/local/go/src/net/http/h2_bundle.go:3515 +0x5fc
  net/http.(*http2serverConn).processFrameFromReader()
      /usr/local/go/src/net/http/h2_bundle.go:3474 +0xb85
  net/http.(*http2serverConn).serve()
      /usr/local/go/src/net/http/h2_bundle.go:3169 +0xd46
  net/http.(*http2Server).ServeConn()
      /usr/local/go/src/net/http/h2_bundle.go:2711 +0x101f
  net/http.http2ConfigureServer.func1()
      /usr/local/go/src/net/http/h2_bundle.go:2603 +0x115
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1414 +0xa32
==================

Note that my line numbers might be slightly different since I've added a few log lines. proxy.go:139 says atomic.AddInt32(&host.Fails, -1); upstream.go:111 reads if uh.Fails >= u.MaxFails &&

But I imagine this is unrelated to the issue at hand. (Still, maybe the read in upstream.go should be an atomic operation instead.)
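
A sketch of what that would look like, assuming Fails and MaxFails are matching int32 fields as the race trace suggests (failTimeout is an illustrative name):

// Write side (proxy.go): failures are counted atomically and forgotten
// once the fail timeout elapses.
atomic.AddInt32(&host.Fails, 1)
go func(host *UpstreamHost) {
	time.Sleep(failTimeout)
	atomic.AddInt32(&host.Fails, -1)
}(host)

// Read side (upstream.go): pair the atomic writes with an atomic load
// instead of a plain read.
if atomic.LoadInt32(&uh.Fails) >= u.MaxFails && u.MaxFails != 0 {
	// consider the host down
}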

I also want to amend what I said earlier about Firefox acting differently. It might act slightly differently but it seems just as sporadic as Chrome does.

@nemosupremo

Looks like a bug in golang's http2 library.

My guess is that if, in an http2 server, a request's Body is closed before it is fully read, the ResponseWriter's body gets discarded for some reason.

@mholt
Member

mholt commented Apr 23, 2016

That's what I thought too, but I still see similar behavior with http2 disabled (run caddy with -http2=false).

@nemosupremo

That's odd: what would be the difference between an http/1.1 TLS request and an unencrypted one?

With var reqs = 1; and http/2 disabled, I see the 413 error and the response body. I believe this bug explains why we don't see data.

Chain

  1. The upstream server sends a response without reading the request body.
  2. transport.go:1285 closes the body (not sure what error is returned here).
  3. The request tries to write to the body in h2_bundle.go:3880 (and fails).
  4. As a result of this failure, the stream is closed in h2_bundle.go:3667. Because this http2 stream is closed, any attempt to write back to the response fails.

I'm trying to tie this back to why var reqs = 3 fails and why it fails in http/1.1. My only guess now is that net/http was written with the assumption that the request would always be fully read?

@mholt
Member

mholt commented Apr 23, 2016

When using http2 and var reqs = 3 (in the JS) and refreshing the page once, Caddy retries the request over and over, seemingly without end, but no infinite looping happens with http/1.1.

When using http/1.1, the upstream at least sees 6 requests (remember, I refresh the page, so it adds up: 3 requests per page load). When using http/2, how many requests the upstream actually sees varies.

This is getting weird...

mholt added a commit that referenced this issue Apr 23, 2016
While debugging #782, I noticed that using http2 and max_fails=0,
X-Forwarded-For grew infinitely when an upstream request failed after
refreshing the test page. This change ensures that headers are only
set once per request rather than appending in a time-terminated loop.
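
A rough sketch of the shape of that fix (illustrative names, not the actual diff from the commit): the X-Forwarded-For logic moves out of the retry loop so each client request contributes its IP exactly once.

// Set the header once, before any retrying begins.
if clientIP, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
	if prior, ok := outreq.Header["X-Forwarded-For"]; ok {
		clientIP = strings.Join(prior, ", ") + ", " + clientIP
	}
	outreq.Header.Set("X-Forwarded-For", clientIP)
}

// The time-terminated retry loop no longer touches headers.
for time.Since(start) < tryDuration {
	host := upstream.Select()
	// ... attempt to proxy outreq to host ...
}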
@mholt
Member

mholt commented Apr 23, 2016

Also, I know that Nimi acknowledged this, but for those following along at home, I want to re-emphasize that this whole scenario works just fine over HTTP; this problem only seems to manifest when using HTTPS. (And I promise I'm not draining the request body upstream like I was accidentally doing before.)

Over plain HTTP I consistently see

Sending 3 requests...
Done. Got 413
Done. Got 413
Done. Got 413

as one would expect. So, for some reason, maybe HTTPS is somehow the culprit?

EDIT: Only in Chrome. Firefox still has the same behavior even with plain HTTP.

@uhthomas
Author

So it could potentially be an issue with TLS? Interesting.

@mholt
Member

mholt commented Apr 23, 2016

I just amended my last comment, apparently Firefox still exhibits the behavior. Chrome doesn't though!

Maybe we should drop to curl for our tests.

Here's a working curl command:

curl -v --insecure --data "$(printf '0%.0s' {1..50000})" https://caddy.6f.io

It's a smaller payload but it seems to work for testing. So far I'm seeing successes across the board with curl. (Also successes with --http2.)

@uhthomas
Author

uhthomas commented Apr 23, 2016

Likewise, all requests to any previous versions of the server work just fine, including one I quickly wrote in Node. This might be a browser issue, or more specifically an issue with the way http2 handles multiple connections. However, the issue is still seen on HTTP/1.1, so I'm really not sure.

@nemosupremo

I believe one half of this issue is caused by the bug reported in golang/go#15425. Note that in my bug report I can reproduce the issue without even using Caddy, just with the original script @6f7262 provided.
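
For reference, a standalone repro in that spirit (a sketch with placeholder cert paths): serve a handler that never reads the request body directly over TLS, so Go's built-in http/2 support is exercised with no Caddy in between.

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method == "POST" {
			// Deliberately never read r.Body before responding.
			http.Error(w, http.StatusText(http.StatusRequestEntityTooLarge),
				http.StatusRequestEntityTooLarge)
			return
		}
		http.ServeFile(w, r, "index.html") // the page with the upload JS
	})
	// ListenAndServeTLS negotiates http/2 automatically as of Go 1.6.
	log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
}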

However, this really only explains why it doesn't work on http/2; the other failures are interesting...

@uhthomas uhthomas reopened this Apr 24, 2016
@uhthomas
Author

Whoops. Misclick.

I agree, all of the behaviors in this are bizarre. I also noticed that it's not even about proxying anymore, but rather just about Go's responses.

@nemosupremo

According to golang/go#15425 (comment) this looks like an issue with the JavaScript provided, not Go or Caddy.

@6f7262 confirm?

@uhthomas
Author

Completely false @nemothekid

@mholt
Member

mholt commented Apr 25, 2016

@nemothekid I do agree this is looking less and less like a bug in Caddy.

@mholt mholt removed the bug 🐞 Something isn't working label Apr 25, 2016
@uhthomas
Author

I think endlessly pinging the backend server on failure is a bad idea, however. Rather than using a time window, maybe a set number of tries would be better?

@mholt
Member

mholt commented Apr 25, 2016

@6f7262 This only happens if you set max_fails 0, because you're telling Caddy, "Even if there is a failure communicating with upstream, it is never down." But Nimi had a good idea in another issue (#754); he suggested:

The default behavior is to try backends for 60 seconds (essentially infinite loop for 60 seconds). If max_tries is set to 0, it might make most sense to not bother with retrying connections and just return GatewayUnavailable once the backend fails.

So we could do that, although I don't consider the current behavior a "bug".

@uhthomas
Author

I didn't say it was a bug; I'm just saying it could make situations far worse if the backend gets hammered for 60 seconds straight per request when it doesn't respond. I think Nimi's suggestion would work fairly well. Should it be configurable, though? As in, retry a max of 4 times in 60 seconds or something.

@nemosupremo

As in, retry a max of 4 times in 60 seconds or something.

That's the equivalent of setting max_fails to 4. When max_fails is 0, it means the backend is never marked as down, and as long as non-down backends exist, Caddy will keep trying for 60s.

In fact the "fix" I suggested is "equivalent" (it's racy) to

max_fails 1
fail_duration 1ms

@uhthomas
Author

I see. The reason I was using max_fails 0 is that even if the backend did return an error, I didn't want it marked as down for 60 seconds. The 1ms idea still seems a little tight; maybe include a none option? Just to note, I encountered these errors because of this http2 bug, which is kind of how I got here.

mholt added a commit that referenced this issue Apr 27, 2016
…#784)

* Move handling of headers around to prevent memory use spikes

While debugging #782, I noticed that using http2 and max_fails=0,
X-Forwarded-For grew infinitely when an upstream request failed after
refreshing the test page. This change ensures that headers are only
set once per request rather than appending in a time-terminated loop.

* Refactor some code into its own function
@nemosupremo

Looks like this is an issue in Chrome. An nginx developer posted a bug report to Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=603182) that's similar to this issue.

nginx decided to patch their http2 code; I think the equivalent here would be forking Go's net/http2 to make this work.

@mholt
Member

mholt commented Apr 29, 2016

Oops, I didn't realize my PR that doesn't fix this issue closed the issue anyway 😄 I guess GitHub got a little excited there...

@nemothekid Thank you for your extensive work on this. I'll need to decide if a fork of net/http is something we want to maintain... thing is, I bet Chrome can roll out a fix through their automatic updates more quickly than we can get people to update their web servers.

@mholt
Member

mholt commented Aug 6, 2016

Did we miss anything here in this issue, @nemothekid? Your issue on the Go repo was closed, and it looks like a fix made it into Go 1.7.
