Some proxied POST requests fail or hang when upstream does not consume body #782
Hi, thanks for the report. Can you please build Caddy with the

Also, there are not enough details here to reproduce the error. Can you please provide the upstream server's source code so that we can run it and see the same results? As simple as possible would be good. (Nvm, I see the link now.)
Sure, the output for
Thanks for the build information! Okay, so the JS has an error on line 20:

Your upstream server is sending a 413 to all POST requests, which is why you're getting the 413 error.
Sorry, my mistake! I guess I didn't catch it because the requests never finished. The issue isn't that it returns 413s, but rather that it doesn't return anything at all, or send anything to the upstream server for that matter. I have no idea what's causing it.
Hmm, just a note for anyone who looks to work on this: it seems this can only be reproduced over an HTTPS connection, as I have tried HTTP and it works fine.
This seems to have to do with the fact that it's a multipart-form request, for some reason. POSTing an empty request works fine. @nemothekid, have you experienced this before? I can't quite figure out what's going on.
I was able to reproduce this pretty reliably without going https (in Firefox). (Stream of thought here...)

```diff
diff --git a/c.go b/b.go
index 417aea3..bd1d613 100644
--- a/c.go
+++ b/b.go
@@ -10,6 +10,7 @@ import (
 func handler(w http.ResponseWriter, r *http.Request) {
 	if r.Method == "POST" {
 		log.Println("Received post request")
+		io.Copy(ioutil.Discard, r.Body)
 		http.Error(w, http.StatusText(http.StatusRequestEntityTooLarge), http.StatusRequestEntityTooLarge)
 		return
 	}
```

Reading the entire body before closing the request causes everything to go smoothly. This isn't a "fix", because the upstream server should be able to close the connection anytime it likes, regardless of the fact that Caddy hasn't finished sending the request to it. It looks like Caddy doesn't like the fact that the connection is being closed before all the data is sent, and hangs. My guess is (didn't debug to find out) that https://github.com/mholt/caddy/blob/master/middleware/proxy/reverseproxy.go#L205 returns a

Tangentially, I was checking out the reverseproxy.go in golang (where Caddy's reverseproxy.go is copied from) and saw they are using a CloseNotifier. I was wondering why that is, but it seems like this issue is the reason. Looking into

tl;dr: reverseproxy.go has to detect that the backend server has decided to close the connection and decide how to communicate that back to the client.
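For anyone who wants to poke at this without digging up the linked repro: here is a rough, self-contained version of the kind of upstream the diff above patches. Only the handler body comes from the diff; the package wiring, route, and port are assumptions, so adjust to taste.

```go
// Minimal upstream that replies 413 to every POST. Only the handler body is
// taken from the diff in this thread; main(), the route, and the port are
// assumptions for illustration.
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	if r.Method == "POST" {
		log.Println("Received post request")
		// Workaround from the diff above: drain the request body so the
		// proxied upload is fully consumed before the 413 closes things down.
		// Comment this line out to reproduce the hanging behavior.
		io.Copy(ioutil.Discard, r.Body)
		http.Error(w, http.StatusText(http.StatusRequestEntityTooLarge), http.StatusRequestEntityTooLarge)
		return
	}
	io.WriteString(w, "ok")
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":9000", nil)) // port is an assumption
}
```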
My hypothesis on copying the "requestcanceler" code didn't work. |
Scratch that. I need to test my solutions more thoroughly before committing. |
Looks as if golang/go#14061 might be related. Derived from https://groups.google.com/forum/#!topic/golang-nuts/ZVfwwWttVYI
@nemothekid Very helpful -- thanks for looking into this! I also was wondering if the upstream was misbehaving, but as you said, that shouldn't matter. Hmm, this is tricky!

Update: Ignore pretty much this whole post, I made a mistake in my testing (below).

Ah, and it's interesting that Firefox has a slightly different behavior. Still sporadic, but definitely different from Chrome, as at least one connection seems to succeed regularly. Indeed, some connections still hang.

Update: While writing this post, I am no longer able to reproduce the behavior locally. I'm serving a test domain over HTTPS with a real certificate, hosted on my local computer. I'm accessing the endpoint exposed by Caddy, added some extra log.Print lines to see more of what is going on, and even tried messing with the payload: first removing the body entirely, then adding the multipart stuff, then re-enabling the content (the 5 MB of zeroes). All I did was step away for a few hours, came back (both servers were running the whole time, but I closed the computer), and it was working. I then restarted the servers and things are still working. Firefox and Chrome both show the % values going up until the upload is complete, after which they show a 413 error as expected. The only changes I made were adding a few log lines. Strange.
Oh wait. @nemothekid It's working for me now because, in adding logging to the upstream instance to verify how much of the response is being read, I'm consuming the response body with

I will disable that logging for future debugging. So far, your conclusion still seems correct: the upstream needs to be allowed to close the connection intentionally without Caddy marking the upstream as down.
I was running my tests without setting
times like a bajillion. It essentially retries requests infinitely for a minute, as Caddy is programmed to do when it's told to never assume a backend goes down. And each time it adds my IP address again to the header. That's gotta increase memory usage. Whoops.

The errors being reported from reverseproxy.go are the likes of:

^ This behavior of multiplying that header value only happens sometimes; I may have to refresh the page to trigger it; other times, it just runs the request once.
(I don't know why it fails under Firefox with http/1.1.) AFAICT, it looks like a data race. I'm checking out a simpler case where

When
Speaking of things happening a bajillion times, I noted that when logging the error return from RoundTrip, it would be
@nemothekid That is exactly what I was experiencing too. I found that it would do that for the first request (just the headers and no body) and from that point onward all requests would be void and just hang. This can also be seen in the sample, in which the very first upload gets more done than any of the subsequent uploads.
^ Now we're getting somewhere. That is exactly consistent with my tests too.
@nemothekid With regards to a possible data race:
Note that my line numbers might be slightly different since I've added a few log lines. proxy.go:139 says

But I imagine this is unrelated to the issue at hand. (Still, maybe the read in upstream.go should be an atomic operation instead.)

I also want to amend what I said earlier about Firefox acting differently. It might act slightly differently, but it seems just as sporadic as Chrome.
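For reference, a minimal sketch of what an atomic fail counter could look like; the type, field, and method names here are illustrative, not necessarily what Caddy's upstream.go actually uses.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// upstreamHost is a stand-in for whatever struct tracks a backend's health;
// the names are made up for this sketch.
type upstreamHost struct {
	fails int32 // touched by proxy goroutines and the un-fail timers
}

// markFailed bumps the counter, then schedules the matching decrement so the
// failure only counts against the host for failDuration.
func (h *upstreamHost) markFailed(failDuration time.Duration) {
	atomic.AddInt32(&h.fails, 1)
	time.AfterFunc(failDuration, func() {
		atomic.AddInt32(&h.fails, -1)
	})
}

// failCount is the read that arguably should be atomic, rather than a plain
// load that can race with markFailed.
func (h *upstreamHost) failCount() int32 {
	return atomic.LoadInt32(&h.fails)
}

func main() {
	h := &upstreamHost{}
	h.markFailed(10 * time.Second)
	fmt.Println("fails:", h.failCount())
}
```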
Looks like a bug in golang's http2 library. My guess is that if, in an http2 server, a request's Body is closed before it is fully read, for some reason the ResponseWriter's body is discarded.
That's what I thought too, but I still see similar behavior with http2 disabled (run caddy with
That's odd - what would be the difference between an HTTP/1.1 TLS request and an unencrypted one? With Chain
I'm trying to tie this back to why |
When using http2 and

When using http/1.1, at least the upstream sees 6 requests (remember, I refresh the page, so this adds up: 3 requests per page load). When using http/2, it kind of varies how many requests the upstream actually sees. This is getting weird...
While debugging #782, I noticed that with http2 and max_fails=0, X-Forwarded-For grew infinitely when an upstream request failed after refreshing the test page. This change ensures that headers are only set once per request, rather than appended in a time-terminated loop.
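A rough sketch of the idea behind that change (not the actual Caddy code): compute and set the proxy headers once, before the time-terminated retry loop, so retries can't keep appending the client IP. The function name, the client IP, and the 60-second window are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// setProxyHeaders appends the client IP to any X-Forwarded-For the client
// sent, exactly once per incoming request.
func setProxyHeaders(outreq *http.Request, clientIP string) {
	if prior := outreq.Header.Get("X-Forwarded-For"); prior != "" {
		clientIP = prior + ", " + clientIP
	}
	outreq.Header.Set("X-Forwarded-For", clientIP)
}

func main() {
	outreq, err := http.NewRequest("POST", "http://upstream.invalid/upload", nil)
	if err != nil {
		panic(err)
	}

	// Headers are prepared once, outside the retry loop.
	setProxyHeaders(outreq, "203.0.113.7")

	deadline := time.Now().Add(60 * time.Second) // the "try for up to a minute" window
	for time.Now().Before(deadline) {
		// ... pick an upstream host and attempt the round trip here; on
		// failure, loop and try again. Headers are NOT modified in here,
		// so retries can no longer grow X-Forwarded-For.
		break // placeholder so this sketch terminates immediately
	}
	fmt.Println(outreq.Header.Get("X-Forwarded-For"))
}
```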
Also, I know that Nimi acknowledged this, but for those following along at home, I want to re-emphasize that this whole scenario works just fine over HTTP; the problem only seems to manifest when using HTTPS. (And I promise I'm not draining the request body upstream like I was accidentally doing before.) Over plain HTTP I consistently see

as one would expect. So, for some reason, maybe HTTPS is the culprit somehow?

EDIT: Only in Chrome. Firefox still has the same behavior even with plain HTTP.
So it could potentially be an issue with the TLS? Interesting.
I just amended my last comment; apparently Firefox still exhibits the behavior. Chrome doesn't though! Maybe we should drop to

Here's a working curl command:
It's a smaller payload but it seems to work for testing. So far I'm seeing successes across the board with curl. (Also successes with
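If anyone wants a browser-free test that isn't curl, here's a rough Go equivalent of that kind of request; the URL, field name, and payload size are assumptions, not the exact values from the command above.

```go
// A rough Go stand-in for the curl test: POST a multipart form with a dummy
// file to the proxied endpoint. Adjust the URL and sizes to match your setup.
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"mime/multipart"
	"net/http"
)

func main() {
	var buf bytes.Buffer
	mw := multipart.NewWriter(&buf)

	// One ~1 MB file of zeroes, similar in spirit to the browser test.
	fw, err := mw.CreateFormFile("file", "zeroes.bin")
	if err != nil {
		panic(err)
	}
	fw.Write(make([]byte, 1<<20))
	mw.Close()

	resp, err := http.Post("https://example.test/upload", mw.FormDataContentType(), &buf)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```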
Likewise, all requests to any previous versions of the server work just fine, including one I quickly wrote in Node. This might be a browser issue, or more specifically an issue with the way http2 handles multiple connections. However, the issue is still seen on HTTP/1.1, so I'm really not sure.
I believe one half of this issue is caused by a Go bug: golang/go#15425. Note that in my bug report, I can reproduce the issue without even using Caddy, just the original script @6f7262 provided. However, this really only explains why it doesn't work on http/2; the other failures are interesting...
Whoops, misclick. I agree, all of the behaviors in this issue are bizarre. I also noticed that it's not even about proxying anymore, but rather just about Go's responses.
According to golang/go#15425 (comment), this looks like an issue with the JavaScript provided, not Go or Caddy. @6f7262 confirm?
Completely false, @nemothekid.
@nemothekid I do agree this is looking less and less like a bug in Caddy.
I think that endlessly pinging the backend server upon failure is a bad idea, however. Rather than using a time window, maybe a certain number of tries would be better?
@6f7262 This only happens if you set
So we could do that, although I don't consider the current behavior a "bug".
I didn't say it was a bug; I'm just saying that it could potentially make situations far worse if the backend is getting hammered for 60 seconds straight per request when it doesn't respond. I think Nimi's suggestion would work fairly well. Should it be configurable though? As in, retry a max of 4 times in 60 seconds or something.
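For concreteness, a minimal sketch of what a count-bounded retry (on top of the existing time window) might look like; `maxTries`, `tryDuration`, and the backoff are illustrative names and values, not existing Caddy directives.

```go
// Sketch of the suggestion above: cap retries by attempt count as well as by
// the try duration, so a dead backend isn't hammered for the full minute.
package main

import (
	"errors"
	"fmt"
	"time"
)

func proxyWithRetries(attempt func() error, maxTries int, tryDuration time.Duration) error {
	deadline := time.Now().Add(tryDuration)
	var err error
	for tries := 0; tries < maxTries && time.Now().Before(deadline); tries++ {
		if err = attempt(); err == nil {
			return nil
		}
		// Back off a little between attempts.
		time.Sleep(250 * time.Millisecond)
	}
	return err
}

func main() {
	err := proxyWithRetries(func() error {
		return errors.New("backend unavailable") // stand-in for a failed round trip
	}, 4, 60*time.Second)
	fmt.Println("gave up:", err)
}
```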
That's the equivalent of setting

In fact, the "fix" I suggested is "equivalent" (it's racy) to
I see. The reason I was using |
…#784)

* Move handling of headers around to prevent memory use spikes

  While debugging #782, I noticed that with http2 and max_fails=0, X-Forwarded-For grew infinitely when an upstream request failed after refreshing the test page. This change ensures that headers are only set once per request, rather than appended in a time-terminated loop.

* Refactor some code into its own function
Looks like this is an issue in Chrome. An nginx developer posted a bug report to Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=603182) that's similar to this issue. nginx decided to write a patch in their http2 code; however, I think the equivalent here would be forking net/http2 in Go to make this work.
Oops, I didn't realize my PR, which doesn't fix this issue, closed the issue anyway 😄 I guess GitHub got a little excited there... @nemothekid Thank you for your extensive work on this. I'll need to decide if a fork of net/http is something we want to maintain... thing is, I bet Chrome can roll out a fix through their automatic updates more quickly than we can get people to update their web servers.
Did we miss anything here in this issue, @nemothekid? Your issue on the Go repo was closed, and it looks like a fix made it into Go 1.7.
1. What version of Caddy are you running (`caddy -version`)?

Caddy (untracked dev build)

2. What are you trying to do?

Upload files to a reverse proxy

3. What is your entire Caddyfile?

4. How did you run Caddy (give the full command and describe the execution environment)?

$ sudo caddy -email="[email protected]" -agree

5. What did you expect to see?

Proxied requests and responses

6. What did you see instead (give full error messages and/or log)?

Requests were either not sent to the proxy server or responses were only partially sent (just the headers). Usually requests either end up hanging on the client side and never reach the proxy, or the responses return the headers with a protocol exception (`net::ERR_SPDY_PROTOCOL_ERROR`).

This occurs when the proxy returns a status code outside the range of 200-399, despite having `max_fails 0` set to prevent the backend from being marked as down. Even if this were the case, this would not be expected behavior.

No error messages are reported by Caddy when adding `-log stdout` to the command, or anywhere else.

Code to reproduce (use the network inspector to view details)