-
Notifications
You must be signed in to change notification settings - Fork 18k
net/http: Client does not retry idempotent requests on transport failure #4677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
See also issue #3514 (a race between client re-using connections and the server shutting down that connection). Fixing this would kinda fix that, if it were safe to do this. I don't know what Chrome does. |
Relevant Chrome code, I think: "1264 // This method determines whether it is safe to resend the request after an 1265 // IO error. It can only be called in response to request header or body 1266 // write errors or response header read errors. It should not be used in 1267 // other cases, such as a Connect error." http://git.chromium.org/gitweb/?p=chromium.git;a=blob;f=net/http/http_network_transaction.cc;h=caa8f11909ff40e5cee9dd6be3814233220d36f1;hb=HEAD#l1264 |
And: 1342 bool HttpNetworkTransaction::ShouldResendRequest(int error) const { 1343 bool connection_is_proven = stream_->IsConnectionReused(); 1344 bool has_received_headers = GetResponseHeaders() != NULL; 1345 1346 // NOTE: we resend a request only if we reused a keep-alive connection. 1347 // This automatically prevents an infinite resend loop because we'll run 1348 // out of the cached keep-alive connections eventually. 1349 if (connection_is_proven && !has_received_headers) 1350 return true; 1351 return false; 1352 } |
Comment 7 by [email protected]: * If we get a RST at the TCP level when we try to reuse a persistent connection, we retry (because middleboxes often time out connections and send RSTs if you still try to use it). For desktop, you can be wasteful and use TCP keepalives to mitigate this too. For mobile, it wakes up the radio and maybe costs money, so you shouldn't do that and just deal with it. We close persistent connections after a fixed period to minimize the times we get these errors. For mobile, we do deferred socket closes (wait for the radio to wakeup due to other HTTP requests, and then close all timed out sockets). * If you pipeline requests and get a transport error, we pray that HEADs and GETs are actually idempotent and retry. |
FWIW we just came across an issue that triggered this problem in a way that happens reliably. The problem happens when GOMAXPROCS=1 and there's something doing lots of work (in our case it was reading a bunch of data from disk and gzipping it) and then issues an http request. The connection times out while the code is spinning, but the http transport code doesn't get scheduled to see the EOF until the code stops spinning, so we see the EOF almost exactly at the same time as the request is issued, so the request fails because it doesn't realise that the EOF is out of band. The problem is exacerbated because the window for the race is larger than it needs to be (the request is marked as in flight before the request has been transferred to the writing goroutine, instead of incrementing the count just before conn.Write is called). Here's a program that demonstrates the issue reliably for me: http://play.golang.org/p/bVf9wsCJSx The second GET request fails with "can't write HTTP request on broken connection" or "EOF". It succeeds when run with GOMAXPROCS>0. For the time being we can work around this avoiding connection reuse on all our http client connections, but this is less than ideal. |
Issue #8122 has been merged into this issue. |
Issue #9122 has been merged into this issue. |
If we try to reuse a connection that the server is in the process of closing, we may end up successfully writing out our request (or a portion of our request) only to find a connection error when we try to read from (or finish writing to) the socket. This manifests as an EOF returned from the Transport's RoundTrip. This is a test for one of the issues described in issue golang#4677.
If we try to reuse a connection that the server is in the process of closing, we may end up successfully writing out our request (or a portion of our request) only to find a connection error when we try to read from (or finish writing to) the socket. This manifests as an EOF returned from the Transport's RoundTrip. This is a test for one of the issues described in issue golang#4677.
If we try to reuse a connection that the server is in the process of closing, we may end up successfully writing out our request (or a portion of our request) only to find a connection error when we try to read from (or finish writing to) the socket. This manifests as an EOF returned from the Transport's RoundTrip. The issue, among others, is described in golang#4677. This change follows some of the Chromium guidelines for retrying idempotent requests only when the connection has been already been used successfully and no header data has yet been received for the response. Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16
I've made an attempt to resolve one specific, reproducible example of this issue: https://go-review.googlesource.com/3210 Following the comments in this thread from Chromium's network stack, I'm only retrying under the following circumstances:
The comments so far suggest that this kind of change is something you're open to. Hopefully my approach is reasonable :) |
https://go-review.googlesource.com/#/c/3210 is out for review |
CL https://golang.org/cl/3210 mentions this issue. |
If we try to reuse a connection that the server is in the process of closing, we may end up successfully writing out our request (or a portion of our request) only to find a connection error when we try to read from (or finish writing to) the socket. This manifests as an EOF returned from the Transport's RoundTrip. The issue, among others, is described in #4677. This change follows some of the Chromium guidelines for retrying idempotent requests only when the connection has been already been used successfully and no header data has yet been received for the response. As part of this change, an unexported error was defined for errMissingHost, which was previously defined inline. errMissingHost is the only non-network error returned from a Request's Write() method. Additionally, this breaks TestLinuxSendfile because its test server explicitly triggers the type of scenario this change is meant to retry on. Because that test server stops accepting conns on the test listener before the retry, the test would time out. To fix this, the test was altered to use a non-idempotent test type (POST). Change-Id: I1ca630b944f0ed7ec1d3d46056a50fb959481a16 Reviewed-on: https://go-review.googlesource.com/3210 Reviewed-by: Brad Fitzpatrick <[email protected]> Run-TryBot: Brad Fitzpatrick <[email protected]>
Should this issue be closed now that 5dd372b is merged? Or does that only cover some of the cases of this issue? |
@glasser, yes, probably. I can't remember anything else this was open for. |
by patrick.allen.higgins:
The text was updated successfully, but these errors were encountered: