net/http: (*Transport).getConn traces through stale contexts #21597
Comments
Yep, that's technically a bug, but it seems basically unfixable given the current […]. When we made […]. Aside: #20617 might be related to #19643 but is unrelated to this bug.
Per above, unpack the […]
We can't do that; it would be a performance regression. But you're right, one way to "fix" the problem is to move all background tasks onto the critical path. I just don't think that's an acceptable fix.
If […] Beyond that, as far as I can tell, it will add one allocation (for the child context), and then only if the […]
Sorry, I misunderstood your suggestion. You're suggesting that we cancel the pending dial if an idle conn becomes available before the dial finishes (which we currently do not do; this is the part I missed). If the pending dial is canceled, then I agree, waiting for a canceled dial to finish does not seem like a problem.

That is still a semantics change that could affect performance. Currently, we let the pending dial finish and add it to the pool (up to Transport.MaxIdleConns and Transport.MaxIdleConnsPerHost). If we cancel the pending dial, a follow-up request will need to wait for a new dial or for a prior request to finish. This is potentially slower than the current implementation, where the follow-up request might use the pending dial. I am sure some usage pattern of http.Transport would be harmed by this change, and eventually we'd get a bug report.

I consider it a bug that we dial using the request's ctx. Ideally we'd dial using a background ctx, but that depends on #19643, which is unlikely to be fixed any time soon.

I think I have a good (Google-internal) project that could act as a benchmark for your suggestion. If there's no performance impact for that project, then your suggestion is likely a good solution.
Moving to Go 1.11.
(*http.Transport).getConn currently starts a dialConn call in a background goroutine (go/src/net/http/transport.go, lines 942 to 945 in 0b0cc41). That records traces to the provided Context and eventually invokes t.DialContext with it (go/src/net/http/transport.go, lines 1029 and 1060 in 0b0cc41).
This is pretty much a textbook illustration of the problem described in #19643 (Context API for continuing work). If (*Transport).getConn returns early (due to cancellation or to availability of an idle connection), the caller may have already written out the corresponding traces, and dialConn (and/or the user-provided DialContext callback) will unexpectedly access a Context that the caller believes to be unreachable.

httptrace.ClientTrace says, "Functions may be called concurrently from different goroutines and some may be called after the request has completed or failed." However, that is not true of Context instances in general: if the http package wants to save a trace after a call has returned, it should call Value ahead of time and save only the ClientTrace pointer. If dialConn calls a user-provided DialContext function, then getConn should cancel the Context passed to it and wait for DialContext to return before itself returning.

See also #20617 (Context race in http.Transport).