-
Notifications
You must be signed in to change notification settings - Fork 18k
net/http: context cancellation can leave HTTP client with deadlocked HTTP/1.1 connections in Go1.22 #65705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Digging into this further, it looks like an accounting bug triggered by 38a0c5b. In particular, the early return in ctx := w.getCtxForDial()
if ctx == nil {
return
} However, there is a decrement lower down, which seems to indicate that the contract of this function was supposed to be to call if err != nil {
t.decConnsPerHost(w.key)
} Certainly, the following patch seems to solve the problem: diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index 2a549a9576..411f6b2912 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -1478,6 +1478,7 @@ func (t *Transport) dialConnFor(w *wantConn) {
defer w.afterDial()
ctx := w.getCtxForDial()
if ctx == nil {
+ t.decConnsPerHost(w.key)
return
}
|
I've added a unit-test that triggers this problem, along with the patch I mentioned in my previous comment, and opened https://go-review.googlesource.com/c/go/+/564036 |
Change https://go.dev/cl/564036 mentions this issue: |
(attn @neild) |
Thanks for the nice analysis (and the fix!). @gopherbot please open backport issue for go1.21. This is significant regression with no workaround. |
Backport issue(s) opened: #65759 (for 1.21). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases. |
Thanks @neild, but I think you mean backport for 1.22! |
Let's update to the latest version of Go 1.21 to get the latest security updates. We're not using Go 1.22 because it has a bug (golang/go#65705) that can cause problems for HTTP clients in some situations.
Thank you for catching this! Saved me a lot of time debugging... |
Change https://go.dev/cl/566536 mentions this issue: |
A recent change to Transport.dialConnFor introduced an early return that skipped dialing. This path did not call decConnsPerHost, which can cause subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set. For #65705 Fixes #65759 Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c Reviewed-on: https://go-review.googlesource.com/c/go/+/564036 Reviewed-by: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Damien Neil <[email protected]> Reviewed-by: Than McIntosh <[email protected]> (cherry picked from commit 098a87f) Reviewed-on: https://go-review.googlesource.com/c/go/+/566536 Reviewed-by: Carlos Amedee <[email protected]>
Go1.22.0 has the bug golang/go#65705 , which prevents vmagent from normal operation.
Go1.22.0 has the bug golang/go#65705 , which prevents vmagent from normal operation.
Go1.22.0 contains the bug golang/go#65705 , which prevents vmagent from normal operation.
Go1.22.0 contains the bug golang/go#65705 , which prevents vmagent from normal operation.
Due to the issue noted in golang/go#65705, we don't want to build our release assets with 1.22.0, since this can cause deadlock in the HTTP client. Instead, let's keep testing with both 1.21 and 1.22 in our CI jobs, but build with 1.21 for release. In the future, once that bug is fixed and an updated version is released, we can move back to 1.22.
A recent change to Transport.dialConnFor introduced an early return that skipped dialing. This path did not call decConnsPerHost, which can cause subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set. For golang#65705 Fixes golang#65759 Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c Reviewed-on: https://go-review.googlesource.com/c/go/+/564036 Reviewed-by: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Damien Neil <[email protected]> Reviewed-by: Than McIntosh <[email protected]> (cherry picked from commit 098a87f) Reviewed-on: https://go-review.googlesource.com/c/go/+/566536 Reviewed-by: Carlos Amedee <[email protected]>
A recent change to Transport.dialConnFor introduced an early return that skipped dialing. This path did not call decConnsPerHost, which can cause subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set. For golang#65705 Fixes golang#65759 Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c Reviewed-on: https://go-review.googlesource.com/c/go/+/564036 Reviewed-by: Damien Neil <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Auto-Submit: Damien Neil <[email protected]> Reviewed-by: Than McIntosh <[email protected]> (cherry picked from commit 098a87f) Reviewed-on: https://go-review.googlesource.com/c/go/+/566536 Reviewed-by: Carlos Amedee <[email protected]>
Now that Go 1.22.1 is out with a backport to fix golang/go#65705, revert the change that forced us to build with Go 1.21 to avoid the problem. This reverts commit ba2889b.
Now that the Go 1.22.1 has been released with a backported fix (golang/go@16830ab, per golang/go#65759) for the problem reported in golang/go#65705, we can upgrade to Go 1.22 to align with our CI and release workflows in the main Git LFS repository.
Go1.22.0 contains the bug golang/go#65705 , which prevents vmagent from normal operation.
Go1.22.0 contains the bug golang/go#65705 , which prevents vmagent from normal operation.
Go version
go version go1.22.0 darwin/arm64
Output of
go env
in your module/workspace:GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/tibbes/Library/Caches/go-build'
GOENV='/Users/tibbes/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/tibbes/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/tibbes/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/18/_76dkntd3f76rb3fql9q5jrw0000gn/T/go-build1573940289=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
There seems to be a bug in go 1.22.0 (not present in earlier versions), that can leave an HTTP client deadlocked when
MaxConnsPerHost
is used.The following program reproduces the problem. It:
MaxConnsPerHost
,MaxIdleConns
, andMaxIdleConnsPerHost
all set to 1What did you see happen?
With go 1.22.0, the HTTP client is not usable after stressing it with short timeouts:
What did you expect to see?
I would expect to see the same behaviour as go 1.21.7:
The text was updated successfully, but these errors were encountered: