Description
What version of Go are you using (go version)?
$ go version
go version go1.14.2 darwin/amd64
Does this issue reproduce with the latest release?
Yes
What's the issue
A hang was reported between a gRPC client (grpc-go v1.27.0) and a gRPC server in one of our production environments. The client and server both run on the same host. I captured core dumps of the client and server processes and analyzed them with delve. In the client, google.golang.org/grpc/internal/transport.loopyWriter.cbuf.sendQuota was 0, which indicates that the client's connection-level send window had been exhausted. In the server's core dump, I tracked down the corresponding http2.serverConn and found that its serverConn.inflow.n was 0 as well. I then looked at the two places in http2/server.go that call inflow.take and found what I believe is the bug, in processData:
```go
func (sc *serverConn) processData(f *DataFrame) error {
	// ...
	if f.Length > 0 {
		// Check whether the client has flow control quota.
		if st.inflow.available() < int32(f.Length) {
			return streamError(id, ErrCodeFlowControl)
		}
		st.inflow.take(int32(f.Length))
		if len(data) > 0 {
			wrote, err := st.body.Write(data)
			if err != nil {
				return streamError(id, ErrCodeStreamClosed)
			}
			if wrote != len(data) {
				panic("internal error: bad Writer")
			}
			st.bodyBytes += int64(len(data))
		}
		// Return any padded flow control now, since we won't
		// refund it later on body reads.
		if pad := int32(f.Length) - int32(len(data)); pad > 0 {
			sc.sendWindowUpdate32(nil, pad)
			sc.sendWindowUpdate32(st, pad)
		}
	}
	// ...
```
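In this code, st.inflow.take is called, but if st.body.Write returns an error, the flow-control quota consumed by the frame is never refunded to the client: the code bails out and returns a streamError, and the data isn't added to st.body's pipeBuffer either (pipe.Write returns immediately when it has an error to report), so no later body read will refund it. Because st.inflow is linked to the connection-level sc.inflow, take debits both windows, so each lost refund permanently shrinks the connection-level window. That matches the 0 values I saw in both core dumps.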
Side note: st.body.Write may return an error once the request body has been closed (via req.Body.Close). The server that hit this issue uses grpc-go's serverHandlerTransport, which does, in fact, call req.Body.Close (see here). A bidirectional streaming gRPC endpoint runs between the client and server, and I suspect the following sequence: the client is sending data to the server over the bidi stream when an error in the gRPC server ends the request, so req.Body.Close is called while data is still in flight.
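The pipe behavior that matters here can be sketched in isolation. `bodyPipe`, `breakWithError` and `errBodyClosed` below are hypothetical stand-ins, not the net/http2 pipe: once the handler side closes the body, Write fails immediately and the data is not buffered, so there is nothing left for a later body read to refund.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sync"
)

// errBodyClosed is a hypothetical sentinel standing in for the error the
// server's body pipe reports after the handler closes req.Body.
var errBodyClosed = errors.New("body closed by handler")

// bodyPipe is a hypothetical, simplified stand-in for the buffered pipe
// behind st.body. It is not the net/http2 type.
type bodyPipe struct {
	mu  sync.Mutex
	buf bytes.Buffer
	err error // once set, every Write fails
}

// Write buffers d unless the pipe is broken, in which case it returns the
// error immediately without buffering anything.
func (p *bodyPipe) Write(d []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.err != nil {
		return 0, p.err
	}
	return p.buf.Write(d)
}

// breakWithError models the handler closing the request body.
func (p *bodyPipe) breakWithError(err error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.err == nil {
		p.err = err
	}
}

func main() {
	p := &bodyPipe{}
	fmt.Println(p.Write([]byte("frame 1"))) // 7 <nil>
	p.breakWithError(errBodyClosed)         // handler exits and closes req.Body
	fmt.Println(p.Write([]byte("frame 2"))) // 0 body closed by handler
	fmt.Println(p.buf.Len())                // 7: frame 2 was never buffered
}
```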
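Here's what I think a possible fix to net/http2 could look like. It returns the frame's full length (data plus any padding) to the connection-level window before returning the stream error; the stream itself is being reset, so its own window doesn't need a refund: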
```diff
diff --git a/http2/server.go b/http2/server.go
index 01f4ecc..ba3ebd1 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1650,6 +1650,7 @@ func (sc *serverConn) processData(f *DataFrame) error {
 		if len(data) > 0 {
 			wrote, err := st.body.Write(data)
 			if err != nil {
+				sc.sendWindowUpdate32(nil, int32(f.Length))
 				return streamError(id, ErrCodeStreamClosed)
 			}
 			if wrote != len(data) {
```