Skip to content

x/net/http2: connection-level flow control not returned if stream errors, causes server hang #40423

@jared2501

Description

@jared2501

What version of Go are you using (go version)?

$ go version
go version go1.14.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What's the issue

A hang was reported between a gRPC client (grpc-go v1.27.0) hitting a gRPC server in one of our production environments. The client and server are both running on the same host. I captured a core dump of the client and server code to analyze with delve. I noticed that the google.golang.org/grpc/internal/transport.loopyWriter.cbuf.sendQuota was 0 in the client code, which indicates that the client's connection-level send window had run out and was at 0. In the server's core dump, I tracked down the corresponding http2.serverConn and noticed that it's serverConn.inflow.n was set to 0 too. I then tracked down the two places in http2/server.go that call inflow.take and noticed what I believe is the issue in processData:

func (sc *serverConn) processData(f *DataFrame) error {
	...

	if f.Length > 0 {
		// Check whether the client has flow control quota.
		if st.inflow.available() < int32(f.Length) {
			return streamError(id, ErrCodeFlowControl)
		}
		st.inflow.take(int32(f.Length))

		if len(data) > 0 {
			wrote, err := st.body.Write(data)
			if err != nil {
				return streamError(id, ErrCodeStreamClosed)
			}
			if wrote != len(data) {
				panic("internal error: bad Writer")
			}
			st.bodyBytes += int64(len(data))
		}

		// Return any padded flow control now, since we won't
		// refund it later on body reads.
		if pad := int32(f.Length) - int32(len(data)); pad > 0 {
			sc.sendWindowUpdate32(nil, pad)
			sc.sendWindowUpdate32(st, pad)
		}
	}
...

In this code, st.inflow.take is called, but if st.body.Write returns an error then the flow control is not refunded to the client since the code bails and returns a streamError (nor is it added to the st.body's pipeBuffer since pipe.Write returns immediately if it has an error to return).

Side note: st.body.Write may return an error if st.body.Close is called. The server which had this issue is using grpc-go's serverHandlerTransport which does, in fact, call req.Body.Close (see here). A gRPC bi-directional streaming endpoint is running between the client and server, and what i suspect is happening is the client is sending the server data over the bi-di stream while an error happens in the gRPC server that causes the request to end, and therefore req.Body.Close to be called while data is in flight.

Here's what I think a possible fix to net/http2 could look like:

diff --git a/http2/server.go b/http2/server.go
index 01f4ecc..ba3ebd1 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1650,6 +1650,7 @@ func (sc *serverConn) processData(f *DataFrame) error {
                if len(data) > 0 {
                        wrote, err := st.body.Write(data)
                        if err != nil {
+                               sc.sendWindowUpdate32(nil, int32(f.Length))
                                return streamError(id, ErrCodeStreamClosed)
                        }
                        if wrote != len(data) {

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsFixThe path to resolution is known, but the work has not been done.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions