Description
Please answer these questions before submitting your issue. Thanks!
This is a performance-related issue/proposal for the net/http2
library in https://github.com/golang/net/tree/master/http2.
What version of Go are you using (go version)?

1.8beta2
What operating system and processor architecture are you using (go env)?

amd64
I'm running "grpc-go" (https://github.com/grpc/grpc-go) micro-benchmarks (grpc-go uses only the "framer" from the net/http2
library). Specifically I'm looking at a benchmarks that tests "grpc streaming" throughput, with a couple of clients against one 32 core server.
Details on this benchmark setup: the server has a total of 64 tcp/http2 connections, with 100 long-lived http2 streams over each http2 connection. A streaming "round-trip" is a grpc request-response of about 10 bytes, each fitting into one data frame. The server is ran with a 5 second warmup and a 30 second benchmark, during which it does somewhere around 900K round trips per second.
After multiple changes that reduce memory allocations elsewhere, the memory "alloc_space" profile of the server after running this benchmark looks like:
2139.58MB of 2178.13MB total (98.23%)
Dropped 69 nodes (cum <= 10.89MB)
flat flat% sum% cum cum%
1436.57MB 65.95% 65.95% 1436.57MB 65.95% golang.org/x/net/http2.parseDataFrame
473.51MB 21.74% 87.69% 473.51MB 21.74% google.golang.org/grpc.protoCodec.Marshal
225.50MB 10.35% 98.05% 225.50MB 10.35% google.golang.org/grpc/transport.(*http2Server).handleData
2.50MB 0.11% 98.16% 494.02MB 22.68% google.golang.org/grpc/benchmark.(*testServer).StreamingCall
1MB 0.046% 98.21% 495.02MB 22.73% google.golang.org/grpc.(*Server).processStreamingRPC
0.50MB 0.023% 98.23% 477.01MB 21.90% google.golang.org/grpc.(*serverStream).SendMsg
0 0% 98.23% 1442.07MB 66.21% golang.org/x/net/http2.(*Framer).ReadFrame
0 0% 98.23% 1678.60MB 77.07% google.golang.org/grpc.(*Server).handleRawConn
0 0% 98.23% 495.02MB 22.73% google.golang.org/grpc.(*Server).handleStream
0 0% 98.23% 1677.60MB 77.02% google.golang.org/grpc.(*Server).serveHTTP2Transport
0 0% 98.23% 1676.57MB 76.97% google.golang.org/grpc.(*Server).serveStreams
0 0% 98.23% 495.02MB 22.73% google.golang.org/grpc.(*Server).serveStreams.func1.1
0 0% 98.23% 473.51MB 21.74% google.golang.org/grpc.(*protoCodec).Marshal
0 0% 98.23% 14.52MB 0.67% google.golang.org/grpc.(*serverStream).RecvMsg
0 0% 98.23% 473.51MB 21.74% google.golang.org/grpc.encode
0 0% 98.23% 14.52MB 0.67% google.golang.org/grpc.recv
0 0% 98.23% 14.52MB 0.67% google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).RecvMsg
0 0% 98.23% 477.01MB 21.90% google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).Send
0 0% 98.23% 494.02MB 22.68% google.golang.org/grpc/benchmark/grpc_testing._BenchmarkService_StreamingCall_Handler
0 0% 98.23% 1442.07MB 66.21% google.golang.org/grpc/transport.(*framer).readFrame
0 0% 98.23% 1676.57MB 76.97% google.golang.org/grpc/transport.(*http2Server).HandleStreams
0 0% 98.23% 2173.63MB 99.79% runtime.goexit
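(For reference, the listing above is pprof "top" output for the alloc_space sample; it can be reproduced with the stock tooling roughly as below, where the binary and profile file are placeholders for whatever a given setup produces.)

```
go tool pprof -alloc_space <server-binary> <heap-profile>
(pprof) top
```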
Note that golang.org/x/net/http2.parseDataFrame appears to allocate a new DataFrame struct per data frame, which ends up being the largest source of allocations (the allocation appears to come from https://github.com/golang/net/blob/master/http2/frame.go#L577).
Also, from the total "alloc_objects" profile, golang.org/x/net/http2.parseDataFrame
appears to account for about 30M of the total ~80M object allocations.
Experimenting with a code change that repeatedly returns the same DataFrame struct instead of creating new ones (a rough sketch of the idea follows this list):

- memory allocations from golang.org/x/net/http2.parseDataFrame disappear
- total memory allocated by the benchmark drops by ~1.5GB, to about 740MB
- QPS in the grpc micro-benchmark increases about 5%
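For illustration only (this is not the actual patch, just the shape of the idea, with names I made up): the change amounts to the framer keeping one cached data-frame value and handing it back from every read, instead of allocating a fresh struct per frame. A tiny self-contained sketch of that pattern:

```go
// Self-contained illustration of the reuse idea (not the real http2 code):
// instead of allocating a new frame value per read, the framer keeps one
// cached value and returns it again and again, so the returned frame is only
// valid until the next read, mirroring how the http2 Framer already treats
// the data slice.
package main

import "fmt"

// dataFrame is a stand-in for http2.DataFrame.
type dataFrame struct {
	streamID uint32
	data     []byte
}

// framer is a stand-in for http2.Framer with a cached, reusable frame.
type framer struct {
	cached dataFrame // reused on every read instead of a per-read allocation
}

// readFrame fills the cached frame in place and returns a pointer to it.
// The result is invalidated by the next readFrame call.
func (f *framer) readFrame(streamID uint32, payload []byte) *dataFrame {
	f.cached.streamID = streamID
	f.cached.data = payload
	return &f.cached
}

func main() {
	fr := &framer{}
	a := fr.readFrame(1, []byte("hello"))
	fmt.Println(a.streamID, string(a.data)) // 1 hello

	// The next read reuses the same struct, so `a` now aliases the new frame.
	b := fr.readFrame(3, []byte("world"))
	fmt.Println(a == b, b.streamID, string(b.data)) // true 3 world
}
```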
The current http2 framer returns a slice on "data frame reads" that's only valid until the next call to ReadFrame. I'm wondering whether similar semantics for the entire DataFrame struct sound reasonable, or possibly an option to turn this behavior on (a hypothetical sketch of such an option is below).
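To make that concrete, here's roughly what I imagine from a caller's point of view. SetReuseFrames is a name I made up for whatever the opt-in might be called (it doesn't exist today); everything else is the existing Framer API:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"golang.org/x/net/http2"
)

func main() {
	var buf bytes.Buffer
	fr := http2.NewFramer(&buf, &buf)

	// Hypothetical opt-in (not part of the current API): after this call,
	// frames returned by ReadFrame would only be valid until the next
	// ReadFrame, letting the Framer reuse the DataFrame struct internally.
	// fr.SetReuseFrames()

	if err := fr.WriteData(1, false, []byte("hello")); err != nil {
		log.Fatal(err)
	}

	f, err := fr.ReadFrame()
	if err != nil {
		log.Fatal(err)
	}
	if df, ok := f.(*http2.DataFrame); ok {
		// df.Data() is already only valid until the next ReadFrame; the
		// proposal would extend the same contract to df itself.
		fmt.Println(df.StreamID, string(df.Data()))
	}
}
```

Under this proposal the caller would have to consume or copy the frame itself, not just its data slice, before the next ReadFrame call.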
I can give more details on the benchmark setup if needed.
Thanks!