x/net/http2: reduce Framer ReadFrame allocations #18502

Closed
@apolcyn

Please answer these questions before submitting your issue. Thanks!

This is a performance-related issue/proposal for the net/http2 library in https://github.com/golang/net/tree/master/http2.

What version of Go are you using (go version)?

1.8beta2

What operating system and processor architecture are you using (go env)?

amd64

I'm running "grpc-go" (https://github.com/grpc/grpc-go) micro-benchmarks (grpc-go uses only the "framer" from the net/http2 library). Specifically, I'm looking at a benchmark that tests "grpc streaming" throughput, with a couple of clients against one 32-core server.

Details on this benchmark setup: the server has a total of 64 tcp/http2 connections, with 100 long-lived http2 streams over each http2 connection. A streaming "round-trip" is a grpc request-response of about 10 bytes, each fitting into one data frame. The server is run with a 5 second warmup and a 30 second benchmark, during which it does somewhere around 900K round trips per second.

After multiple changes that reduce memory allocations elsewhere, the memory "alloc_space" profile of the server after running this benchmark looks like:

2139.58MB of 2178.13MB total (98.23%)
Dropped 69 nodes (cum <= 10.89MB)
      flat  flat%   sum%        cum   cum%
 1436.57MB 65.95% 65.95%  1436.57MB 65.95%  golang.org/x/net/http2.parseDataFrame
  473.51MB 21.74% 87.69%   473.51MB 21.74%  google.golang.org/grpc.protoCodec.Marshal
  225.50MB 10.35% 98.05%   225.50MB 10.35%  google.golang.org/grpc/transport.(*http2Server).handleData
    2.50MB  0.11% 98.16%   494.02MB 22.68%  google.golang.org/grpc/benchmark.(*testServer).StreamingCall
       1MB 0.046% 98.21%   495.02MB 22.73%  google.golang.org/grpc.(*Server).processStreamingRPC
    0.50MB 0.023% 98.23%   477.01MB 21.90%  google.golang.org/grpc.(*serverStream).SendMsg
         0     0% 98.23%  1442.07MB 66.21%  golang.org/x/net/http2.(*Framer).ReadFrame
         0     0% 98.23%  1678.60MB 77.07%  google.golang.org/grpc.(*Server).handleRawConn
         0     0% 98.23%   495.02MB 22.73%  google.golang.org/grpc.(*Server).handleStream
         0     0% 98.23%  1677.60MB 77.02%  google.golang.org/grpc.(*Server).serveHTTP2Transport
         0     0% 98.23%  1676.57MB 76.97%  google.golang.org/grpc.(*Server).serveStreams
         0     0% 98.23%   495.02MB 22.73%  google.golang.org/grpc.(*Server).serveStreams.func1.1
         0     0% 98.23%   473.51MB 21.74%  google.golang.org/grpc.(*protoCodec).Marshal
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc.(*serverStream).RecvMsg
         0     0% 98.23%   473.51MB 21.74%  google.golang.org/grpc.encode
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc.recv
         0     0% 98.23%    14.52MB  0.67%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).RecvMsg
         0     0% 98.23%   477.01MB 21.90%  google.golang.org/grpc/benchmark/grpc_testing.(*benchmarkServiceStreamingCallServer).Send
         0     0% 98.23%   494.02MB 22.68%  google.golang.org/grpc/benchmark/grpc_testing._BenchmarkService_StreamingCall_Handler
         0     0% 98.23%  1442.07MB 66.21%  google.golang.org/grpc/transport.(*framer).readFrame
         0     0% 98.23%  1676.57MB 76.97%  google.golang.org/grpc/transport.(*http2Server).HandleStreams
         0     0% 98.23%  2173.63MB 99.79%  runtime.goexit

Note that golang.org/x/net/http2.parseDataFrame appears to allocate a new DataFrame struct for every data frame, which ends up being the largest source of allocations. (The allocation appears to come from https://github.com/golang/net/blob/master/http2/frame.go#L577)

Also, from the total "alloc_objects" profile, golang.org/x/net/http2.parseDataFrame appears to account for about 30M of the total ~80M object allocations.
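For anyone who wants to look at the same kind of numbers, here is a minimal sketch of dumping a heap profile from a Go process with the standard runtime/pprof package. This is an assumption about one way to collect such data, not a description of how the benchmark above was instrumented; the helper name `heapProfileBytes` and the output file name `heap.pprof` are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfileBytes is a hypothetical helper that returns the current heap
// profile in pprof's binary format. The heap profile carries both in-use and
// cumulative allocation samples (alloc_space, alloc_objects).
func heapProfileBytes() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date statistics.
	runtime.GC()

	var buf bytes.Buffer
	// debug=0 selects the binary format that "go tool pprof" reads.
	if err := pprof.Lookup("heap").WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// ... run the workload to be measured here ...

	profile, err := heapProfileBytes()
	if err != nil {
		fmt.Fprintln(os.Stderr, "writing heap profile:", err)
		os.Exit(1)
	}
	// The file name is arbitrary.
	if err := os.WriteFile("heap.pprof", profile, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "saving heap profile:", err)
		os.Exit(1)
	}
}
```

The resulting file can then be inspected with something like `go tool pprof -sample_index=alloc_space heap.pprof` (older toolchains accept `-alloc_space`). Heap profiles are sampled, so the reported sizes are estimates.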

Experimenting with a code change that repeatedly returns the same DataFrame struct instead of creating new ones:

  • memory allocations from golang.org/x/net/http2.parseDataFrame disappear,
  • total memory allocated by the benchmark goes down by ~1.5GB, to about 740MB,
  • QPS in the grpc micro-benchmark increases by about 5%.

The current http2 framer returns a slice on "data frame reads" that's only valid until the next call to ReadFrame. I'm wondering if similar semantics for the entire DataFrame struct sound reasonable, or possibly an option to turn this on.

I can give more details on the benchmark setup if needed.

thanks
