-
Notifications
You must be signed in to change notification settings - Fork 18k
net/http: request body transfer optimization issues (ReadFrom/sendfile) #30377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Change https://golang.org/cl/163599 mentions this issue: |
Change https://golang.org/cl/163737 mentions this issue: |
The performance issues may actually be largely related to the HTTP2 stack. Over the weekend when I was testing this, I was not accounting for TLS (as can be seen from the tests above). This morning while working on trying to see if the changes I made helped the real-world problems we've been seeing, I saw no change, and then realized that none of these performance gains really apply to TLS on part of On debugging again, I discovered that This may or may not be a major priority for us in light of this, it would be neat to see what is up in the HTTP2 stack that might be causing this and I'd like to investigate. Some advice on whether or not I should open a new issue for the HTTP2 package once it's discovered would be good though. |
Change https://golang.org/cl/163862 mentions this issue: |
An update on this: I've been working on benchmarks (see referenced PR/CL). It does appear that when using a TLS test server that requests are properly sent through the HTTP1 transport, so that particular problem seems to be some sort or implementation issue between what happens when use Running the referenced benchmarks, I am definitely seeing performance improvements on both unencrypted and TLS connections. An interesting oddity is the memory jump on unencrypted connections on
|
@odeke-em contributed improved benchmarks in https://golang.org/cl/163862 and that CL is being updated. Thanks again Emmanuel! I'm posting the new results below. I'm not too sure what's up with my MacBook right now, the same odd skew in the pre-update results for non-TLS vs. TLS tests (where non-TLS is actually slower than with TLS) is happening on my old benchmarks now too. 🤷♂️ Anyway:
|
This adds benchmarks to test file uploads using PUT requests. It's designed to complement changes https://golang.org/cl/163599 and https://golang.org/cl/163737, allowing an easy comparison of performance before and after these changes are applied. Updates #30377. Co-authored-by: Emmanuel Odeke <[email protected]> Change-Id: Ib8e692c61e1f7957d88c7101669d4f7fb8110c65 GitHub-Last-Rev: 242622b GitHub-Pull-Request: #30424 Reviewed-on: https://go-review.googlesource.com/c/go/+/163862 Run-TryBot: Emmanuel Odeke <[email protected]> Reviewed-by: Emmanuel Odeke <[email protected]>
Make persistConnWriter implement io.ReaderFrom, via an io.Copy on the underlying net.Conn. This in turn enables it to use OS level optimizations such as sendfile. This has been observed giving performance gains even in the absence of ReaderFrom, more than likely due to the difference in io's default buffer (32 KB) versus bufio's (4 KB). Speedups on linux/amd64: benchmark old MB/s new MB/s speedup BenchmarkFileAndServer_16MB/NoTLS-4 662.96 2703.74 4.08x BenchmarkFileAndServer_16MB/TLS-4 552.76 1420.72 2.57x Speedups on darwin/amd64: benchmark old MB/s new MB/s speedup BenchmarkFileAndServer_16MB/NoTLS-8 357.58 1972.86 5.52x BenchmarkFileAndServer_16MB/TLS-8 346.20 1067.41 3.08x Updates #30377. Change-Id: Ic88d4ac254f665223536fcba4d551fc32ae105b6 GitHub-Last-Rev: a6f67cd GitHub-Pull-Request: #30390 Reviewed-on: https://go-review.googlesource.com/c/go/+/163737 Run-TryBot: Emmanuel Odeke <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Emmanuel Odeke <[email protected]>
@vancluever please remember to use https://godoc.org/golang.org/x/perf/cmd/benchstat when comparing benchmark results; that will give results with p-values that we can trust better than single runs. You'll also need to take multiple benchmark measurements before and after, for example via |
net.TCPConn has the ability to send data out using system calls such as sendfile when the source data comes from an *os.File. However, the way that I/O has been laid out in the transport means that the File is actually wrapped behind two outer io.Readers, and as such the TCP stack cannot properly type-assert the reader, ensuring that it falls back to genericReadFrom. This commit does the following: * Removes transferBodyReader and moves its functionality to a new doBodyCopy helper. This is not an io.Reader implementation, but no functionality is lost this way, and it allows us to unwrap one layer from the body. * The second layer of the body is unwrapped if the original reader was wrapped with ioutil.NopCloser, which is what NewRequest wraps the body in if it's not a ReadCloser on its own. The unwrap operation passes through the existing body if there's no nopCloser. Note that this depends on change https://golang.org/cl/163737 to properly function, as the lack of ReaderFrom implementation otherwise means that this functionality is essentially walled off. Benchmarks between this commit and https://golang.org/cl/163862, incorporating https://golang.org/cl/163737: linux/amd64: name old time/op new time/op delta FileAndServer_1KB/NoTLS-4 53.2µs ± 0% 53.3µs ± 0% ~ (p=0.075 n=10+9) FileAndServer_1KB/TLS-4 61.2µs ± 0% 60.7µs ± 0% -0.77% (p=0.000 n=10+9) FileAndServer_16MB/NoTLS-4 25.3ms ± 5% 3.8ms ± 6% -84.95% (p=0.000 n=10+10) FileAndServer_16MB/TLS-4 33.2ms ± 2% 13.4ms ± 2% -59.57% (p=0.000 n=10+10) FileAndServer_64MB/NoTLS-4 106ms ± 4% 16ms ± 2% -84.45% (p=0.000 n=10+10) FileAndServer_64MB/TLS-4 129ms ± 1% 54ms ± 3% -58.32% (p=0.000 n=8+10) name old speed new speed delta FileAndServer_1KB/NoTLS-4 19.2MB/s ± 0% 19.2MB/s ± 0% ~ (p=0.095 n=10+9) FileAndServer_1KB/TLS-4 16.7MB/s ± 0% 16.9MB/s ± 0% +0.78% (p=0.000 n=10+9) FileAndServer_16MB/NoTLS-4 664MB/s ± 5% 4415MB/s ± 6% +565.27% (p=0.000 n=10+10) FileAndServer_16MB/TLS-4 505MB/s ± 2% 1250MB/s ± 2% +147.32% (p=0.000 n=10+10) FileAndServer_64MB/NoTLS-4 636MB/s ± 4% 4090MB/s ± 2% +542.81% (p=0.000 n=10+10) FileAndServer_64MB/TLS-4 522MB/s ± 1% 1251MB/s ± 3% +139.95% (p=0.000 n=8+10) darwin/amd64: name old time/op new time/op delta FileAndServer_1KB/NoTLS-8 93.0µs ± 5% 96.6µs ±11% ~ (p=0.190 n=10+10) FileAndServer_1KB/TLS-8 105µs ± 7% 100µs ± 5% -5.14% (p=0.002 n=10+9) FileAndServer_16MB/NoTLS-8 87.5ms ±19% 10.0ms ± 6% -88.57% (p=0.000 n=10+10) FileAndServer_16MB/TLS-8 52.7ms ±11% 17.4ms ± 5% -66.92% (p=0.000 n=10+10) FileAndServer_64MB/NoTLS-8 363ms ±54% 39ms ± 7% -89.24% (p=0.000 n=10+10) FileAndServer_64MB/TLS-8 209ms ±13% 73ms ± 5% -65.37% (p=0.000 n=9+10) name old speed new speed delta FileAndServer_1KB/NoTLS-8 11.0MB/s ± 5% 10.6MB/s ±10% ~ (p=0.184 n=10+10) FileAndServer_1KB/TLS-8 9.75MB/s ± 7% 10.27MB/s ± 5% +5.26% (p=0.003 n=10+9) FileAndServer_16MB/NoTLS-8 194MB/s ±16% 1680MB/s ± 6% +767.83% (p=0.000 n=10+10) FileAndServer_16MB/TLS-8 319MB/s ±10% 963MB/s ± 4% +201.36% (p=0.000 n=10+10) FileAndServer_64MB/NoTLS-8 180MB/s ±31% 1719MB/s ± 7% +853.61% (p=0.000 n=9+10) FileAndServer_64MB/TLS-8 321MB/s ±12% 926MB/s ± 5% +188.24% (p=0.000 n=9+10) Updates #30377. Change-Id: I631a73cea75371dfbb418c9cd487c4aa35e73fcd GitHub-Last-Rev: 4a77dd1 GitHub-Pull-Request: #30378 Reviewed-on: https://go-review.googlesource.com/c/go/+/163599 Run-TryBot: Emmanuel Odeke <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Emmanuel Odeke <[email protected]>
Yes @mvdan I also paged him about using -count=10, thanks for mentioning it too. Alright, all the respective CLs have been submitted, @vancluever I think we can now close this issue, thank you very much for working on this! |
@mvdan as @odeke-em mentioned, updated stats are posted to https://golang.org/cl/163599 in summary form, but I've posted another run that has memory stats below as well.
Thanks again @odeke-em and everyone else for the help in all of this! |
PS: It doesn't look like #30424 closed out correctly after the merge of the benchmarks. I can close it out unless it's needed for any reason, or if automation will reap it later. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes (1.11.5), haven't tried the 1.12 RC, but the implementations of
persistConnWriter
andtransferWriter.writeBody
do not appear to have changed, which is where the issue lies.What operating system and processor architecture are you using (
go env
)?What did you do?
Upload a file using
PUT
in a http.Request, by setting the file as Body:(Not that our implementations at HashiCorp usually wrap in go-retryablehttp, which is interface-compatible with the above. This is what the benchmarks below use. The above example is to illustrate the repro using the stdlib.)
This test utility was tested on live systems sending uploads to live services.
What did you expect to see?
An upload speed matching something more representative of the speed of the links the systems were connected to.
fasthttp
, which was also tested, gave upload speeds of about 50-70 MB/sec on live systems depending on the location of the client machine.What did you see instead?
Upload speeds ranging from about 8MB/sec (lower latency systems) to as low as sub-1MB/sec (higher-latency systems).
Additional info/root cause
Investigation into the internals of the
net/http
transport and its behavior compared tofasthttp
revealed that the pathfasthttp
currently takes allowed it to usesendfile
to transfer the data instead ofwrite
calls. This was discovered by using various tracing tools (strace
/bpftrace
) during testing.Further investigation into the
net/http
transport revealed that its writer is currently wrapped inpersistConnWriter
which does not implementio.ReaderFrom
, which is what is necessary for anio.Copy
to the transport's connection (ie:TCPConn
) to ultimately fast-path tosendfile
. See here. Further to that, the request body is wrapped in atransferBodyReader
, and possibly anioutil.nopCloser
, obfuscating the reader to the point whereTCPConn
cannot discern properly whether or not the underlying reader is an*os.File
.Adding a trivial implementation for
ReadFrom
inpersistConnWriter
allows for the pathing tosendfile
, in addition to significantly speeding up uploads when the reader is not an*os.File
, or if the OS does not have asendfile
implementation (example:darwin
), due to the use ofgenericReadFrom
. ReplacingtransferBodyReader
with a non-writer implementation - and providing additional unwrapping withinwriteBody
to unwrap the underlying reader from thenopCloser
if the wrapping exists - ensures that the proper reader makes its way toTCPConn.sendFile
.Example on a local machine using a simple
net/http
server reading the body into anioutil.Discard
, uploading a 1GB file of random data:The text was updated successfully, but these errors were encountered: