-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Reduce allocations when parsing response headers in WinHttpHandler #3199
Reduce allocations when parsing response headers in WinHttpHandler #3199
Conversation
cc: @CIPop @SidharthNabar @stephentoub I'm in the process of some serious code re-arranging as I am working on #3000. So, I'm not sure if this PR will be handled now or later. |
I do want to comment on this aspect of the PR:
WinHttpHandler does not parse out the http version, status code and reason phrase itself using WINHTTP_QUERY_RAW_HEADERS. We let WinHTTP do that for us and ask for those things separately via other WinHTTP header query APIs. WinHTTP already does the work of parsing that so there is no point in doing it ourselves. We only use WINHTTP_QUERY_RAW_HEADERS to get the headers themselves in order to parse that into our strongly typed header collections. So, I'm not sure of the value of putting that particular parsing logic into Common since WinHttpHandler won't be using it. |
I meant to call this out specifically. This is to avoid the allocations associated with retrieving the version and reason phrase |
Thinking about it more, if we want to avoid parsing the status line ourselves (even though it looks like we have to do it for CURL) and get the version and reason phrase from WinHTTP, we could do it in a way that avoids the two intermediate
|
6b8fe65
to
ddca900
Compare
I removed the status line parsing from |
ddca900
to
09bbc40
Compare
@davidsh @justinvp While the description makes sense, I think the ordering should be:
+1 on this PR method :) |
@justinvp Do you think you can refresh this PR so that we can consider it? Thx. |
19e6a9a
to
3656036
Compare
@davidsh I've updated the PR, but for some reason CI doesn't run when I force push the changes after rebasing. I'm going to close and reopen the PR to see if that'll kick off a CI build. |
3656036
to
7c5d0e3
Compare
@stephentoub PTAL |
@stephentoub Do you have any further comment on this revision? Thx. |
Sorry, David, I lost track of this... will review later today. |
return TryGet(SecWebSocketExtensions, key, startIndex, length, out name); // Sec-WebSocket-Extensions | ||
} | ||
|
||
// When adding a new constant, add it to HttpKnownHeaderNames.cs as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this comment is repeated from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I intentionally added the same comment at both the top and bottom of the file. Based on your feedback, I went ahead and removed the comment at the bottom. One comment at the top should be sufficient.
reasonPhrase = new string(buffer, 0, reasonPhraseLength); | ||
} | ||
|
||
response.ReasonPhrase = reasonPhrase; // It's OK if this is null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK if this is null
In what case would it be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the comment (it was a holdover from an earlier iteration that I should have been deleted).
Overall this looks good, but some comments/questions. |
I added a new commit that addresses the review feedback as well as another commit that tweaks one of the tests slightly. (I will squash after the review). @stephentoub, I converted the Try methods to non-Try except for
I temporarily instrumented using (var client = new HttpClient())
{
var response = client.GetAsync("http://www.bing.com").GetAwaiter().GetResult();
} Here are the results:
|
@stephentoub Do you have an final comments on this PR? |
LGTM. Thanks, @justinvp. Note that while the GC total memory before/after is interesting, it doesn't really reflect the savings from the PR, as you'd expect the GC to be running during the test and cleaning up rather than just letting things leak, plus other things contribute. I was more interested in understanding the frequency and amount of allocations happening during the test. I just ran a similar test: using (HttpClient client = new HttpClient())
{
for (int i = 0; i < 500; i++)
{
client.GetAsync("http://httpbin.org/ip").GetAwaiter().GetResult().Dispose();
}
} |
LGTM |
…aders Reduce allocations when parsing response headers in WinHttpHandler
Thanks guys. @stephentoub, Ah, the profiler data is much better (thanks). Note: the allocation reductions aren't limited to |
Recent changes due to PR dotnet#3199 caused `HttpClient` on Windows to miss `Set-Cookie` response headers when one of the header values was large enough to cause `WinHttpQueryHeaders` to fail with `ERROR_INSUFFICIENT_BUFFER`. This commit addresses the regression. Added a new test to validate this change. Fixes #6737.
Recent changes due to PR dotnet#3199 caused `HttpClient` on Windows to miss `Set-Cookie` response headers when one of the header values was large enough to cause `WinHttpQueryHeaders` to fail with `ERROR_INSUFFICIENT_BUFFER` and advance the index. This commit addresses the regression, by always setting the index back to the index we want to retrieve. Added a new test to validate this change. Fixes #6737.
Recent changes due to PR dotnet#3199 caused `HttpClient` on Windows to miss `Set-Cookie` response headers when one of the header values was large enough to cause `WinHttpQueryHeaders` to fail with `ERROR_INSUFFICIENT_BUFFER` and advance the index. This commit addresses the regression, by always setting the index back to the index we want to retrieve. Added a new test to validate this change. Fixes #6737.
// This buffer is the length needed for WINHTTP_QUERY_RAW_HEADERS_CRLF, which includes the status line | ||
// and all headers separated by CRLF, so it should be large enough for any individual status line or header queries. | ||
int bufferLength = GetResponseHeaderCharBufferLength(requestHandle, Interop.WinHttp.WINHTTP_QUERY_RAW_HEADERS_CRLF); | ||
char[] buffer = new char[bufferLength]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinvp @davidsh @stephentoub I believe this is now allocating twice (or even 4x) the required buffer, for each request:
- 2x from the following code path:
GetResponseHeaderCharBufferLength
is calling intoWinHttpQueryHeaders
which returns the buffer size in bytes not C#char
which is 16bit wide. - another 2x from double buffering instead of lazily extracting headers as they are requested by the application.
The last point is debatable and would need to be profiled (which, if I have time, I will do part of my perf. work). Lacking the necessary telemetry it's hard to understand if most apps really care about the headers or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the code before this PR you'll see that it was worse before... Previously it queried WINHTTP_QUERY_RAW_HEADERS_CRLF
along with a few other headers. Each time it did so a StringBuilder
would be allocated (along with underlying char[]
buffer), then there would be copying going on back and forth as part of the interop marshaling, and then a call to ToString
would allocate a string to return, so previously there was a lot of allocations and copying going on. With this PR, all those StringBuilder
/char[]
/string
allocations (and copying) are reduced to a only a single char[]
buffer allocation.
I agree it'd be nice if the headers were lazily extracted as requested by callers -- avoiding the buffer allocation and parsing, if unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinvp Thanks for explaining: I didn't get the time to closely look at the prior implementation.
I'll open 2 separate issues to track the problems since the first one is much simpler to solve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @davidsh @himadrisarkar
Opened #11978 and #11979.
…rseresponseheaders Reduce allocations when parsing response headers in WinHttpHandler Commit migrated from dotnet/corefx@35f7ef0
Recent changes due to PR dotnet/corefx#3199 caused the `HttpResponseMessage.ReasonPhrase` property to be a null string if the HTTP server response only contained a statuscode but no description on the status line. The existing .NET Framwork behavior is to have empty string as the property value in that case. Added a new test to validate this change. Fixes dotnet/corefx#6721. Commit migrated from dotnet/corefx@19ec320
Recent changes due to PR dotnet/corefx#3199 caused `HttpClient` on Windows to miss `Set-Cookie` response headers when one of the header values was large enough to cause `WinHttpQueryHeaders` to fail with `ERROR_INSUFFICIENT_BUFFER` and advance the index. This commit addresses the regression, by always setting the index back to the index we want to retrieve. Added a new test to validate this change. Fixes dotnet/corefx#6737. Commit migrated from dotnet/corefx@f7e9f00
There are currently many unnecessary allocations when creating the response message in
WinHttpHandler
.The native
WinHttpQueryHeaders
function is called a few times to retrieve response information, such as the version string, reason phrase, and raw headers.Each call to
WinHttpQueryHeaders
currently results in 3 allocations (9 total for the 3 calls):StringBuilder
buffer passed to the native function to be filled in.char[]
inside theStringBuilder
.string
allocated whenStringBuilder.ToString()
is called.On platforms that need to do manual decompression (lower than Win 8.1), there's another 2-3 allocations to get the Content-Encoding header value.
We can reduce the allocations by 1) modifying the p/invoke signature for
WinHttpQueryHeaders
to take achar[]
instead ofStringBuilder
and 2) allocate a singlechar[]
large enough to be reused across multiple calls toWinHttpQueryHeaders
.With those changes, 9 allocations are down to 2: an allocation for the raw headers
char[]
and another for the reason phrasestring
.There are many common reason phrases, such as "OK" for a 200 status code, or "Not Found" for a 404 status code. There's already internal code in
HttpStatusDescription
for looking up a known reason phrase string based on the status code, and we can make use of this existing code to avoid the reason phrase string allocation if the reason phrasechar[]
array segment is equal to a known reason phrase. 9 allocations down to 1, for common reason phrases.When parsing the raw headers, the current code is using
string.Split
to split the raw headers by lines separated by"\r\n"
. There are a bunch of unnecessary allocations associated with the use ofstring.split
: internallystring.Split
allocates two arrays to keep track of the indexes and separator sizes, there's the overall array returned fromSplit
, plus each string in the array representing each line. Each line then needs to be split further to get the header name and value for each line. The header value was also being trimmed withstring.Trim
, another potential allocation. Also, the first line of the raw headers is the status line, which is being allocated as astring
even though this line is currently being skipped.All of these intermediate allocations can be avoided by parsing the
char[]
directly, and only allocatingstring
s for the individual header names and values. And we can avoidstring
allocations for known header names and values, similar to the known reason phrase optimization.Notes:
HttpKnownHeaderNames.TryGetHeaderName
is looking up all known header name constants, which include both request and response headers, while we only really care about response headers. My opinion is that this is OK. I think it's easier to maintain the lookup of known headers if it is just looking up all known headers, instead of trying to segregate which ones are only response headers.HttpKnownHeaderNames.TryGetHeaderName
is currently doing an ordinal lookup. Header names are case insensitive, so I'd be open to changing this to do an case insensitive lookup. I kept it ordinal so the string remains the exact same as the response string from WinHTTP. However, it looks like WinHTTP normalizes the casing of headers it knows about (e.g. WinHTTP returns"Content-Length"
when the actual server response is"content-length"
), so maybe switching to case insensitive lookup isn't a big deal.Common
(e.g.HttpStatusDescription
). I can squash some of the commits together, if desired.HttpClient
that I'm looking into improving in a separate PR.