
HTTPs and "Connection: close" - huge performance degradation on windows #18488


Closed
surr34 opened this issue Jan 21, 2020 · 30 comments
Labels: affected-very-few, area-networking, enhancement, feature-kestrel, severity-major

Comments

@surr34

surr34 commented Jan 21, 2020

We have a Kestrel service that has recently been receiving a lot of HTTPS requests containing a Connection: close header. The header seems to have a massive impact on the number of requests we can handle: testing with a simple service locally, the RPS drops from over 30k to under 500.

For testing purposes we ran the same service on Linux and implemented it in NodeJS as well. While the C# version on Windows was fast for kept-alive connections, the degradation with Connection: close was huge: the C# Windows version was outperformed by both NodeJS and the C# implementation on Linux. Based on other GitHub issues I had expected the Windows version to be much faster than the Linux one. Below are the numbers (RPS) we measured:

                  keep-alive   close
Kestrel (win)     54641        349
Node (win)        10422        2256
Kestrel (Linux)   51490        2339
Node (Linux)      37504        2095

For the Kestrel Windows version we also noticed a significant spike in the CPU consumption of the lsass process.
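
The traffic pattern is easy to reproduce from the client side: with Connection: close set on every request, the server closes the connection after each response, so every call pays a fresh TCP connect plus TLS handshake. A minimal C# sketch of such a client (illustration only; the endpoint, loop count, and class name are placeholders, and it assumes the server certificate is trusted):

using System;
using System.Net.Http;
using System.Threading.Tasks;

class ConnectionCloseClient
{
    static async Task Main()
    {
        // Placeholder endpoint for illustration.
        var url = "https://localhost:21115/";

        using var client = new HttpClient();
        for (var i = 0; i < 100; i++)
        {
            using var request = new HttpRequestMessage(HttpMethod.Get, url);
            // Emits the "Connection: close" request header, so the next request
            // needs a new TCP connection and a new TLS handshake.
            request.Headers.ConnectionClose = true;

            using var response = await client.SendAsync(request);
            Console.WriteLine($"{i}: {(int)response.StatusCode}");
        }
    }
}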

@surr34 surr34 changed the title HTTPs and Connection: close performance issue HTTPs and Connection: close huge performance degradation on windows Jan 21, 2020
@surr34 surr34 changed the title HTTPs and Connection: close huge performance degradation on windows HTTPs and "Connection: close" - huge performance degradation on windows Jan 21, 2020
@stephentoub stephentoub transferred this issue from dotnet/runtime Jan 21, 2020
@stephentoub
Member

Since this is focused on Kestrel, I've moved it for now to dotnet/aspnetcore. If investigation demonstrates it's instead due to something lower in the stack, we can move it back. Thanks.

@Tratcher
Member

Tratcher commented Jan 21, 2020

With or without TLS (HTTPS)? lsass implies TLS.

@surr34
Author

surr34 commented Jan 21, 2020

@Tratcher Tests were run with TLS (HTTPS)

@Tratcher
Member

@stephentoub @halter73 haven't we seen other reports of SslStream handshakes being extremely expensive?

@analogrelay
Contributor

We can do a quick investigation with some timers to confirm, but I'm pretty sure it's the SslStream handshake issues we've seen for a while (@karelz you're already tracking SSL performance improvements, right?).

@karelz
Member

karelz commented Jan 22, 2020

@anurse we track SSL perf, but more for read/write throughput. While I've heard about handshake perf several times, I haven't seen any data.
Do we have something like that to help us focus?

@analogrelay
Contributor

Yep, we'll try to do a quick bit of work to isolate what we can. If I remember correctly, the biggest signal we have is that our benchmarks show a significant performance gap between ConnectionClose and ConnectionClose with HTTPS.

Improving HTTPS perf in general is on our radar, so this is an area we'll be working on.

@samsosa
Contributor

samsosa commented Jan 22, 2020

Maybe the pull request dotnet/runtime#1949 helps.

@wfurt
Member

wfurt commented Jan 24, 2020

No, I don't think so. However, dotnet/performance#1146 may be interesting background. My Linux and Windows machines are pretty similar: they produce similar results for steady-state encrypt/decrypt, but the handshake is MUCH slower on Linux.

I'm wondering if there could be something wrong with your Windows Kestrel server. You can check out the perf repo and try to run the standard SSL benchmarks.

As far as lsass goes, that is expected: on Windows, schannel does the SSL handshake and key management in that one daemon. After that is done, decrypt/encrypt with the session key happens in-process. Could you share your test code and methodology @samsosa? I would like to see if I can reproduce your results.
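
For reference, here is a rough, self-contained sketch of what a handshake-only measurement looks like. This is not the dotnet/performance benchmark code; it simply times SslStream.AuthenticateAsServerAsync/AuthenticateAsClientAsync over a loopback socket using a throwaway self-signed certificate (all names below are placeholders):

using System;
using System.Diagnostics;
using System.Net;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;

class HandshakeTimer
{
    static async Task Main()
    {
        using var cert = CreateThrowawayCertificate();
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        var port = ((IPEndPoint)listener.LocalEndpoint).Port;

        for (var i = 0; i < 10; i++)
        {
            using var client = new TcpClient();
            var connectTask = client.ConnectAsync(IPAddress.Loopback, port);
            using var server = await listener.AcceptTcpClientAsync();
            await connectTask;

            using var serverSsl = new SslStream(server.GetStream());
            using var clientSsl = new SslStream(client.GetStream(), false,
                (sender, certificate, chain, errors) => true); // accept the self-signed cert

            // Time only the TLS handshake, not any application reads/writes.
            var sw = Stopwatch.StartNew();
            await Task.WhenAll(
                serverSsl.AuthenticateAsServerAsync(cert),
                clientSsl.AuthenticateAsClientAsync("localhost"));
            sw.Stop();
            Console.WriteLine($"Handshake {i}: {sw.Elapsed.TotalMilliseconds:F2} ms");
        }

        listener.Stop();
    }

    static X509Certificate2 CreateThrowawayCertificate()
    {
        using var rsa = RSA.Create(2048);
        var request = new CertificateRequest(
            "CN=localhost", rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        using var ephemeral = request.CreateSelfSigned(
            DateTimeOffset.UtcNow.AddDays(-1), DateTimeOffset.UtcNow.AddDays(30));
        // Re-import as PFX so schannel on Windows can use the private key.
        return new X509Certificate2(ephemeral.Export(X509ContentType.Pfx));
    }
}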

@surr34
Author

surr34 commented Jan 29, 2020

Running the System.Net.Security benchmarks gave me these results

Method                          Mean         Error      StdDev     Median       Min          Max          Gen 0  Gen 1  Gen 2  Allocated
HandshakeAsync                  1,184.44 us  28.938 us  30.963 us  1,174.51 us  1,143.45 us  1,253.07 us  -      -      -      5960 B
WriteReadAsync                  12.72 us     0.304 us   0.350 us   12.75 us     12.11 us     13.40 us     -      -      -      -
ReadWriteAsync                  45.67 us     0.909 us   0.851 us   45.45 us     44.44 us     47.43 us     -      -      -      344 B
ConcurrentReadWrite             19.92 us     0.533 us   0.613 us   19.79 us     18.79 us     21.22 us     -      -      -      88 B
ConcurrentReadWriteLargeBuffer  27.46 us     1.436 us   1.654 us   27.14 us     25.56 us     30.48 us     -      -      -      43 B

For the benchmark I used multiple Azure VMs and fortio running on Ubuntu 18.04:
fortio load -qps -1 -c 200 -t 60s -H "Connection: close" --https-insecure https://<ip>:21115
and fortio load -qps -1 -c 200 -t 60s -H "Connection: close" http://<ip>:11115

The code is pretty simple and builds on top of the Kestrel sample:

Program.cs
namespace KestrelSample
{
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.Extensions.Hosting;
    using Microsoft.Extensions.Logging;
    using System;
    using System.IO;
    using System.Security.Cryptography.X509Certificates;
    using System.Text.RegularExpressions;

    public class Program
    {
        public static void Main(string[] args)
        {
            CreateHostBuilder(args).Build().Run();
        }

        public static IHostBuilder CreateHostBuilder(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .ConfigureLogging(config => {
                    config.ClearProviders();
                })
                .ConfigureWebHostDefaults(webBuilder =>
                {
                    webBuilder.UseUrls("http://0.0.0.0:11115", "https://0.0.0.0:21115");
                    webBuilder.UseStartup<Startup>();
                });
    }
}
Startup.cs
namespace KestrelSample
{
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.AspNetCore.Hosting.Server.Features;
    using Microsoft.AspNetCore.Http;
    using Microsoft.AspNetCore.Http.Extensions;
    using Microsoft.Extensions.Hosting;
    using Newtonsoft.Json;
    using System;
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    internal class Response
    {
        [JsonProperty("method")]
        public string Method { get; set; }

        [JsonProperty("schema")]
        public string Schema { get; set; }

        [JsonProperty("protocol")]
        public string Protocol { get; set; }

        [JsonProperty("host")]
        public string Host { get; set; }

        [JsonProperty("headers")]
        public IHeaderDictionary Headers { get; set; }

        [JsonProperty("path")]
        public string Path { get; set; }

        [JsonProperty("query")]
        public IQueryCollection Query { get; set; }

        [JsonProperty("queryString")]
        public QueryString QueryString { get; set; }

        [JsonProperty("body", NullValueHandling = NullValueHandling.Ignore)]
        public string Body { get; set; }
    }

    public class Startup
    {
        public void Configure(IApplicationBuilder app)
        {
            var serverAddressesFeature =
                app.ServerFeatures.Get<IServerAddressesFeature>();

            app.UseStaticFiles();

            app.Use(async (context, next) =>
            {
                if (context.Request.Query.TryGetValue("base", out var _))
                {
                    await next.Invoke().ConfigureAwait(false);
                }
                else
                {
                    try
                    {
                        var response = new Response
                        {
                            Method = context.Request.Method,
                            Schema = context.Request.HttpContext.Request.Scheme,
                            Protocol = context.Request.HttpContext.Request.Protocol,
                            Host = context.Request.HttpContext.Request.Host.ToString(),
                            Path = context.Request.Path.ToString(),
                            Headers = context.Request.Headers,
                            Query = context.Request.Query,
                            QueryString = context.Request.QueryString,
                        };

                        context.Response.ContentType = "application/json";

                        if (context.Request.Body != null)
                        {
                            using (var reader = new StreamReader(context.Request.Body))
                            {
                                response.Body = await reader.ReadToEndAsync().ConfigureAwait(false);
                            }
                        }

                        await context.Response.WriteAsync(JsonConvert.SerializeObject(response));
                    }
                    catch (Exception exception)
                    {
                        var result = new
                        {
                            error = exception.Message
                        };

                        context.Response.ContentType = "application/json";
                        context.Response.StatusCode = 500;
                        await context.Response.WriteAsync(JsonConvert.SerializeObject(result));
                    }
                }
            });

            app.Run(async (context) =>
            {
                context.Response.ContentType = "text/html";
                await context.Response
                    .WriteAsync("<!DOCTYPE html><html lang=\"en\"><head>" +
                        "<title></title></head><body><p>Hosted by Kestrel</p>");

                if (serverAddressesFeature != null)
                {
                    await context.Response
                        .WriteAsync("<p>Listening on the following addresses: " +
                            string.Join(", ", serverAddressesFeature.Addresses) +
                            "</p>");
                }

                await context.Response.WriteAsync("<p>Request URL: " +
                    $"{context.Request.GetDisplayUrl()}</p>");
                await context.Response.WriteAsync("<p>Request URL: " +
                    $"{context.Request.Host}</p>");
            });
        }
    }
}

I ran my tests again and got similar results:

Kestrel (Linux)       https        http
fortio  keep-alive    25927.5      49956
        close         897.1333     8776.866667
Kestrel (Windows)     https        http
fortio  keep-alive    23242.2      45425
        close         297.7333333  4623.733333

On Windows, both fortio and lsass had huge CPU consumption; lsass peaked at around 40%.

@jkotalik
Contributor

@surr34 do you have the numbers for HTTP vs. HTTPS?

@surr34
Author

surr34 commented Feb 3, 2020

Updated the table above to include HTTP numbers.

@wfurt
Member

wfurt commented Feb 4, 2020

Can you please provide the test code for Node as well, @surr34? And did you use the default dev certificate from Kestrel? I have yet to run your example on my reference machines. But since your results are somewhat strange, did you ever try to verify the numbers on a different Windows system to rule out a local anomaly? (Node does not use schannel/lsass/OS_ssl AFAIK, so any misconfiguration there would not apply.)
I did more tests under dotnet/performance#1146 and the outcome is somewhat interesting. I would like to compare what is going on on the wire before speculating more.

@surr34
Author

surr34 commented Feb 4, 2020

I created a repository that contains all the code I used, both Kestrel and NodeJS: https://github.com/surr34/https-bench

Initially I measured the numbers locally and was surprised by the result, hence I set up a few Azure VMs to run the measurements again (see the numbers above).
For NodeJS I used a self-signed 2048-bit RSA certificate; for Kestrel I used the default dev certificate. In my tests NodeJS used OpenSSL and did not use lsass/...

@wfurt
Member

wfurt commented Feb 5, 2020

Thanks @surr34. I think I have a repro and I will investigate. It seems like I also have a 2K self-signed RSA certificate.

@BrennanConroy
Member

Is there a Runtime issue tracking this work?

I believe there have been improvements in this area; are there newer perf numbers?

@wfurt
Member

wfurt commented May 11, 2020

I don't have any recent numbers, but from what I saw a big part of the difference was that NodeJS used TLS 1.3 and Kestrel did not. There were TLS 1.3 fixes in SslStream, but that still depends on OS support and configuration, while NodeJS does not.

@BrennanConroy
Member

Closing as the work is on the runtime side.

@karelz
Member

karelz commented Jun 1, 2020

What is the runtime side work and how is it tracked?

@halter73
Member

halter73 commented Jun 1, 2020

I don't think there's a specific runtime issue tracking this, but this scenario is captured in the "https" variation of our "ConnectionClose" benchmark, which we record in PowerBI. You have to select the right checkboxes yourself; PowerBI doesn't capture which checkboxes are selected when you share a link.

The only related runtime issue that I see is dotnet/runtime#27916, which deals with TLS session resumption on Linux. It was closed in part because FTPWebRequest wasn't seen as a compelling enough reason to put in the significant amount of work required to get TLS session resumption working with OpenSSL (assuming it's not "impossible"). Maybe Kestrel's usage of SslStream for HTTPS is a compelling enough reason to reopen it.

> I don't have any recent numbers, but from what I saw a big part of the difference was that NodeJS used TLS 1.3 and Kestrel did not. There were TLS 1.3 fixes in SslStream, but that still depends on OS support and configuration, while NodeJS does not.

Kestrel currently defaults to TLS 1.1 or TLS 1.2 regardless of platform support for TLS 1.3, but that issue is being tracked by #14997 and should be fixed in .NET 5 preview 6. Otherwise, Kestrel's SslStream usage is pretty standard.
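
In the meantime, Kestrel's HTTPS defaults can be overridden explicitly. A minimal sketch of the sample server opting in to TLS 1.3 alongside TLS 1.2 (assuming the OS supports it; this reuses the Startup class from the repro above, and the protocol set shown is just an example, not the upcoming default):

using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Hosting;
using System.Security.Authentication;

public class Program
{
    public static void Main(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.UseUrls("http://0.0.0.0:11115", "https://0.0.0.0:21115");
                webBuilder.ConfigureKestrel(kestrel =>
                {
                    // Override Kestrel's pre-.NET 5 default of Tls11 | Tls12
                    // for all HTTPS endpoints.
                    kestrel.ConfigureHttpsDefaults(https =>
                        https.SslProtocols = SslProtocols.Tls12 | SslProtocols.Tls13);
                });
                webBuilder.UseStartup<Startup>();
            })
            .Build()
            .Run();
}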

@karelz
Member

karelz commented Jun 1, 2020

Maybe a dumb question, but how is TLS session resumption on Linux related here? I thought we were digging into the Windows perf degradation.
If there is any work left (incl. perf investigations), I'd like to understand the scenario and impact, and make sure it is tracked somewhere; otherwise it has a high risk of being forgotten.

@Tratcher
Member

Tratcher commented Jun 1, 2020

Re-opening until we identify the specific runtime work items.

@wfurt
Member

wfurt commented Jun 1, 2020

We should re-test once dotnet/runtime#1720 lands (again).
The difference in TLS protocol was obvious, but there can be more to it.
I added some more tests to the perf repo, so we should be able to see the runtime side independently.

@halter73
Member

halter73 commented Jun 2, 2020

The Linux TLS session resumption is only tangentially related. Given that the performance of this scenario is already better on Linux than on Windows, I can understand it still not being a big priority.

The reason for closing this was that it keeps coming up in every server triage, but any work to improve this scenario will need to be done in the runtime. We are already tracking this scenario in our benchmarks, and it seems unlikely that Kestrel will need to react to any runtime changes that might improve this scenario.

@karelz should we transfer the issue to the runtime so you have something tracking it?

@karelz
Member

karelz commented Jun 2, 2020

@halter73 I would like to see something filed -- I just don't understand what we are trying to track with that work.
Is that a benchmark that needs investigation?
Is there specific work that needs to be done?

Maybe offline chat could help?

@halter73
Member

halter73 commented Jun 2, 2020

@karelz and I talked offline and he asked if our benchmarks confirm the slower HTTPS "Connection: close" performance on Windows. It turns out, they don't:

Linux (~7.3K RPS)

[Screenshot: Capture-Linux]

Windows (13.7K RPS)

[Screenshot: Capture-Windows]

Our "Connection: close" benchmarks are quite a bit simpler than https://github.com/surr34/https-bench since our benchmark sends empty request bodies and the server app outputs plaintext rather than serialized JSON. This explains the much higher numbers.

I went ahead and tried https://github.com/surr34/https-bench on Azure F4 Windows and Linux VMs and used fortio to drive load to see if anything about this setup causes different results. I still see better RPS (or should I say QPS) numbers with a Windows server:

Ubuntu 16.04 (~720 QPS)

$ fortio load -qps -1 -c 200 -t 60s -H "Connection: close" --https-insecure https://10.1.2.6:21115
Fortio 1.3.1 running at -1 queries per second, 1->1 procs, for 1m0s: https://10.1.2.6:21115
01:23:53 I httprunner.go:82> Starting http test for https://<linux ip>:21115 with 200 threads at -1.0 qps
01:23:53 W http_client.go:136> https requested, switching to standard go client
Starting at max qps with 200 thread(s) [gomax 1] for 1m0s
...
Sockets used: 0 (for perfect keepalive, would be 200)
Code 200 : 43171 (100.0 %)
Response Header Sizes : count 43171 avg 0 +/- 0 min 0 max 0 sum 0
Response Body/Total Sizes : count 43171 avg 257 +/- 0 min 257 max 257 sum 11094947
All done 43171 calls (plus 200 warmup) 278.440 ms avg, 717.9 qps

Windows Server 2016 (~1050 RPS)

$ fortio load -qps -1 -c 200 -t 60s -H "Connection: close" --https-insecure https://10.1.2.7:21115
Fortio 1.3.1 running at -1 queries per second, 1->1 procs, for 1m0s: https://10.1.2.7:21115
01:32:36 I httprunner.go:82> Starting http test for https://<windows ip>:21115 with 200 threads at -1.0 qps
01:32:36 W http_client.go:136> https requested, switching to standard go client
Starting at max qps with 200 thread(s) [gomax 1] for 1m0s
...
Sockets used: 0 (for perfect keepalive, would be 200)
Code 200 : 63007 (100.0 %)
Response Header Sizes : count 63007 avg 0 +/- 0 min 0 max 0 sum 0
Response Body/Total Sizes : count 63007 avg 257 +/- 0 min 257 max 257 sum 16192799
All done 63007 calls (plus 200 warmup) 190.602 ms avg, 1049.0 qps

This is using v3.0.103 of the .NET Core SDK. @surr34 Do you have any idea why I'm not able to replicate your results?

@wfurt
Member

wfurt commented Jun 2, 2020

What version of OpenSSL do you have, @halter73? 16.04 is pretty old, and when I ran the benchmarks, both Linux and Node on Windows used TLS 1.3. I probed @sebastienros a while back and it seems like we did not have any good setup to benchmark this.

@halter73
Member

halter73 commented Jun 2, 2020

The server is running OpenSSL 1.0.2g, so no TLS 1.3 support.

The thing is, Kestrel uses SslProtocols.Tls11 | SslProtocols.Tls12 by default on both Windows and Linux, so that shouldn't be the difference between the two. I opened #22437 to change Kestrel to use the platform's default SslProtocols in .NET 5, but that's not in yet.

I should probably get around to upgrading my VMs anyway. I'll try to do that tomorrow and get updated results. It's still possible that a difference in OpenSSL versions is responsible for the benchmark differences.

@wfurt
Member

wfurt commented Jun 2, 2020

That would be great, @halter73. I can resurrect my old setup if needed. I had two old desktop machines and I saw a significant drop; maybe not as big as the numbers above, but still quite visible. And Node was significantly faster on the same machine.

@BrennanConroy BrennanConroy added affected-very-few This issue impacts very few customers enhancement This issue represents an ask for new feature or an enhancement to an existing one severity-major This label is used by an internal tool labels Oct 26, 2020 — with ASP.NET Core Issue Ranking
@halter73
Member

halter73 commented Feb 3, 2021

The VMs I mentioned upgrading have since been deleted. I can try to recreate the setup, but #22437 has been merged and our normal benchmark infrastructure should catch regressions caused by default protocol changes or dependency updates.

@halter73 halter73 closed this as completed Feb 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Mar 5, 2021
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023