Skip to content

Added some kestrel event counters #21649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 12, 2020
Merged

Conversation

davidfowl
Copy link
Member

@davidfowl davidfowl commented May 9, 2020

Connection counters

  • Connection queue length - This is the amount of connections accepted and queued in the thread pool.
  • Connection count - The number of connections
  • Total connections - The total number of connections.
  • Connection Rate - Connections per second

TLS counters

  • Total TLS handshakes - The total number of TLS handshakes
  • Current TLS handshakes - The number of TLS handshakes being performed currently
  • Failed TLS handshakes - The number of failed TLS handshakes
  • TLS handshake Rate - TLS handshake per second

Request counters

  • Request Queue Length - The number of HTTP/2 and HTTP/3 connections queued in the thread pool.

  • Current Upgraded Requests - The number of HTTP/1 requests that are currently upgraded (basiacally the number of concurrent websocket connections)

  • Also added event source events to Kestrel (which I'm beginning to get very conflicted about 😄 )

Contributes to #4769

Notes: The Request Queue Length only works for HTTP/2 and HTTP/3. The Connection Queue doubles as the request queue for HTTP/1.

cc @shirhatti

- Connection queue length - This is the amount of connections accepted and queued in the thread pool.
- Connection count - The number of connections
- Total connections - The total number of connections ever connected.
- Connection Rate - Connections per second

Contributes to #4769
@ghost ghost added the area-servers label May 9, 2020
davidfowl added 3 commits May 8, 2020 23:22
- Current TLS handshakes
- Total TLS handshakes
- Failed TLS handshakes
- TLS handshake per second
davidfowl added 5 commits May 9, 2020 09:30
- Add TLS version to handshake events
- Add HTTP version to request queue events
- Renamed HTTP/2 request queue length to http request queue
- Event sources don't like null values
- Make the level information for most events
- Add more data to request events
@@ -1002,6 +1002,7 @@ private void StartStream()
throw;
}

KestrelEventSource.Log.RequestQueued(_currentHeadersStream, AspNetCore.Http.HttpProtocol.Http2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty artificial. This is just an async offload, not a meaningful queue. Requests can become async and get queued to the threadpool many times throughout their lifetime, why is this one special? Do we see requests consistently piling up here, more than other async calls?

Or maybe it's just a naming thing? Is what you're really trying to say here that we've received a complete request and are getting ready to start processing? (Not sure why you'd need Dequeued in that case though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's just another work item, I think this could be useful to quantify just how backlogged the thread pool is. It's one thing to say there's a thousand pending work items, but another to say there are thousand pending requests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's literally called QueueUserWorkitem 😄 , it's absolutely a queue. We're just counting how may items are ours vs global in the pool like @halter73 says.

Copy link
Member

@Tratcher Tratcher May 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same request will end up back in the same threadpool queue as soon as some middleware goes async. What's special about this specific instance? I don't see this producing any actionable data (unless if the queue is completely stalled?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you produced any graphs with this yet? I'm really curious what they'd show.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to see this number be non-zero.

Under sustained load, wouldn't this always be non-zero?

Under what conditions do you expect this number to diverge from it's usual value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under sustained load, wouldn't this always be non-zero?

It'll be non-zero if there's blocking or a huge amount of load.

Under what conditions do you expect this number to diverge from it's usual value?

Starvation/blocking and it'll show up under a thing called connections/requests which is why I like it more than just having a single "threadpool queue" counter (though that is useful as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a new request on an existing connection, I'm assuming this is the first threadpool dispatch. However. in a threadpool starved scenario, this won't be the first threadpool dispatch for a request on a new connection right?

Am I correct in understanding that only the first scenario (i.e., requests on existing connections) will cause this counter to diverge from zero?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HTTP/1 the request queue is the connection queue for HTTP/2 and 3 we queue per request

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got me thinking maybe we should add a PollingCounter for concurrent idle/active connections (e.g. connections (not) in the keep-alive state). It wouldn't be equivalent to "current-connections" - "current-requests" because of HTTP/2 and 3. It might even be different for HTTP/1.x if we consider a connection "active" as soon as we get any part of the request line and/or headers.

@@ -169,6 +169,7 @@ public async Task OnConnectionAsync(ConnectionContext context)
{
selector = (sender, name) =>
{
feature.HostName = name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should write log the hostname when a single cert is used and not an SNI callback meaning _serverCertificateSelector is null? We could always pass in a ServerCertificateSelectionCallback and just have it return the _serverCertificate when no custom callback is configured.

Also, don't you love it when you thought you left a comment days ago, but you accidentally started a review you didn't finsih?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should write log the hostname when a single cert is used and not an SNI callback meaning _serverCertificateSelector is null? We could always pass in a ServerCertificateSelectionCallback and just have it return the _serverCertificate when no custom callback is configured.

Where do I get that information? Is it on SSlStream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do I get that information? Is it on SSlStream?

I don't know. I don't think so. That's why I suggested always using a ServerCertificateSelectionCallback, so the hostname could be captured inside the callback.

@davidfowl davidfowl merged commit 5a0c097 into master May 12, 2020
@davidfowl davidfowl deleted the davidfowl/add-kestrel-counters branch May 12, 2020 18:04
@shirhatti
Copy link
Contributor

@sebastienros We just added new EventCounters. I've filed an issue for having these enabled in our benchmarks aspnet/Benchmarks#1513

@davidsh
Copy link

davidsh commented Jun 2, 2020

Added new issue dotnet/diagnostics#1188 to make sure 'dotnet-counters' tool is updated for these new counters.

@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants