Use a dedicated thread for timers rather than a Timer #21490


Merged
merged 4 commits into master from davidfowl/reliable-timers on May 8, 2020

Conversation

davidfowl
Member

  • This makes it possible to still time out various operations when there's thread pool starvation occurring.
  • One downside is that the "heartbeatslow" warning that can usually communicate starvation to users is now gone.

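Roughly, the idea looks like this (a minimal sketch with illustrative names such as DedicatedTimer and onTick, not the actual Heartbeat code in this PR): a plain Thread sleeps and invokes the callback on a fixed interval, so ticks never wait on thread-pool scheduling.

```csharp
// Minimal sketch of the idea, not the PR's Heartbeat implementation.
// Names (DedicatedTimer, onTick) are illustrative.
using System;
using System.Threading;

public sealed class DedicatedTimer : IDisposable
{
    private readonly TimeSpan _interval;
    private readonly Action _onTick;
    private readonly Thread _thread;
    private volatile bool _stopped;

    public DedicatedTimer(TimeSpan interval, Action onTick)
    {
        _interval = interval;
        _onTick = onTick;
        _thread = new Thread(Loop) { Name = "Timer", IsBackground = true };
        _thread.Start();
    }

    private void Loop()
    {
        while (!_stopped)
        {
            _onTick();                // fires even when the thread pool is starved
            Thread.Sleep(_interval);  // scheduling does not depend on the thread pool
        }
    }

    public void Dispose() => _stopped = true;
}
```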
@Tratcher
Member

Tratcher commented May 5, 2020

Interesting theory. How can we tell if it's effective?

Tests need updating.

@halter73
Member

halter73 commented May 5, 2020

  • This makes it possible to still time out various operations when there's thread pool starvation occurring.
  • One downside is that the "heartbeatslow" warning that can usually communicate starvation to users is now gone.

There's still a risk of the thread being preempted and not getting rescheduled for long periods when there's threadpool starvation. I think if a heartbeat takes longer than the _interval (1s), we should still log the "heartbeatslow" warning.

Today there are two possible causes for the warning:

  1. A lot of threadpool threads become available at once, and there's a backlog of multiple timer callbacks due to starvation that get dequeued simultaneously. This we can no longer catch without queuing work items.
  2. There are so many active threadpool threads that the timer callback gets preempted and takes over 1 second of real time before it gets rescheduled and completes. This we can still catch with a dedicated thread.

I was hoping to get some customer reports of the heartbeat slow log now that we started logging the heartbeat duration in 5.0 (#15273) to see which of these causes is more common.

public void Dispose()
{
    _timer?.Dispose();
    _stopped = true;
Member

Why not implement DisposeAsync and wait? Maybe throw after a timeout.

Member Author

@davidfowl May 5, 2020

Because it’s not important. Any hung thread will be visible in the debugger/dump.
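For reference, a hedged sketch of what the suggested DisposeAsync-with-timeout could look like, assuming the _stopped and _timerThread fields shown elsewhere in this diff; the PR deliberately sticks with the plain Dispose above.

```csharp
// Hypothetical alternative, not what the PR does: signal stop, then wait for
// the timer thread to exit, throwing if it doesn't within a timeout.
// (Uses System, System.Threading, and System.Threading.Tasks.)
public async ValueTask DisposeAsync()
{
    _stopped = true;

    var joined = await Task.Run(() => _timerThread.Join(TimeSpan.FromSeconds(5)));
    if (!joined)
    {
        throw new TimeoutException("Timer thread did not stop within 5 seconds.");
    }
}
```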

_timerThread = new Thread(state => ((Heartbeat)state).TimerLoop())
{
    Name = "Kestrel Timer",
    IsBackground = true
Member

In LibuvThread, we set IsBackground to false in debug so we can find issues where the thread isn't properly stopping.

Member Author

Sure, but the libuv thread does more than this; I wouldn’t compare the two. We could do the same in debug builds if you’re paranoid about the timer thread not shutting down.
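For reference, a hedged sketch of the LibuvThread-style debug toggle being discussed, reusing the thread-construction code from this diff (illustrative only, not part of this PR):

```csharp
_timerThread = new Thread(state => ((Heartbeat)state).TimerLoop())
{
    Name = "Kestrel Timer",
#if DEBUG
    // Foreground in debug builds so a thread that never stops keeps the
    // process alive and is easy to spot.
    IsBackground = false
#else
    IsBackground = true
#endif
};
```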

Member

How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need ExecutionContext flow suppression?

Member

Don't want an HttpContext stuck hanging around on a Thread 😉

Member Author

How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need ExecutionContext flow suppression?

Startup. We could suppress the execution context, but that isn't new behavior: the Timer before didn't suppress it either 😄, nothing new here.
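For anyone curious, a hedged sketch of what suppressing the flow would look like if it were added; the _timerThread name comes from this diff, the Start argument is assumed, and the PR does not do this (matching the old Timer behavior):

```csharp
// Hypothetical: suppress ExecutionContext flow so AsyncLocals captured at
// startup don't get carried onto the long-lived timer thread.
if (!ExecutionContext.IsFlowSuppressed())
{
    using (ExecutionContext.SuppressFlow())
    {
        _timerThread.Start(this); // 'this' would be the Heartbeat instance
    }
}
else
{
    _timerThread.Start(this);
}
```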

@halter73
Member

halter73 commented May 5, 2020

You're going to want to look at

heartbeatHandler.Verify(h => h.OnHeartbeat(systemClock.UtcNow), Times.Once());

and the other tests in that class.

@davidfowl
Member Author

There are so many active threadpool threads that the timer callback gets preempted and takes over 1 second of real time before it gets rescheduled and completes. This we can still catch with a dedicated thread.

Sure. What’s great about this is that this thread will continue to run during the ramp-up to a high number of threadpool threads. Today the thread pool needs to dequeue the timer callback while threads are slowly being injected into the pool (competing with other queued work).
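To make the ramp-up point concrete, here's a hedged, self-contained console sketch (not code from this PR; exact timings depend on the machine and the pool's thread-injection rate). It starves the pool and compares when a pool-based Timer tick and a dedicated-thread tick actually fire:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class StarvationDemo
{
    static void Main()
    {
        ThreadPool.SetMinThreads(1, 1);
        var sw = Stopwatch.StartNew();

        // Timer callbacks are queued to the thread pool, so under starvation
        // they wait behind the blocked work items queued below.
        using var poolTimer = new Timer(
            _ => Console.WriteLine($"pool Timer fired at {sw.ElapsedMilliseconds} ms"),
            null, dueTime: 1000, period: Timeout.Infinite);

        // A dedicated thread ticks on schedule regardless of pool pressure.
        var dedicated = new Thread(() =>
        {
            Thread.Sleep(1000);
            Console.WriteLine($"dedicated thread fired at {sw.ElapsedMilliseconds} ms");
        }) { IsBackground = true };
        dedicated.Start();

        // Flood the pool with blocking items to simulate starvation.
        for (int i = 0; i < 64; i++)
        {
            ThreadPool.QueueUserWorkItem(_ => Thread.Sleep(5000));
        }

        Thread.Sleep(10_000); // give both a chance to fire before exiting
    }
}
```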

@halter73
Member

halter73 commented May 6, 2020

Sure. What’s great about this is that this thread will continue to run during the ramp-up to a high number of threadpool threads. Today the thread pool needs to dequeue the timer callback while threads are slowly being injected into the pool (competing with other queued work).

The thread could still get preempted. A developer could block too long in a custom IConnectionHeartbeatFeature callback. We could introduce a deadlock. There are a bunch of reasons we should still time OnHeartbeat() and log "heartbeatslow" if it takes over a second.

I agree that "heartbeatslow" warnings are more likely caused by a bunch of timer callbacks getting dequeued in rapid succession as the threadpool works through a backlog of work items, but what's the harm in timing OnHeartbeat()? Are we sure an already-dequeued timer callback never gets preempted by the OS for over a second? What's wrong with logging something if and when that does happen?

Again, getting #15273 in the hands of customers would be telling.

@davidfowl
Member Author

The thread could still get preempted. A developer could block too long in a custom IConnectionHeartbeatFeature callback. We could introduce a deadlock. There are a bunch of reasons we should still time OnHeartbeat() and log "heartbeatslow" if it takes over a second.

Yes all 4 of those developers writing connection middleware 😬.

@halter73
Member

halter73 commented May 6, 2020

You're right about the connection middleware. I'm sure there are fewer that use the feature. The warning wouldn't catch a deadlock either, because we don't have another thread to observe it on.

We should still log if OnHeartbeat() takes over _interval though. If it never happens, no harm.

- Print heartbeat slow if duration > interval
- Fix tests
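The first commit above amounts to something like the following hedged sketch; the field names (_systemClock, _handler, _logger, _interval) are assumptions for illustration, not the actual diff:

```csharp
// Illustrative only: time one heartbeat tick and warn if it exceeds the interval.
private void OnHeartbeat()
{
    var now = _systemClock.UtcNow;
    _handler.OnHeartbeat(now);

    var duration = _systemClock.UtcNow - now;
    if (duration > _interval)
    {
        _logger.LogWarning(
            "Heartbeat took longer than {Interval}: {Duration}.", _interval, duration);
    }
}
```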
@JamesNK
Member

JamesNK commented May 7, 2020

Is there overhead in reserving a dedicated thread? e.g. is it reserving a bunch of memory for its stack that it will never use?

@davidfowl
Member Author

Is there overhead in reserving a dedicated thread? e.g. is it reserving a bunch of memory for its stack that it will never use?

There's definitely overhead, but not enough to matter. This thread runs {number of connections} callbacks every second, so it's busy.

@JamesNK
Member

JamesNK commented May 7, 2020

Yeah, whether the overhead matters or not is more my question. I recall threads grabbing a megabyte of memory for their stacks. I was wondering whether that was still a thing, and if it is, whether this is worth it.

@benaadams
Member

I recall threads grabbing a megabyte of memory for their stacks.

I think it's just reserved pretend memory these days?
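The ~1 MB default is reserved virtual address space, committed page by page only as the stack actually grows. If even the reservation mattered, the Thread constructor could cap it, as in this hedged sketch (not something this PR does):

```csharp
// Hypothetical: explicitly cap the reserved stack if the default 1 MB
// reservation were ever a concern (the PR just uses the default).
var timerThread = new Thread(() => { /* timer loop */ }, maxStackSize: 256 * 1024)
{
    Name = "Kestrel Timer",
    IsBackground = true
};
```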

@davidfowl
Member Author

@dougbu what gives?

@dougbu
Contributor

dougbu commented May 7, 2020

@davidfowl what gives with what?

@davidfowl
Member Author

There’s an OOM in msbuild blocking my merge

@dougbu
Contributor

dougbu commented May 7, 2020

Ah, that's a known problem w/ msbuild: dotnet/msbuild#3577. @wtgodbe is trying to get a fix through Arcade in dotnet/arcade#5421 and, once that's agreed on, will apply the change to eng/common files in #21478 to avoid the (very long) delays before Arcade can publish an updated version.

And, no, we don't know why the problem is suddenly much worse. @rainersigwald any comment on that side of the issue?

@halter73
Member

halter73 commented May 7, 2020

I think dotnet/msbuild#3577 is tracking the issue (even though it's really old; @wtgodbe is talking about opening a new issue), and dotnet/arcade#5421 and/or #21478 should mitigate it.

There have been a lot of these failures reported in the ASP.NET Build Teams channel.

@davidfowl
Member Author

OK well somebody with admin merge my stuff 😄

@dougbu
Contributor

dougbu commented May 7, 2020

Just retry the failing leg. The OOM errors are flaky, not failing all the time.

@davidfowl davidfowl merged commit 6aa13dd into master May 8, 2020
@davidfowl davidfowl deleted the davidfowl/reliable-timers branch May 8, 2020 08:07