-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Use a dedicated thread for timers in rather than a Timer #21490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
davidfowl
commented
May 5, 2020
- This make it possible to still timeout various operations when there's thread pool starvation occurring.
- One downside is that the "heartbeatslow" warning that usually can communicate starvation to users is now gone.
- This make it possible to still timeout various operations when there's thread pool starvation occuring. - One downside is that the heartbeatslow warning that usually can communicate starvation to users is now gone.
src/Servers/Kestrel/Core/src/Internal/Infrastructure/Heartbeat.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Günther Foidl <[email protected]>
Interesting theory. How can we tell if it's effective? Tests need updating. |
There's still risk of the thread being preempted and not getting rescheduled for long periods when there's threadpool starvation. I think if a heartbeat takes longer than the _interval (1s), we should still log the "heartbeatslow" warning. Today there are two possible causes for the warning:
I was hoping to get some customer reports of the heartbeat slow log now that we started logging the heartbeat duration in 5.0 (#15273) to see which of these causes is more common. |
} | ||
} | ||
|
||
public void Dispose() | ||
{ | ||
_timer?.Dispose(); | ||
_stopped = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not implement DisposeAsync and wait? Maybe throw after a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it’s not important. Any hung thread will be visible in the debugger/dump.
_timerThread = new Thread(state => ((Heartbeat)state).TimerLoop()) | ||
{ | ||
Name = "Kestrel Timer", | ||
IsBackground = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In LibuvThread, we set IsBackground to false in debug so we can find issues where the thread isn't properly stopping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but the libuv thread does more than this. I wouldn’t compare the 2. We could set it to true in debug if you’re paranoid about the timer not shutting down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need executioncontext flow supression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't want an HttpContext stuck hanging around on a Thread 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How/when does this get initiated? Is it lazy and triggered by a first request? i.e. does it need executioncontext flow supression?
Startup. Could suppress execution context but that isn't new behavior. The Timer before didn't suppress it 😄 , nothing new here.
You're going to want to look at
and the other tests in that class. |
Sure. What’s great about this is that this thread will continue to run during the ramp up to a high number of threadpool threads. Today the thread pool needs to dequeue the timer callback as threads are slowly being injected into the thread pool (fighting other threads). |
The thread could still get preempted. A developer could block too long in a custom IConnectionHeartbeatFeature callback. We could introduce a deadlock. There are a bunch of reasons we should still time OnHeartbeat() and log "heartbeatslow" if it takes over a second. I agree that "heartbeatslow" warnings are likely more often caused because bunch of timer callbacks get dequeued in rapid succession as the threadpool works through a backlog of work items, but what's the harm in timing OnHeartbeat()? Are we sure that already-dequeued timer callback never gets preempted by the OS for over a second? What's wrong with logging something if and when this does happen? Again, getting #15273 in the hands of customers would be telling. |
Yes all 4 of those developers writing connection middleware 😬. |
You're right about the connection middleware. I'm sure there are less that use the feature. The warning wouldn't catch a deadlock either because we don't have another thread to observe it on. We should still log if OnHeartbeat() takes over _interval though. If it never happens, no harm. |
- Print heartbeat slow if duration > interval - Fix tests
Is there overhead in reserving a dedicated thread? e.g. is it reserving a bunch of memory for its stack that it will never use |
There's definitely overhead but not enough to matter. This thread runs {number of connection} callbacks every second, so it's busy. |
Yeah, whether the overhead matters or not is more my question. I recall threads grabbing a megabyte of memory for its stack. I was wondering whether that was still a thing, and if it is, whether this is worth it. |
I think it's just reserved pretend memory these days? |
@dougbu what gives? |
@davidfowl what gives with what? |
There’s an OOM in msbuild blocking my merge |
Ah, that's a known problem w/ And, no, we don't know why the problem is suddenly much worse. @rainersigwald any comment on that side of the issue? |
I think dotnet/msbuild#3577 is tracking the issue (even though it's really old @wtgodbe is talking about opening a new issue) and dotnet/arcade#5421 and/or #21478 should mitigate it. There have been a lot of these failures reported in the ASP.NET Build Teams channel. |
OK well somebody with admin merge my stuff 😄 |
Just retry the failing leg. The OOM errors are flaky, not failing all the time. |