-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Unpredictable execution of "StopAsync" #22831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@sgraziosi Before going any further (i.e., asking the product unit), timestamp the "is {starting | working | stopping}" hosted service messages ... and implement IHostApplicationLifetime and timestamp those messages. I recall seeing a similar situation, and the sequence of log entries was unreliable for determining the order in which the events were occurring. Confirm with timestamps the sequence first, and note that the framework's own logging might still not appear in the exact order of actual events. I'll mark this issue on hold for a bit while you try that and report back on the outcome. |
@guardrex, thanks for the super speedy answer!
(when two instances coexist) I have consistently seen that the lines for each instance appear every 5 seconds (as expected). I spent so much time looking at this output and trying to figure out what's wrong, so please trust me on this. Implementing IHostApplicationLifetime seems like a good idea and I'll give it a try. Meanwhile, I think the biggest risk we have is that whoever might take this issue on may be unable to replicate the behaviour, so I'd recommend (if I may, and only if I may) to pass this issue on to try a straightforward replication. It's very possible that it will fail, in which case we'll have a bigger problem. I suspect replication might fail because I initially could not "reach" the |
Sometimes, we send these types of issues directly to the product unit's repo. I think this is still a content issue for the topic at this point. One of MS's 🥇 top engineers 🥇 worked on this topic with me, and you can rest assured that it's either correct (or mostly correct 🙈😄). It might need to address this situation on the actual timing of events. The topic illustrates the use of a logger (not console writes), and that's what I referring to about my vague recollection on the sequence of indicated events in relation to actual events . In any case ... even if I'm not recalling that correctly, let's ping @Tratcher 🥇 for advice. We might need to address this in the topic in some way to address your confusion if the sequence of messages can't be trusted. If he thinks that this should be addressed as an engineering issue, he'll move it over to their repo for more discussion/work. |
Thanks again @guardrex - I'm going to try something with On the other hand, I can insist that the "ordering" isn't the issue here (not for me, might be a legitimate concern for you!). It's what gets reported that matters... |
OK, here we go... New code:
New activity:
I've skipped with
Instance 89572786 was expected to self destruct, instead 807768962 does, also executes "OnStopping()" and "OnStopped()". I would have expected 89572786 to execute these (and to self destruct). After that, it all goes downhill. OnStarted() happens when and where you'd expect, but OnStopping/ed never happen again... Made me notice that also the "Application is shutting down..." message stops appearing (which isn't done by code I wrote). I hope this helps! Will resume tomorrow with many thanks. |
The absence of "Application is shutting down..." implies the application isn't shutting down gracefully, it's being forcefully terminated. That would prevent any graceful StopAsync logic from running. @shirhatti @jkotalik are IIS recycles supposed to gracefully shut down or abort? (2.1 out-of-proc) |
@Tratcher here is what I think, which comes from a summary "outsider" understanding:
From 17:49:52:40 to 17:50:42:40 we have the app running normally, there is one instance of the app, and one corresponding instance of the timed service (89572786). Then a recycle happens. IIS does this by (a) creating a new instance of the App (generating a new timed service "called" 807768962 in this particular execution), which will receive all new requests. (b) asks the old instance to shut down gracefully and waits. (c) After a timeout, IIS kills the old instance, if it didn't self-destruct already. Why this happens is a mystery I'm not equipped to solve. Two topmost hypotheses I have (in no order): Thus, for me, priority would be to distinguish between H1 and H2... |
Transferring this to aspnetcore for investigation. |
Hello all (@Tratcher, @BrennanConroy),
|
Update from me:
Resolution that worked for me is: dotnet/sdk#2682 (comment) What changed: |
Thanks for contacting us. |
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process. |
Hello,
I'm trying to achieve something complex, which forced me to look under the hood, only to discover that things don't work as expected (or advertised).
My final aim is beyond the scope of this issue, but will describe it to give you some context. We have a two-clients app, which uses CSLA business objects. Old version is phasing out, based on Silverlight. Sometimes, we need to run procedures that can last long (minutes, even 30 or more). In the old version, we do this via
HostingEnvironment.QueueBackgroundWorkItem(cancellationToken => MethodThatCouldTakeLong(cancellationToken));
.If cancellation is happening (IIS recycling the App Pool) this gives the method a chance to terminate gracefully in mid-flight, which is good!
The newer version of the App uses DotNet Core 2.1, MVC and an Angular client, in between, we use the same CSLA business object (we share the same code, with occasional changes when DotNet standard and Core differ, done via compiler directives). ``HostingEnvironment.QueueBackgroundWorkItem()
isn't available in DotNet Core, so I looked into
IHostedService` to achieve the same aim: let the long lasting method "learn" when the overall app is shutting down, so to cancel it gracefully (and hopefully, by using "compiler directives" as little as possible).In order to succeed, I need two things:
[1] A system that allows me to run some code only when the app is trying to shut down (the StopAsync method).
[2] A system to "reach" the business object so that it "learns" that [1] happened.
The
IHostedService
feature promises to give me [1], but I can't make it work. I actually implemented [2] in many ways, only to eventually discover that it wasn't working because [1] wasn't happening.Ignoring problem [2], here is what I see about [1], using a minimal implementation, based on the
TimedHostedService
example.This is marginally different from the example code: I initialise with an (int)ID, to follow the evolution of concurrent instances and I log to Console, so that I can see in realtime what happens via the Output window in VS ("ASP.NET Core Web Server" channel).
I register the TimedHostedService like this:
(Side note: it took me I don't know how long to make the compiler find the AddHostedService method, and I have no idea of how I've finally made it work.)
I have configured VS to start the app using my local IIS, and the relevant app pool to recycle every 7 requests, so that I can experiment with the TimedHostedService behaviour (by asking 7 times to login with the wrong password from the client, so by generating exactly 7 requests, whenever I want and only when I want).
What I would expect, from a standstill (my comment in square brackets):
What I get is very different and, if I read it correctly, it means that this system isn't working and I can't use it. Perhaps I'm doing something wrong, but I have no idea of where to start looking, as this implementation is the simplest I could find.
Actual output (in square brackets are my comments):
What we're seeing here appears to show the following:
1st recycle - StopAsync is triggered for the wrong (newer) instance 1316790599, which stops gracefully, but should not have stopped. Instance 921976706 remains alive, but should have been stopped.
2nd recycle - 921976706 is still alive. 1886756708 starts, and then once gain StopAsync is triggered for the wrong (newer) instance 1886756708. Instance 921976706 remains unscathed.
3rd recycle - 1221835851 is born, 921976706 disappears without a sound.
4th recycle - having lost 921976706, only 1221835851 remains; 1930822523 enters the scene. For quite a while (17*5s) 1221835851 and 1930822523 coexist, until 1221835851 dies another sudden (ungraceful) death.
In my various attempts, I've seen the above patterns mix and match, with no logic I could find in their order (I'm pretty sure I did also see the right thing happening, but I'm not 100% sure).
What seems to be regular is that after recycle 4, the same pattern seem to stabilise (coexistence and sudden death of the older instance).
My (informed by ignorance) explanation: this system isn't working and I can't use it for anything.
Otherwise: do you have a possible explanation of what I'm doing wrong and/or how I could try to find out what's wrong?
Thanks!
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
The text was updated successfully, but these errors were encountered: