Crash: precondition failure in RequestBag+StateMachine.swift #576
Comments
Hmm, this appears to be happening due to overzealous policing of the As a temporary workaround you can try disabling HTTP/2 support to see if it's only occurring in the HTTP/2 path, but if it's occurring in the HTTP/1 path then there may be other issues. Do you have logging enabled? It would be useful to see the logs to see the specific trigger for these issues, as I think this can only happen if we have hit a different error in the request pipeline already. |
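For anyone wanting to try the HTTP/2 workaround mentioned above, here is a minimal sketch. It assumes AsyncHTTPClient 1.7.0 or later (where the `httpVersion` configuration setting exists) and the `.createNew` event loop group provider from the 1.9.0-era API. The function name `makeHTTP1OnlyClient` is hypothetical and only for illustration.

```swift
import AsyncHTTPClient

// Hypothetical helper: builds a client that never negotiates HTTP/2.
// Assumption: AsyncHTTPClient >= 1.7.0, where `httpVersion` is available.
func makeHTTP1OnlyClient() -> HTTPClient {
    var configuration = HTTPClient.Configuration()
    // The default is `.automatic`, which allows HTTP/2 when the server offers it.
    configuration.httpVersion = .http1Only

    // `.createNew` matches the 1.9.0-era API; newer releases may prefer a
    // different event loop group provider.
    return HTTPClient(eventLoopGroupProvider: .createNew, configuration: configuration)
}

// The caller owns the client and must shut it down when finished, e.g.:
// let client = makeHTTP1OnlyClient()
// defer { try? client.syncShutdown() }
```

If the crash disappears with this configuration, that suggests the problem is specific to the HTTP/2 path. If it persists, the HTTP/1 path is affected too.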
This appears to be a bit of a window condition. Out of curiosity, do you ever call |
@Lukasa Zach can correct me but I don't think we ever call .cancel |
Confirmed. Will reply to your earlier comment ASAP. |
Calling |
I set the log level to |
I think we'd need to go to Out of curiosity, has it been possible to try disabling HTTP/2? |
You can also try using the branch from #577 to see how that affects your issue. |
Sounds good, will change to I haven't tried disabling HTTP/2 yet. The
|
@Lukasa I think we're in good shape now. I deployed to production using #577 about 30 minutes ago, and have not seen a crash since. No logs contain Do you have any guesses about where this behavior might have come from? As far as we can tell, it was not correlated with any release of our application, and we haven't made changes to our use of the client or general configuration recently. Could unexpected behavior by an external service (that our application's HTTP client is sending requests to) cause this to occur? Thanks so much for your help with this! |
I suspect an external service, yes. The specific theory I have is that this is a timing window. The thing that gives me pause is that I don't think there are many things that can fail a However, the |
Oh, and to clarify, if the other service's timing has changed that can affect the size of this window. |
Thanks for explaining that. Two hours since the deploy and we continue to be crash-free. Sorry to say I wasn't able to reproduce the crash in a separate environment, so this will probably remain a mystery, but if we learn anything more I'll post it here. |
Motivation

The RequestBag intermediates between two different threads. This means it can get requests that were reasonable when they were made but have been superseded with newer information since then. These generally have to be tolerated.

Unfortunately if we received a request to resume the request body stream _after_ the need for that stream has been invalidated, we could hit a crash. That's unnecessary, and we should tolerate it better.

Modifications

Tolerated receiving requests to resume body streaming in the finished state.

Result

Fewer crashes

Fixes #576
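The idea in the change description above can be illustrated with a toy state machine. This is a simplified sketch, not the actual `RequestBag.StateMachine`: the type `ToyRequestStateMachine`, its states, and the `ResumeAction` enum are all hypothetical names chosen for illustration. It only shows the general pattern of treating a stale "resume" as a no-op once the request has finished, rather than trapping.

```swift
// Hypothetical action returned to the caller after a resume request.
enum ResumeAction: Equatable {
    case startWriting  // the body stream should begin or continue writing
    case ignore        // the request is stale or redundant; do nothing
}

// Hypothetical, heavily simplified model of a request state machine.
struct ToyRequestStateMachine {
    enum State: Equatable {
        case initialized
        case paused
        case streaming
        case finished  // the request completed or failed
    }

    private(set) var state: State = .initialized

    // Called when the request fails or completes on another thread.
    mutating func finish() {
        state = .finished
    }

    // Called when the channel asks the body stream to resume.
    mutating func resumeRequestBodyStream() -> ResumeAction {
        switch state {
        case .initialized, .paused:
            state = .streaming
            return .startWriting
        case .streaming:
            // Already streaming; a second resume is redundant.
            return .ignore
        case .finished:
            // The resume was valid when it was issued but arrived after the
            // request finished. A stricter machine might call
            // preconditionFailure() here; this sketch tolerates it instead.
            return .ignore
        }
    }
}

var machine = ToyRequestStateMachine()
let first = machine.resumeRequestBodyStream()  // normal resume
machine.finish()                               // request fails in the meantime
let late = machine.resumeRequestBodyStream()   // stale resume arrives afterwards
```

In this sketch the late resume leaves the machine in the finished state and asks the caller to do nothing, which mirrors the "tolerate it" behavior described above.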
I wanted to provide a bit more info in case it's of interest to you. During the period when these crashes were occurring, we had a prolonged increase in CPU utilization, occasionally spiking to 100%. (That CPU increase seems to be directly related to the onboarding of a new customer that required our servers to process a higher volume of data than they typically do.) Maybe there's a relationship between maxing out our CPU and the unexpected behavior reported in this issue. Not sure what the mechanism would be, just a hunch. |
The high CPU usage likely correlates with increased queue depth and contention on the event loop threads, which widens any timing window that involves scheduling on those threads. |
We're on version 1.9.0 and we saw hundreds of these crashes in our server logs today. It looks like it started happening in our logs more than a week ago but suddenly escalated today.
We tried rolling back to 1.8.2 but that did not resolve the issue.
Any ideas on how we can avoid this or debug it further?