-
Notifications
You must be signed in to change notification settings - Fork 22
Fix issue with high-contention mutex causing hung workflows #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sushisource
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
| prev = Fiber.current_scheduler | ||
| illegal_call_tracing_disabled { Fiber.set_scheduler(nil) } | ||
| begin | ||
| # Imply illegal call tracing disabled | ||
| illegal_call_tracing_disabled do | ||
| Fiber.set_scheduler(nil) | ||
| yield | ||
| ensure | ||
| illegal_call_tracing_disabled { Fiber.set_scheduler(prev) } | ||
| Fiber.set_scheduler(prev) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did any semantics actually change here or is this just different syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this changes durable_scheduler_disabled to now perform the user block (i.e. the yield) inside of illegal_call_tracing_disabled instead of before where it was only the fiber scheduler get/set that was in there. It was decided to make durable_scheduler_disabled imply illegal_call_tracing_disabled because if you use the former, you almost always want the latter.
|
Had to update our JSON converter to skip scheduler when being used, see latest bullet added to description. Feel free to re-review, will merge early next week. There is going to be a soon followup with #310 anyways. |
Makes sense to me. |
What was changed
The main issue is there was a case where, in high-contention situations, a logger's mutex will block for a short little bit. Since we make mutexes durable, we completed the workflow task while the mutex was waiting and never resume the workflow to wake it up. We've made a couple of changes to help here.
Code to replicate
Changes:
durable_scheduler_disabledimplyillegal_call_tracing_disabledand update usages accordinglyputsor non-workflow logger use at high-contention can cause hung workflowsFile.expand_pathis now invoked at runtime during deprecation warning. So we decided that JSON conversion and any future stdlib changes inside it can skip the illegal call checks