gh-109593: Fix reentrancy issue in multiprocessing resource_tracker #109629
Conversation
pitrou commented on Sep 20, 2023 (edited by the bedevere-app bot)
- Issue: test_multiprocessing_forkserver.test_processes: ResourceTracker.ensure_running() calls itself and hangs indirectly via the GC #109593
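For context, a minimal sketch of the failure mode (illustrative names, not the actual reproducer): before this fix the tracker relied on a plain, non-reentrant lock, so a GC-triggered finalizer that calls back into the tracker while the lock is already held deadlocks.

```python
import threading

# Hypothetical stand-in for the tracker's pre-fix, non-reentrant lock.
_tracker_lock = threading.Lock()

class Resource:
    def __del__(self):
        # If the GC runs this finalizer while _tracker_lock is already
        # held (e.g. inside ensure_running()), this second acquire
        # blocks forever on a plain Lock -- the hang in gh-109593.
        with _tracker_lock:
            pass
```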
@vstinner This solution should be robust while avoiding the performance issue of calling […].
Can you please open a new issue and write a separate PR to add `_recursion_count()`?
Should we? I don't want to get into a discussion of whether this private API is generally useful. It is purely meant to allow solving this issue.
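For reference, a minimal sketch of the semantics of the private `_recursion_count()` helper discussed here (assuming it reports the calling thread's acquisition depth: 1 on first acquire, 2 when re-entered):

```python
import threading

lock = threading.RLock()

def inner():
    with lock:
        # Re-entered while already held by this thread: depth is 2.
        # This is exactly the condition the tracker uses to detect
        # a reentrant call (count > 1).
        assert lock._recursion_count() == 2

def outer():
    with lock:
        # First acquisition by this thread: depth is 1.
        assert lock._recursion_count() == 1
        inner()

outer()
```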
LGTM. I'm fine with the overall approach. But please add a comment in `_stop()` and `ensure_running()` to explain the `if self._lock._recursion_count() > 1: return` check, with a reference to gh-109593.
Force-pushed from 7c01743 to 8471da5.
```python
warnings.warn(
    f"ResourceTracker called reentrantly for resource cleanup, "
    f"which is unsupported. "
    f"The {rtype} object {name!r} might leak.")
```
@vstinner This might raise a warning from time to time during the test suite; will it fail the buildbots?
Hum. If you run tests with -Werror, it will raise a UserWarning exception, and that can make the tests fail, no?
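For illustration, a small sketch of that escalation: `warnings.warn()` emits a UserWarning by default, and an "error" filter (what -W error installs) turns it into a raised exception:

```python
import warnings

# Same effect as running the interpreter with -W error.
warnings.simplefilter("error")

try:
    warnings.warn("ResourceTracker called reentrantly for resource cleanup")
except UserWarning as exc:
    # Under -W error the warning surfaces as an exception, which is
    # what could make tests fail on the buildbots.
    print(f"escalated: {exc}")
```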
@vstinner I made some minor changes (mostly, I added a warning). Do you want to take another look?
The default timeout test of test_threading has been failing for many years: issue gh-109401. I launched a new job.
@vstinner So are you ok with this PR after all? I can't think of another backportable solution. A cleaner solution may be found for the development version, but it would be too risky to backport.
```diff
@@ -81,6 +97,9 @@ def ensure_running(self):
         This can be run from any process. Usually a child process will use
         the resource created by its parent.'''
         with self._lock:
+            if self._lock._recursion_count() > 1:
+                # The code below is certainly not reentrant-safe, so bail out
+                return self._reentrant_call_error()
```
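Standalone, the guard pattern looks roughly like this (a sketch, not the actual resource_tracker code; the class name and method body are simplified):

```python
import threading

class ReentrantCallError(RuntimeError):
    """Raised when the tracker is re-entered while its lock is held."""

class TrackerSketch:
    def __init__(self):
        self._lock = threading.RLock()

    def ensure_running(self):
        with self._lock:
            # An RLock lets the same thread re-acquire, so instead of
            # deadlocking we can *detect* reentrancy and bail out
            # (see gh-109593).
            if self._lock._recursion_count() > 1:
                raise ReentrantCallError(
                    "Reentrant call into the resource tracker")
            ...  # start/verify the tracker process (not reentrant-safe)
```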
I would prefer:

```diff
-return self._reentrant_call_error()
+raise self._reentrant_call_error()
```

or:

```diff
-return self._reentrant_call_error()
+self._reentrant_call_error()
```

since the `return` here is a lie: the function never returns.
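The two styles under discussion, sketched (per the diff above, the PR keeps `return` plus a helper that always raises; the alternative makes the helper build the exception so the call site spells out the raise):

```python
class ReentrantCallError(RuntimeError):
    pass

# Style in the PR: the helper itself raises, so the call site's
# "return self._reentrant_call_error()" never actually returns.
def _reentrant_call_error():
    raise ReentrantCallError("Reentrant call into the resource tracker")

# Suggested alternative: the helper only builds the exception, and the
# call site makes control flow explicit with
#     raise _reentrant_call_error_alt()
def _reentrant_call_error_alt():
    return ReentrantCallError("Reentrant call into the resource tracker")
```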
I'm not ok. Not only that, the reproducer script fails when run with […].

I would prefer to have a queue of pending messages, and send them once we are back to a known state: a running and working resource tracker with a pipe connected to this process. I started a draft patch, but my head is blocked by the concept of reentrant calls. I'm used to multithreading, where adding locks makes the code correct by making sure it executes in a specific order. But here a […]

Your RLock change sounds more correct than my local change.
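A rough sketch of that alternative design (illustrative only; the names are invented and this is not what was merged): buffer messages issued during a reentrant call and flush them once the tracker is back in a known-good state.

```python
import threading
from collections import deque

class QueuedTrackerSketch:
    def __init__(self):
        self._lock = threading.RLock()
        self._pending = deque()  # messages deferred during reentrancy

    def send(self, msg: bytes):
        with self._lock:
            if self._lock._recursion_count() > 1:
                # Reentrant call: don't touch the pipe now, just
                # remember the message for later.
                self._pending.append(msg)
                return
            self._ensure_running()
            # Back in a known state: flush deferred messages, in order.
            while self._pending:
                self._write(self._pending.popleft())
            self._write(msg)

    def _ensure_running(self):
        ...  # start the tracker process, connect the pipe

    def _write(self, msg: bytes):
        ...  # e.g. os.write(self._fd, msg)
```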
For me, this PR changes too many things at once; I don't feel able to review it. As I wrote, I would prefer to have the RLock change done in a separate PR. This change would then be smaller and it would be easier for me to think about it (and consider other implementations).
The warning seems useful. Complaining that it fails when […]. Would you prefer the problem to be silenced?
Sure, everyone would like something more reliable. Where is the patch?
The RLock change is trivial, so I'm not sure this would solve anything. But, mostly, the RLock change is only useful for this PR. That said, I can submit a separate PR for RLock if you're really keen on it.
cc @gpshead
Okay, I think I'm gonna merge this now and submit the backports.
Thanks @pitrou for the PR 🌮🎉. I'm working now to backport this PR to: 3.11.
Thanks @pitrou for the PR 🌮🎉. I'm working now to backport this PR to: 3.12.
…cker (pythonGH-109629) (cherry picked from commit 0eb9883) Co-authored-by: Antoine Pitrou <[email protected]> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
GH-109897 is a backport of this pull request to the 3.11 branch.
GH-109898 is a backport of this pull request to the 3.12 branch.
A ResourceWarning is a call to action: you should close the resource explicitly. But here, as a user, what can you do about this messy reentrant-call issue? Well, I suppose that a log/error is better than a hang. This change may or may not fix issue gh-109593: the test may continue to fail, but in different ways.
…cker (python#109629) Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>