gh-128041: Add a terminate_workers method to ProcessPoolExecutor #128043
Provides a way to forcefully stop all the workers in the pool. Typically this would be used as a last effort to stop all workers if unable to shut down or join in the expected way.
ae41bf7 to 61c9b14
Some additional comments. It would be great to have type-check tests, e.g., for when you pass an invalid signal value (namely, check that os.kill would raise a TypeError / ValueError).
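A minimal sketch of the kind of check suggested above, assuming a non-integer signal value is the invalid input of interest. The helper name `rejects_non_integer_signal` is hypothetical and not part of the PR; it only shows that `os.kill` rejects a non-integer signal with `TypeError` before anything is sent.

```python
import os


def rejects_non_integer_signal(bad_signal):
    """Return True if os.kill refuses bad_signal with a TypeError.

    Only pass non-integer values here: a valid integer would really
    send that signal to the current process.
    """
    try:
        os.kill(os.getpid(), bad_signal)
    except TypeError:
        # Argument parsing fails before any signal is delivered.
        return True
    return False
```

Out-of-range integers are a separate case and are deliberately not covered here, since the exception raised for them can differ by platform.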
Misc/NEWS.d/next/Library/2024-12-17-18-53-21.gh-issue-128041.W96kAr.rst
Thanks @picnixz, I think I resolved all of your comments, please check again when you can.
Hey @gpshead mind taking a look? Ref https://discuss.python.org/t/cancel-running-work-in-processpoolexecutor/58605/2
Some other comments. I'll let @gpshead take over the review, as it needs more in-depth knowledge (I'm not well-versed enough in concurrent.futures since I mainly use multiprocessing instead).
Co-authored-by: Bénédikt Tran <[email protected]>
@csm10495 If you want the CI to be re-run, don't hesitate to ping me.
@gpshead mind taking a look?
Can I help to push the PR forward? This would be a very useful feature.
I'm not confident enough with the subtleties of multiprocessing to merge the feature myself. @gpshead Can you have a look at this one please?
@picnixz, I've done all the suggestions except swapping parameterize for individual methods. I genuinely think the current format is easier to understand for the bulk of folks looking at the code. For now terminate vs kill has more or less the same behavior for most users (the only difference being the signal on non-win32). I understand splitting it out if the functionality plays out differently, but for now it's all more or less the same, so I don't see the reason to split it out that much. All that being said, if you insist on it, I can make the changes. Just wanted to give one small plea for this way instead.
The rationale behind my suggestion is twofold:
- It helps wrap lines under 80 chars (PEP-8)
- It makes debugging cleaner in case of failures.
However, I think using parametrize is fine now. I think what bothered me most was the _terminate_or_kill naming. But now it looks better. I would still want you to respect the 80-char wrap.
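To illustrate the trade-off discussed above, here is a rough sketch of one parametrized test covering both operations. It uses `unittest`'s `subTest` as a stand-in for the parametrize decorator used in the PR, and `FakePool` is a hypothetical substitute for a real executor so the example runs without spawning processes.

```python
import unittest


class FakePool:
    """Hypothetical stand-in that records which method was called."""

    def __init__(self):
        self.calls = []

    def terminate_workers(self):
        self.calls.append("terminate_workers")

    def kill_workers(self):
        self.calls.append("kill_workers")


class ForceShutdownTest(unittest.TestCase):
    def test_force_shutdown(self):
        # One test body, run once per operation name. A failure is
        # reported together with the function_name that triggered it.
        for function_name in ("terminate_workers", "kill_workers"):
            with self.subTest(function_name=function_name):
                pool = FakePool()
                getattr(pool, function_name)()
                self.assertEqual(pool.calls, [function_name])
```

Separate test methods would give each operation its own name in the test report, which is the debugging benefit mentioned in the review; the parametrized form keeps the shared logic in one place.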
I'll merge this one tomorrow, or Gregory can merge it sooner if he wants (I usually avoid merging PRs after 8 PM as I try to have a nice commit message, which is harder to come up with when I'm tired).
        else:
            self.fail(f"Unknown operation: {function_name}")

        self.assertRaises(queue.Empty, q.get, timeout=1)
I do suspect we may see this come up as occasionally flaky in buildbot or CI systems, as it depends on the timing of the sleeps and kills, which really can't be guaranteed on a loaded system. If so, _put_sleep_put can have its sleep increased. Let's see how it goes first.
The Android failure is #124666. Although it involves a queue, it doesn't involve |
We have some buildbot failures: https://buildbot.python.org/#/builders/568/builds/8398 but I don't know how important they are.
We have other worker issues: https://github.com/python/cpython/actions/runs/13631446353/job/38099943353#step:11:686
It seems like on some platforms, we're not getting negative signals as the exit code on terminate/kill. Instead it gets a 255, which is odd. Though on a retry it sometimes passes, so it must be a race condition of some sort. When I get a moment I can try to swap from using signals to verify to patching with a side effect of running the actual method.
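For context on the "negative signal as the exit code" expectation, here is a small sketch of the POSIX convention the test relies on. It uses `subprocess.Popen` rather than the executor's worker processes (a deliberate substitution to keep the example self-contained), and the helper name `exit_code_after_terminate` is hypothetical.

```python
import signal
import subprocess
import sys


def exit_code_after_terminate():
    """Start a sleeping child, terminate it, and return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # On POSIX this sends SIGTERM; on Windows it calls TerminateProcess.
    child.terminate()
    return child.wait(timeout=30)


if __name__ == "__main__":
    code = exit_code_after_terminate()
    if sys.platform != "win32":
        # A child killed by a signal reports the negated signal number.
        print(code == -signal.SIGTERM)
    else:
        print(code)
```

This does not reproduce the 255 seen in CI; it only shows the baseline behavior that the signal-based assertion assumed.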
I think we should first revert the commit for now because it's causing macOS builds to fail (for instance https://github.com/python/cpython/actions/runs/13636802329/job/38117531328?pr=121119) =/
Is there a way to get more info about what changed in the ENV during the test? I'm not seeing how/why this could cause that, but knowing which key changed may help tell more.
Wait. No, the issue is indeed the environment that changed.
Yeah, all I see is:
Let's perhaps ignore this one, which seems flaky. For instance, it has disappeared on my PR now. However, the buildbot failure is a true failure on some systems, as it consistently fails on main.
I am still seeing |
…ethods to ProcessPoolExecutor (pythonGH-128043)" The test_concurrent_futures.test_process_pool test is failing in CI. This reverts commit f97e409.
…o ProcessPoolExecutor (pythonGH-128043)

This adds two new methods to `multiprocessing`'s `ProcessPoolExecutor`:

- **`terminate_workers()`**: forcefully terminates worker processes using `Process.terminate()`
- **`kill_workers()`**: forcefully kills worker processes using `Process.kill()`

These methods provide users with a direct way to stop worker processes without `shutdown()` or relying on implementation details, addressing situations where immediate termination is needed.

Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Commit-message-mostly-authored-by: Claude Sonnet 3.7 (because why not -greg)
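A minimal sketch of how calling code might use the new method, assuming an interpreter where this change is available. `force_stop` is a hypothetical helper, not part of the PR. Where `terminate_workers()` is missing, it falls back to `shutdown(wait=False, cancel_futures=True)`, which cancels pending work but does not stop tasks that are already running.

```python
def force_stop(executor):
    """Stop a pool as forcefully as the interpreter allows.

    Returns the name of the method that was used, so callers can tell
    whether running tasks were actually interrupted.
    """
    terminate = getattr(executor, "terminate_workers", None)
    if terminate is not None:
        # Added by this PR: terminates the worker processes directly.
        terminate()
        return "terminate_workers"
    # Assumed fallback for interpreters without the new method:
    # pending futures are cancelled, running ones continue.
    executor.shutdown(wait=False, cancel_futures=True)
    return "shutdown"
```

A caller would typically try a normal `shutdown()` first and reach for `force_stop(executor)` only as a last resort, in line with the PR description.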
…ethods to ProcessPoolExecutor (pythonGH-128043)" (python#130838) The test_concurrent_futures.test_process_pool test is failing in CI. This reverts commit f97e409.
terminate_workers to ProcessPoolExecutor #128041