gh-128041: Add a terminate_workers method to ProcessPoolExecutor #128043

Merged
gpshead merged 36 commits into python:main on Mar 3, 2025

csm10495
Contributor

@csm10495 csm10495 commented Dec 17, 2024

Provides a way to forcefully stop all the workers in the pool

Typically this would be used as a last effort to stop all workers if unable to shutdown / join in the expected way.
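A minimal sketch of the "last effort" usage described above, assuming the method is exposed as `terminate_workers()` on the executor. The helper name `stop_pool` and its `grace_seconds` parameter are hypothetical and only illustrate the idea: wait for outstanding futures first, and force-stop the workers only if that wait times out.

```python
import concurrent.futures as cf


def stop_pool(executor, futures, grace_seconds=5.0):
    """Shut down cooperatively if possible; force-stop workers otherwise.

    Returns True when the workers had to be terminated.
    """
    # Give in-flight tasks a bounded amount of time to finish.
    _, not_done = cf.wait(futures, timeout=grace_seconds)
    if not not_done:
        executor.shutdown(wait=True)
        return False
    # Last resort. Assumes the executor provides terminate_workers(),
    # the method this PR proposes; unfinished tasks will not complete
    # normally after this call.
    executor.terminate_workers()
    return True
```

With a real pool this would be called as `stop_pool(pool, futures)` after submitting work; whether `terminate_workers()` is available depends on the Python version in use.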

@ghost

ghost commented Dec 17, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Member

@picnixz picnixz left a comment

Some additional comments. It would be great to have type-check tests, e.g., for when you pass an invalid signal value (namely, check that os.kill would raise a TypeError / ValueError).
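A rough sketch of the kind of check suggested here; it is not the test from the PR. The class name `InvalidSignalTest` is made up for illustration. It relies on two standard-library behaviours: `os.kill` rejects a non-integer signal with `TypeError` before anything is sent, and `signal.Signals(...)` raises `ValueError` for a number that is not a known signal.

```python
import os
import signal
import unittest


class InvalidSignalTest(unittest.TestCase):
    def test_non_integer_signal(self):
        # Argument conversion fails first, so no signal is delivered.
        with self.assertRaises(TypeError):
            os.kill(os.getpid(), "not a signal")

    def test_unknown_signal_number(self):
        # Looking up an unknown number in the Signals enum fails.
        with self.assertRaises(ValueError):
            signal.Signals(10**6)
```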

@csm10495
Contributor Author

Thanks @picnixz, I think I resolved all of your comments, please check again when you can.

@csm10495
Contributor Author

@picnixz picnixz requested a review from gpshead December 17, 2024 23:43
Member

@picnixz picnixz left a comment

Some other comments. I'll let @gpshead take over the review, since he has more in-depth knowledge (I'm not well-versed enough in concurrent.futures since I mainly use multiprocessing instead).

@picnixz
Member

picnixz commented Dec 21, 2024

@csm10495 If you want the CI to be re-run, don't hesitate to ping me.

@csm10495
Contributor Author

@gpshead mind taking a look?

@btel

btel commented Feb 11, 2025

Can I help to push the PR forward? This would be a very useful feature.

@picnixz
Member

picnixz commented Feb 11, 2025

I'm not confident enough with the subtleties of multiprocessing to merge the feature myself. @gpshead, can you have a look at this one, please?

@csm10495
Contributor Author

csm10495 commented Mar 2, 2025

@picnixz, I've done all the suggestions except swapping parameterize for individual methods. I genuinely think the current format is easier to understand for the bulk of folks looking at the code. For now, terminate vs. kill has more or less the same behavior for most users (the only difference being the signal on non-win32 platforms).

I understand splitting it out if the functionality plays out differently, but for now it's all more or less the same, so I don't see the reason to split it out that much.

All that being said, if you insist on it, I can make the changes. Just wanted to give one small plea for this way instead.
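To illustrate the point about one parametrized test covering both operations, here is a hedged sketch that uses `subprocess.Popen` as a stand-in for a pool worker (the PR itself tests `ProcessPoolExecutor`). The names `CASES` and `TerminateVsKillTest` are hypothetical. On POSIX the only observable difference between the two cases is the signal reflected in the return code.

```python
import os
import signal
import subprocess
import sys
import unittest

# (method to call, name of the signal it sends on POSIX)
CASES = (("terminate", "SIGTERM"), ("kill", "SIGKILL"))


class TerminateVsKillTest(unittest.TestCase):
    def test_force_stop(self):
        for method_name, signal_name in CASES:
            with self.subTest(method=method_name):
                child = subprocess.Popen(
                    [sys.executable, "-c", "import time; time.sleep(30)"]
                )
                getattr(child, method_name)()
                returncode = child.wait(timeout=10)
                if os.name == "posix":
                    # A signal-terminated child reports -signum.
                    expected = -getattr(signal, signal_name)
                    self.assertEqual(returncode, expected)
                else:
                    self.assertIsNotNone(returncode)
```

Splitting this into two methods would give each case its own name in failure output, which is the debugging benefit raised in the review; `subTest` gets part of the way there by labelling each case.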

Member

@picnixz picnixz left a comment

The rationale behind my suggestion is twofold:

  • It helps wrap lines under 80 chars (PEP 8).
  • It makes it cleaner for debugging in case of failures.

However, I think using parametrize is fine now. I think what bothered me most was the _terminate_or_kill naming, but now it looks better. I would still want you to respect the 80-char wrap.

@picnixz picnixz self-assigned this Mar 2, 2025
@picnixz
Member

picnixz commented Mar 2, 2025

I'll merge this one tomorrow, or Gregory can merge it sooner if he wants (I usually avoid merging PRs after 8 PM, as I try to write a nice commit message, which is harder to come up with when I'm tired).

    else:
        self.fail(f"Unknown operation: {function_name}")

    self.assertRaises(queue.Empty, q.get, timeout=1)
Member

I do suspect we may see this come up as occasionally flaky in buildbot or CI systems, as it depends on the timing of the sleeps and kills, which really can't be guaranteed on a loaded system. If so, _put_sleep_put can have its sleep increased. Let's see how it goes first.
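The put / sleep / put pattern can be sketched as follows. This is an illustration, not the PR's `_put_sleep_put` helper: it uses a `subprocess` child that prints instead of a pool worker that writes to a queue, and `CHILD_CODE` and `run_put_sleep_put_check` are hypothetical names. The parent blocks until the first message arrives, stops the child, and then checks that the second message never shows up. The sleep length is the knob mentioned above: the check only holds if the child is stopped before its sleep ends.

```python
import subprocess
import sys

# Child: emit one message, sleep, then emit a second one.
CHILD_CODE = (
    "import time\n"
    "print('first', flush=True)\n"
    "time.sleep(30)\n"
    "print('second', flush=True)\n"
)


def run_put_sleep_put_check():
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        # Blocks until the child has produced its first message.
        first = child.stdout.readline().strip()
        # Stop the child while it is still sleeping.
        child.terminate()
        child.wait(timeout=10)
        # Anything printed after the sleep would appear here.
        rest = child.stdout.read()
    return first, rest
```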

@gpshead gpshead merged commit f97e409 into python:main Mar 3, 2025
39 checks passed
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 Android 3.x (tier-3) has failed when building commit f97e409.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1594/builds/1441) and take a look at the build logs.
  4. Check if the failure is related to this commit (f97e409) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1594/builds/1441

Failed tests:

  • test_android

Failed subtests:

  • test_bytes - test.test_android.TestAndroidOutput.test_bytes

Traceback logs:
Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_android.py", line 53, in setUp
    self.assert_log("I", tag, message, skip=True, timeout=5)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_android.py", line 65, in assert_log
    self.fail(f"line not found: {expected!r}")
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: line not found: 'test.test_android.TestAndroidOutput.test_bytes 1740967680.0124214'


Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_android.py", line 63, in assert_log
    line = self.logcat_queue.get(timeout=(deadline - time()))
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/queue.py", line 212, in get
    raise Empty
_queue.Empty


Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/android/buildarea/3.x.mhsmith-android-aarch64/build/Android/android.py", line 531, in run_testbed
    async with asyncio.TaskGroup() as tg:
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 134, in __aexit__
    raise propagate_cancellation_error
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 110, in __aexit__
    await self._on_completed_fut
asyncio.exceptions.CancelledError

@mhsmith
Member

mhsmith commented Mar 3, 2025

The Android failure is #124666. Although it involves a queue, it doesn't involve concurrent.futures, so I don't think it's related to this PR.

@picnixz
Member

picnixz commented Mar 3, 2025

We have some buildbot failures: https://buildbot.python.org/#/builders/568/builds/8398 but I don't know how important they are.

@picnixz
Member

picnixz commented Mar 3, 2025

@csm10495
Contributor Author

csm10495 commented Mar 3, 2025

It seems like on some platforms, we're not getting the negative signal number as the exit code on terminate/kill.

Instead we get 255, which is odd. On a retry it sometimes passes, though, so it must be a race condition of some sort.

When I get a moment, I can try to switch from verifying via signals to patching the method with a side effect that runs the actual method.
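A hedged sketch of that alternative, again with `subprocess.Popen` standing in for a pool worker process (an assumption made to keep the example self-contained; the function name `terminate_and_record` is hypothetical). The method is patched with `autospec=True` and a `side_effect` that calls the real implementation, so the test can assert that the call happened without depending on the platform-specific exit code.

```python
import subprocess
import sys
from unittest import mock


def terminate_and_record():
    # Keep a reference to the real method before patching it.
    real_terminate = subprocess.Popen.terminate
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    with mock.patch.object(
        subprocess.Popen,
        "terminate",
        autospec=True,               # the mock receives `self` like a method
        side_effect=real_terminate,  # still stop the process for real
    ) as patched:
        child.terminate()
        child.wait(timeout=10)
    # The call count is the assertion target; the exit code is returned
    # only for inspection.
    return patched.call_count, child.returncode
```

The returned exit code is typically `-signal.SIGTERM` on POSIX, but the check on `call_count` does not rely on it.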

@picnixz
Member

picnixz commented Mar 3, 2025

I think we should first revert the commit for now because it's causing macOS builds to fail (for instance https://github.com/python/cpython/actions/runs/13636802329/job/38117531328?pr=121119) =/

@csm10495
Contributor Author

csm10495 commented Mar 3, 2025

Is there a way to get more info about what changed in the environment during the test? I'm not seeing how or why this could cause that, but knowing which key changed may help tell us more.

@picnixz
Member

picnixz commented Mar 3, 2025

@picnixz
Member

picnixz commented Mar 3, 2025

Wait. No, the issue is indeed the environment that changed.

@csm10495
Contributor Author

csm10495 commented Mar 3, 2025

Yeah, all I see is:

0:02:54 load avg: 3.52 [482/485/1] test.test_concurrent_futures.test_process_pool failed (env changed) -- running (1): test_ssl (31.5 sec)
test_force_shutdown_workers (test.test_concurrent_futures.test_process_pool.ProcessPoolForkProcessPoolExecutorTest.test_force_shutdown_workers) ... /Users/admin/actions-runner/_work/cpython/cpython/Lib/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=9970) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
2.22s ok

@picnixz
Member

picnixz commented Mar 3, 2025

Let's perhaps ignore this one, which seems flaky. For instance, it disappeared on my PR now. However, the buildbot failure is a true failure on some systems, as it consistently fails on main.

@colesbury
Contributor

I am still seeing test_concurrent_futures.test_process_pool CI failures in various PRs. I think we should revert this for now.

colesbury added a commit to colesbury/cpython that referenced this pull request Mar 4, 2025
…ethods to ProcessPoolExecutor (pythonGH-128043)"

The test_concurrent_futures.test_process_pool test is failing in CI.

This reverts commit f97e409.
colesbury added a commit that referenced this pull request Mar 4, 2025
… to ProcessPoolExecutor (GH-128043)" (#130838)

The test_concurrent_futures.test_process_pool test is failing in CI.

This reverts commit f97e409.
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…o ProcessPoolExecutor (pythonGH-128043)

This adds two new methods to `concurrent.futures`'s `ProcessPoolExecutor`:
- **`terminate_workers()`**: forcefully terminates worker processes using `Process.terminate()`
- **`kill_workers()`**: forcefully kills worker processes using `Process.kill()`

These methods provide users with a direct way to stop worker processes without `shutdown()` or relying on implementation details, addressing situations where immediate termination is needed.

Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Commit-message-mostly-authored-by: Claude Sonnet 3.7 (because why not -greg)
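The mapping described in the commit message can be sketched roughly as below. This is a simplified illustration under stated assumptions, not CPython's implementation: `force_stop_workers` is a hypothetical free function that takes any iterable of objects exposing `is_alive()`, `terminate()` and `kill()`, which `multiprocessing.Process` does.

```python
def force_stop_workers(processes, operation):
    """Call terminate() or kill() on every worker that is still alive.

    Returns the number of workers the operation was delivered to.
    """
    if operation not in ("terminate", "kill"):
        raise ValueError(f"unsupported operation: {operation!r}")
    stopped = 0
    for proc in list(processes):
        if not proc.is_alive():
            continue
        try:
            getattr(proc, operation)()
        except ProcessLookupError:
            # The worker exited between the liveness check and the call.
            continue
        stopped += 1
    return stopped
```

From the user's side, the commit message implies calls of the form `executor.terminate_workers()` or `executor.kill_workers()` on a `ProcessPoolExecutor`.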
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…ethods to ProcessPoolExecutor (pythonGH-128043)" (python#130838)

The test_concurrent_futures.test_process_pool test is failing in CI.

This reverts commit f97e409.
7 participants