test_multiprocessing_spawn.test_manager: _TestCondition hung (20 min timeout) on AMD64 RHEL8 3.x #110206
Comments
Similar error on PPC64LE RHEL8 Refleaks 3.x: https://buildbot.python.org/all/#/builders/384/builds/892
I failed to reproduce the issue on my Fedora 38 by stressing my laptop (12 logical CPUs) with:
I ran the test for 3 min 30 sec.
I haven't seen this failure recently, so I'm closing the issue.
I've seen this a few times recently. For example on the GH ubuntu-24.04-arm runner: https://github.com/python/cpython/actions/runs/13642688178/job/38135745891?pr=130811
The test could deadlock trying to join the worker processes due to a combination of behaviors:

* The use of `assertReachesEventually` did not ensure that workers actually called `woken.release()`, because the SyncManager's Semaphore does not implement `get_value`.
* This means that the test could finish and the variable `sleeping` would go out of scope and be collected. This unregisters the proxy, leading to failures in the worker or possibly the manager.
* The subsequent call to `p.join()` during cleanup therefore never finished.

This takes two approaches to fix this:

1) Use `woken.acquire()` to ensure that the workers actually finish calling `woken.release()`.
2) Wait until the workers finish during the test, while `cond`, `sleeping`, and `woken` are still valid.
The test could deadlock trying to join the worker processes due to a combination of behaviors:

* The use of `assertReachesEventually` did not ensure that workers actually called `woken.release()`, because the SyncManager's Semaphore does not implement `get_value`.
* This means that the test could finish and the variable `sleeping` would go out of scope and be collected. This unregisters the proxy, leading to failures in the worker or possibly the manager.
* The subsequent call to `p.join()` during cleanup therefore never finished.

This takes two approaches to fix this:

1) Use `woken.acquire()` to ensure that the workers actually finish calling `woken.release()`.
2) At the end of the test, wait until the workers are finished, while `cond`, `sleeping`, and `woken` are still valid.

(cherry picked from commit c476410)

Co-authored-by: Sam Gross <[email protected]>
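The two approaches from the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual CPython test: the names `worker` and `run_handshake` are hypothetical, and the `fork` start method is an assumption made only to keep the snippet self-contained on Linux (the reported failure was under `spawn`). The parent blocks on `woken.acquire()` once per worker instead of polling a value, and joins the workers while the manager and its proxies are still alive.

```python
import multiprocessing as mp


def worker(cond, sleeping, woken):
    """Hypothetical worker: announce we are waiting, wait, announce wake-up."""
    with cond:
        sleeping.release()   # tell the parent we are about to wait
        cond.wait()          # releases cond while blocked
        woken.release()      # tell the parent we were notified


def run_handshake(num_workers=3, timeout=30.0):
    """Return the workers' exit codes after a notify_all() handshake."""
    # Assumption: "fork" is used here only so the sketch runs as a plain script.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        cond = manager.Condition()
        sleeping = manager.Semaphore(0)
        woken = manager.Semaphore(0)

        procs = [
            ctx.Process(target=worker, args=(cond, sleeping, woken))
            for _ in range(num_workers)
        ]
        for p in procs:
            p.start()

        # Wait until every worker has signalled that it is about to wait.
        for _ in procs:
            if not sleeping.acquire(timeout=timeout):
                raise TimeoutError("worker never started waiting")

        # Acquiring cond succeeds only once no worker holds it, i.e. once
        # the workers are blocked inside cond.wait().
        with cond:
            cond.notify_all()

        # Approach 1: block on woken.acquire() once per worker rather than
        # polling a counter the manager's Semaphore does not expose.
        for _ in procs:
            if not woken.acquire(timeout=timeout):
                raise TimeoutError("worker never reported wake-up")

        # Approach 2: join while cond, sleeping and woken are still valid.
        for p in procs:
            p.join(timeout)
        return [p.exitcode for p in procs]
```

Calling `run_handshake(3)` should return one exit code per worker; the point of the ordering is that the proxies are released only after every worker has exited.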
This frame comes from `_TestCondition`.
AMD64 RHEL8 3.x: `test_multiprocessing_spawn.test_manager` passed when re-run.
build: https://buildbot.python.org/all/#/builders/185/builds/5160