Skip to content

Increase timeout for test_lock_contention on RISC-V #18430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Gui-Yue
Copy link
Contributor

@Gui-Yue Gui-Yue commented May 12, 2025

This PR addresses a test failure for tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention which consistently times out on the RISC-V (specifically riscv64) architecture.

The test simulates high lock contention and has a default timeout of 5 seconds, which seems sufficient for architectures like x86_64 but proves too short for current RISC-V hardware/environment performance characteristics, leading to spurious tests.utils.TestTimeout failures.

This fix introduces architecture detection using platform.machine(). If a RISC-V architecture is detected:

  • The timeout for this specific test is increased (e.g., to 15 seconds ).

The original, stricter timeout (5 seconds) and lock count (500) are maintained for all other architectures to avoid masking potential performance regressions elsewhere.

This change has been tested locally on RISC-V, where the test now passes reliably, and on x86_64, where it continues to pass with the original constraints.


Pull Request Checklist

  • Pull request is based on the develop branch (Assuming you based it correctly)
  • Pull request includes a changelog file. (See below)
  • Code style is correct (run the linters) (Please run linters locally)

@Gui-Yue Gui-Yue requested a review from a team as a code owner May 12, 2025 14:13
@CLAassistant
Copy link

CLAassistant commented May 12, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@devonh devonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any way of testing that this indeed is a problem, or that it fixes the problem.
But the changes are a very minor test deviation so I see no problem pulling them if if they are helpful for others.

@Gui-Yue Gui-Yue force-pushed the raise_test_lock_contention_threshold_for_riscv branch 2 times, most recently from 03f39a9 to fc99dc3 Compare May 22, 2025 01:41
@Gui-Yue Gui-Yue requested a review from devonh May 22, 2025 01:46
@Gui-Yue Gui-Yue force-pushed the raise_test_lock_contention_threshold_for_riscv branch from fc99dc3 to 49fd9b4 Compare May 22, 2025 09:27
@Gui-Yue
Copy link
Contributor Author

Gui-Yue commented May 22, 2025

I noticed that lint test failed. The log shows that "tests/handlers/test_worker_lock.py:22:1: I001 Import block is un-sorted or un-formatted".Could you tell me what rules are followed in the order of python package imports? @devonh

@devonh
Copy link
Member

devonh commented May 22, 2025

I noticed that lint test failed. The log shows that "tests/handlers/test_worker_lock.py:22:1: I001 Import block is un-sorted or un-formatted".Could you tell me what rules are followed in the order of python package imports? @devonh

Hey! The best way to sort out these issues would be to run the lints yourself locally. This is described here: https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters
But basically just boils down to:

poetry run ./scripts-dev/lint.sh

Running that should catch all the same things that CI is looking at.

@realtyem
Copy link
Contributor

realtyem commented May 22, 2025

For a little context: this happens because the timeout is based on wall clock time. This will be flakey if run on lower end systems or systems with high load. I was able to pretty reliably simulate this on a x86_64 with CPU 8/16 when ran with -j48. I ended up bumping it up to 30 seconds, myself. Scratch that, I settled on 15 seconds.

@Gui-Yue
Copy link
Contributor Author

Gui-Yue commented May 26, 2025

could you please retrigger the tests? by the way, this issue was catched by debian, full log can be found in https://buildd.debian.org/status/fetch.php?pkg=matrix-synapse&arch=riscv64&ver=1.128.0-1&stamp=1746950558&raw=0. @devonh

@devonh
Copy link
Member

devonh commented May 26, 2025

I think it's unhappy with the newline at the end of the newsfile now. It's not the most robust tool. 🙃

@Gui-Yue Gui-Yue force-pushed the raise_test_lock_contention_threshold_for_riscv branch from 00db908 to 0d4aa54 Compare May 27, 2025 03:54
@Gui-Yue
Copy link
Contributor Author

Gui-Yue commented May 27, 2025

I think it's unhappy with the newline at the end of the newsfile now. It's not the most robust tool. 🙃

Because the file end with "space",I had not noticed.Now it has been fixed.Please retrigger the test. @devonh

@devonh devonh merged commit 07468a0 into element-hq:develop May 27, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants