cwoffenden
Contributor

@cwoffenden cwoffenden commented Sep 15, 2025

ProcessAudio() behaves as if it's running on the main thread, so the spin locks also block MainLoop() from running*. I removed the previous workaround, a counter in the AW process callback that delayed any interaction between the audio and main threads.

The code that previously ran on the main thread now runs in a worker, so the test still exercises the spinlocks from the AW's side.

*it may also just be that the main thread is used to schedule the callbacks.
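As a rough illustration of the arrangement described above (not the PR's test code), the sketch below uses a plain pthread as a stand-in for the worker and the calling thread as a stand-in for the audio thread. The names `demo_lock`, `holder_thread`, `demo_busyspin_wait_acquire` and `run_worker_lock_demo` are hypothetical, and a C11 atomic stands in for the Emscripten lock type. The assumption is simply that when the lock holder runs on its own thread, it can release the lock while the other side spins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an Emscripten lock: 0 = free, 1 = held. */
static atomic_int demo_lock = 0;
static atomic_bool holder_ready = false;

static double demo_now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

static bool demo_try_acquire(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&demo_lock, &expected, 1);
}

static void demo_release(void) {
    atomic_store(&demo_lock, 0);
}

/* Spin until the lock is acquired or max_wait_ms has elapsed. */
static bool demo_busyspin_wait_acquire(double max_wait_ms) {
    double deadline = demo_now_ms() + max_wait_ms;
    do {
        if (demo_try_acquire()) return true;
    } while (demo_now_ms() < deadline);
    return false;
}

/* Plays the worker: takes the lock, holds it for ~20 ms, releases it. */
static void *holder_thread(void *arg) {
    (void)arg;
    bool got = demo_try_acquire();
    assert(got);
    (void)got;
    atomic_store(&holder_ready, true);
    double until = demo_now_ms() + 20.0;
    while (demo_now_ms() < until) { /* simulate work while holding */ }
    demo_release();
    return NULL;
}

/* Plays the audio thread: spins on the lock while the worker holds it.
   Returns 1 if the lock was acquired within 2 s, else 0. */
int run_worker_lock_demo(void) {
    atomic_store(&demo_lock, 0);
    atomic_store(&holder_ready, false);
    pthread_t worker;
    if (pthread_create(&worker, NULL, holder_thread, NULL) != 0) return 0;
    while (!atomic_load(&holder_ready)) { /* wait until the worker holds it */ }
    bool acquired = demo_busyspin_wait_acquire(2000.0);
    pthread_join(worker, NULL);
    if (acquired) demo_release();
    return acquired ? 1 : 0;
}
```

Because the holder has its own thread, the spinning side should get the lock shortly after the ~20 ms hold ends, so `run_worker_lock_demo()` is expected to return 1.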

@cwoffenden
Contributor Author

cwoffenden commented Sep 15, 2025

test_glgears_proxy_jstarget failed; the audio locks test didn't. Running locally with --repeat 1000, the original code eventually fails within the 1000 repetitions (and does so on repeated runs; within 10 repetitions on a 2-core VM). The new code doesn't, though I'll test more on a variety of hardware/OS combos.

@cwoffenden cwoffenden marked this pull request as ready for review September 15, 2025 18:23
@cwoffenden
Contributor Author

5000+ runs and counting. Running in a 2-core VM I can make this PR just keep on going and the earlier code fail quickly.

The original code fails in one of two places. The first is here, taking more than 10s to acquire a lock it should get in the next frame or so:

result = emscripten_lock_busyspin_wait_acquire(&testLock, 10000);

And only after timing out will the main thread release the lock, so while the audio thread is spinning the main thread must be blocked by it. This isn't the case with a worker.
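The failure mode above can be modelled in a few lines of portable C. This is only a sketch under stated assumptions: `sketch_lock`, `sketch_spin_acquire` and `run_blocked_holder_demo` are hypothetical names, a C11 atomic stands in for the Emscripten lock, and running the "holder" and the "spinner" on one thread is a simplification of the main thread being unable to make progress while the audio side spins.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the test's lock: 0 = free, 1 = held. */
static atomic_int sketch_lock = 0;

static double sketch_now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Busy-spin until the lock is acquired or max_wait_ms has elapsed. */
static bool sketch_spin_acquire(double max_wait_ms) {
    double deadline = sketch_now_ms() + max_wait_ms;
    do {
        int expected = 0;
        if (atomic_compare_exchange_strong(&sketch_lock, &expected, 1)) {
            return true;
        }
    } while (sketch_now_ms() < deadline);
    return false;
}

/* Returns 1 if the first acquire times out and the second succeeds. */
int run_blocked_holder_demo(void) {
    /* The "main thread" takes the lock. */
    atomic_store(&sketch_lock, 1);

    /* The spinner runs while the holder cannot: nothing can release the
       lock during the spin, so the wait can only end by timing out. */
    bool first = sketch_spin_acquire(50.0);

    /* Only now does the holder get to run and release the lock. */
    atomic_store(&sketch_lock, 0);

    /* With the lock free, the next attempt succeeds immediately. */
    bool second = sketch_spin_acquire(50.0);
    atomic_store(&sketch_lock, 0);

    return (!first && second) ? 1 : 0;
}
```

The 50 ms timeout here is arbitrary and much shorter than the 10000 ms used in the test; the point is only that a spinner that starves the holder always runs out its full timeout.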

The other place the original code fails is after calling emscripten_force_exit(), where the browser can hang. I can't get a stack trace because the browser is unresponsive. This doesn't fail in this PR either.

@cwoffenden cwoffenden changed the title from "[AUDIO_WORKLETS] Fix race condition in locks test" to "[AUDIO_WORKLETS] Move code off the main thread in locks test" Sep 16, 2025
@cwoffenden cwoffenden marked this pull request as draft September 16, 2025 13:41
@cwoffenden
Contributor Author

cwoffenden commented Sep 16, 2025

I'm still marking the test as flaky: I've had one failure on CircleCI but more than 30'000 successes locally without a single failure. Trying again with a 1-CPU VM:

Ran 10000 tests in 7692.931s

@cwoffenden cwoffenden marked this pull request as ready for review September 16, 2025 16:47
@brendandahl
Collaborator

I'm still marking the test as flaky: I've had one failure on CircleCI but more than 30'000 successes locally without a single failure. Trying again with a 1-CPU VM:

That could be from some other audio worklet bug; I've seen test_audio_worklet_strict and test_audio_worklet_pthreads_es6 flake recently on CircleCI.

@sbc100
Collaborator

sbc100 commented Sep 17, 2025

I've also seen plain old test_audio_worklet flake.

@juj
Collaborator

juj commented Sep 18, 2025

This flake seems to be Chrome-specific btw - the flake does not occur on Firefox. I wonder if there could be a Chrome bug or improvement possible?

@cwoffenden
Contributor Author

I saw a few failures with the earlier code after the emscripten_force_exit() call, which I've not been able to recreate with this PR (I must've done 60'000+ runs this week with variations such as _strict, etc., and on multiple machines). I think this exit failure may be related to #25270, in that a message is pushed via the main thread but the system is mid-shutdown, so it fails.

We're not running anything from Emscripten 4 in production yet (only in development), but what we've seen for years in the logs are errors when unloading the page. Usually some timeout call or worker is still running after the page is partially unloaded.

@cwoffenden
Contributor Author

This flake seems to be Chrome-specific btw

If it is down to the worker's interaction with main, it might be why I see test_glgears_proxy_jstarget fail, for example:

https://app.circleci.com/pipelines/github/emscripten-core/emscripten/45690/workflows/b4d64825-b448-4839-b82b-5bf10942876e/jobs/1024231

I had this about every other run whilst trying to get all ticks for this PR.

@cwoffenden
Contributor Author

I've also seen plain old test_audio_worklet flake.

test_audio_worklet_post_function also fails:

https://app.circleci.com/pipelines/github/emscripten-core/emscripten/45544/workflows/d8913a61-19da-4a00-af07-21b25712c1e7/jobs/1020135

But this isn't even running any audio code; it's essentially a message-passing worker.
