Fix hang in ServiceConsoleTests.serviceShutdown #673
Merged
Collaborator
Author
@swift-ci test
This hang occurred only in CI environments and only on Linux. Here's the sequence of events:

- The test terminates swbuild using SIGKILL
- The OS reparents SWBBuildService (a subprocess of swbuild) to launchd (Darwin) / init (others)
- The OS closes the file descriptors for the I/O pipes swbuild has connected to SWBBuildService
- SWBBuildService's read() loop indicates EOF due to the broken pipe
- SWBBuildService exits on its own

At this point, the getpgid loop should return ESRCH and terminate the test. However, SWBBuildService sticks around as a zombie for an extended period of time without init reaping the pid, so getpgid never hits the termination state. This causes the test to hang indefinitely.

The fix has two parts:

- A timeout is added around the termination monitoring loop that forces the exit promise to be fulfilled with an error if a 30-second interval elapses without the process exiting
- We switch from a getpgid loop to a waitid loop, where the terminal state is that the process has _exited_. We don't care whether the zombie has been collected by init, only that it is no longer in a running state

This fixes the hang for both the Jenkins-based CI and GitHub Actions. It also insulates us against future hangs: the test now terminates with a timeout error instead of hanging indefinitely, so we at least know _which_ test is the problem.
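As a rough illustration of the two parts of the fix, here is a minimal sketch, not the code from this PR. The names `MonitorError`, `hasExited`, `waitForExit`, and the `pollInterval` parameter are hypothetical, and the sketch polls synchronously instead of fulfilling a promise. It also assumes the caller is allowed to wait on the pid (for example, a direct child); if `waitid` fails, the check simply reports "not exited" and the timeout takes over.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum MonitorError: Error { case timedOut }

/// Returns true once `pid` has exited, even if it is still an unreaped zombie.
/// WNOWAIT leaves the exit status in place; WNOHANG makes the call non-blocking.
func hasExited(_ pid: pid_t) -> Bool {
    var info = siginfo_t() // zeroed, so "nothing to report" stays detectable
    let rc = waitid(P_PID, id_t(pid), &info, WEXITED | WNOWAIT | WNOHANG)
    // rc == 0 with si_signo still 0 means the process is still running.
    // rc == -1 (e.g. ECHILD) is treated as "unknown" and left to the timeout.
    return rc == 0 && info.si_signo == SIGCHLD
}

/// Polls `check` until it returns true, throwing if `timeout` elapses first.
func waitForExit(timeout: TimeInterval,
                 pollInterval: TimeInterval = 0.1, // assumed value
                 check: () -> Bool) throws {
    let deadline = Date().addingTimeInterval(timeout)
    while !check() {
        if Date() >= deadline { throw MonitorError.timedOut }
        Thread.sleep(forTimeInterval: pollInterval)
    }
}
```

With these in place, a call such as `try waitForExit(timeout: 30) { hasExited(servicePID) }` (where `servicePID` is a placeholder) either returns once the process has exited or throws after 30 seconds, so a stuck test fails with an identifiable error instead of hanging.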
cmcgee1024
reviewed
Jul 28, 2025
    import SystemPackage
    #endif

    @Suite(.skipHostOS(.windows))
Member
question: Will this cause the entire suite to be skipped on Windows, including individual test functions that aren't marked as skip for Windows?
Collaborator
Author
Yes, but note that this isn't new code. Many of these were broken on Windows which is why they were skipped en masse. We should work on getting them passing.
neonichu
approved these changes
Jul 28, 2025
jakepetroules
added a commit
to jakepetroules/swift-build
that referenced
this pull request
Jul 29, 2025
Some of these were formerly skipped in GitHub actions, but are passing now. Likely the culprit was ServiceConsoleTests.serviceShutdown all along, which is fixed in swiftlang#673
jakepetroules
added a commit
that referenced
this pull request
Jul 30, 2025
Some of these were formerly skipped in GitHub actions, but are passing now. Likely the culprit was ServiceConsoleTests.serviceShutdown all along, which is fixed in #673