docker: Fix stall when container stopped during creation #3769
What does this pull request do? Explain your changes. (required)
This pull request introduces an early-exit mechanism during Docker container startup. It prevents the orchestrator from indefinitely waiting for a runner container to become available if it is killed or exits immediately after being launched, allowing for faster detection of failures and subsequent recovery/relaunch attempts.
Specific updates (required)
- `ai/worker/docker.go`: Modified `dockerWaitUntilRunning` to detect and return an error immediately if the container enters a terminal state (e.g., `exited`, `dead`, `removing`) or fails with a non-zero exit code or error without restarting.
- `ai/worker/docker_test.go`: Added new unit tests to validate the fail-fast behavior for containers that are `exited`, `dead`, or `created` with a non-zero exit code.

How did you test each of these updates (required)
New unit tests were added to `ai/worker/docker_test.go` to specifically cover the fail-fast scenarios. These tests mock the Docker client to simulate container inspections returning `exited`, `dead`, or `created` with a non-zero exit code. The tests assert that `dockerWaitUntilRunning` correctly returns an error in these situations, preventing an indefinite wait. All tests in `ai/worker` were run locally and passed.
Does this pull request close any open issues?
Checklist:
- `make` runs successfully
- `./test.sh` passes