@jakubno (Member) commented Dec 15, 2025

Note

Handle missing snapshot files by returning FailedPrecondition, and refine placement to skip nodes that report ResourceExhausted and retry on another node; add supporting test helpers and tests.

  • Orchestrator server (packages/orchestrator/internal/server/sandboxes.go):
    • Distinguish snapshot resumes from fresh starts when acquiring the starting-sandboxes semaphore (block with a timeout for snapshot resumes; do not block for fresh starts).
    • Return FailedPrecondition when snapshot/template files are missing (storage.ErrObjectNotExist), with telemetry.
  • API placement (packages/api/internal/orchestrator/placement):
    • Simplify the Algorithm interface (remove excludeNode).
    • On sandbox create: mark success as soon as the create call returns nil; on a ResourceExhausted error, skip the node and retry on another one; on any other error, exclude the node, record the failure, and increment the attempt counter.
  • Node manager test utilities (nodemanager/mock.go):
    • Add helpers to inject sandbox create errors, supply custom create behavior, and set a custom sandbox client.
  • Tests:
    • Unit test ensuring placement retries on ResourceExhausted and succeeds on another node.
    • Integration test asserting Create returns FailedPrecondition when sandbox files are not found.

Written by Cursor Bugbot for commit 3ee22ce.

linear bot commented Dec 15, 2025

// and current load distribution.
type Algorithm interface {
	chooseNode(ctx context.Context, nodes []*nodemanager.Node, nodesExcluded map[string]struct{}, requested nodemanager.SandboxResources, buildMachineInfo machineinfo.MachineInfo) (*nodemanager.Node, error)
	excludeNode(err error) bool
Member Author

simplified the logic a little

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

jakubno requested a review from dobrac December 15, 2025 14:37

dobrac (Contributor) left a comment

one nit

dobrac added the bug ("Something isn't working") label Dec 15, 2025
jakubno merged commit fcece62 into main Dec 15, 2025
28 checks passed
jakubno deleted the fix-issue-when-files-for-paused-sandbox-are-not-uploaded-yet-eng-3395 branch December 15, 2025 20:44

3 participants