feat: warmpool selection strategy#878
Conversation
|
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for agent-sandbox canceled.
|
There was a problem hiding this comment.
Pull request overview
This PR adds configurable warm-pool sandbox adoption strategies, allowing SandboxTemplates to choose between the existing ready/oldest preference and a node-spread strategy for distributing adopted sandboxes.
Changes:
- Adds
adoptionStrategytoSandboxTemplateSpecand CRDs. - Adds
Sandbox.status.nodeNameand populates it from the underlying Pod. - Extends the warm sandbox queue and adoption logic to support strategy-based candidate selection, with unit tests.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
extensions/api/v1beta1/sandboxtemplate_types.go |
Adds the adoption strategy API enum and field. |
api/v1beta1/sandbox_types.go |
Adds nodeName to Sandbox status. |
controllers/sandbox_controller.go |
Updates Sandbox status with the scheduled node name. |
extensions/controllers/sandboxclaim_controller.go |
Implements strategy selection during warm-pool adoption. |
extensions/controllers/sandboxclaim_controller_test.go |
Adds tests for adoption strategy behavior. |
extensions/controllers/queue/simple_sandbox_queue.go |
Adds queue support for strategy-based popping. |
extensions/controllers/queue/simple_sandbox_queue_test.go |
Tests strategy-based queue removal. |
k8s/crds/extensions.agents.x-k8s.io_sandboxtemplates.yaml |
Regenerates SandboxTemplate CRD schema. |
k8s/crds/agents.x-k8s.io_sandboxes.yaml |
Regenerates Sandbox CRD schema. |
helm/crds/extensions.agents.x-k8s.io_sandboxtemplates.yaml |
Updates Helm CRD copy for SandboxTemplate. |
helm/crds/agents.x-k8s.io_sandboxes.yaml |
Updates Helm CRD copy for Sandbox. |
2a75b12 to
1b6ff48
Compare
1b6ff48 to
955f8d0
Compare
955f8d0 to
976e5eb
Compare
|
/ok-to-test |
976e5eb to
b2eb881
Compare
b2eb881 to
6c2ae22
Compare
ed0efb5 to
a3e4be4
Compare
dc8a08e to
29611a0
Compare
|
/retest |
29611a0 to
f21caba
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
🚧 Files skipped from review as they are similar to previous changes (9)
📝 WalkthroughWalkthroughAdds a ChangesWarm Pool Adoption with Node Awareness
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
extensions/controllers/sandboxclaim_controller.go (1)
614-618: 💤 Low valueConsider wrapping the error with context for better error messages.
When
r.Listfails, the returned error lacks reconciler-specific context. While the error is logged immediately (line 616), wrapping it would improve error messages if it propagates up the call stack.📝 Suggested improvement
if err := r.List(ctx, &sandboxList, client.InNamespace(claim.Namespace)); err != nil { logger.Error(err, "Failed to list sandboxes for smart selection node counting") - return nil, queue.SandboxKey{}, err + return nil, queue.SandboxKey{}, fmt.Errorf("failed to list sandboxes in namespace %s for smart selection: %w", claim.Namespace, err) }As per coding guidelines, wrap errors with context to help downstream error handlers identify the failure point.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@extensions/controllers/sandboxclaim_controller.go` around lines 614 - 618, Wrap the error returned by r.List with reconciler-specific context before returning so callers get clearer failure info; e.g., when listing into v1beta1.SandboxList via r.List(ctx, &sandboxList, client.InNamespace(claim.Namespace)), create a wrapped error (using fmt.Errorf or errors.Wrapf) that includes the operation and identifying info such as the claim name/namespace, call logger.Error with that wrapped error, and return that wrapped error instead of the raw err (references: r.List, v1beta1.SandboxList, logger.Error, claim.Namespace, queue.SandboxKey{}).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@extensions/controllers/sandboxclaim_controller.go`:
- Around line 614-618: Wrap the error returned by r.List with
reconciler-specific context before returning so callers get clearer failure
info; e.g., when listing into v1beta1.SandboxList via r.List(ctx, &sandboxList,
client.InNamespace(claim.Namespace)), create a wrapped error (using fmt.Errorf
or errors.Wrapf) that includes the operation and identifying info such as the
claim name/namespace, call logger.Error with that wrapped error, and return that
wrapped error instead of the raw err (references: r.List, v1beta1.SandboxList,
logger.Error, claim.Namespace, queue.SandboxKey{}).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c975b7c1-2ad1-4602-ae1d-5442d93c690b
📒 Files selected for processing (11)
api/v1beta1/sandbox_types.gocontrollers/sandbox_controller.gocontrollers/sandbox_controller_test.godocs/api.mdextensions/controllers/queue/simple_sandbox_queue.goextensions/controllers/queue/simple_sandbox_queue_test.goextensions/controllers/sandboxclaim_controller.goextensions/controllers/sandboxclaim_controller_test.gohelm/crds/agents.x-k8s.io_sandboxes.yamlk8s/crds/agents.x-k8s.io_sandboxes.yamltest/e2e/framework/predicates/sandbox.go
f21caba to
ed93960
Compare
updated based on copilot added test to sandbox controller lint fixed fixed e2e tests addressed Janet's comments autogent fixed docs addressed Barni's question updated docs revert capitalization: update addressed comments on fallback logic linter
ed93960 to
d027475
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: janetkuo, vicentefb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
This PR introduces a unified smart sandbox selection strategy for warm pool adoption. Based on review feedback, we eliminated the configurable
adoptionStrategyfield fromSandboxTemplateto simplify the user-facing API. The controller now automatically applies this smart strategy by default.Changes:
Sandbox API Update (
SandboxStatus.NodeName):nodeNamestring toSandboxStatusto track the name of the node where the underlying sandbox pod is scheduled. This allows controllers to inspect workload distribution dynamically.Smart Sandbox Selection Strategy (
smartSelector):nodeNamestatus field).Queue Self-Cleaning & Fallback:
key, trueto pop and discard it from the memory queue.r.ListAPI server errors immediately rather than silently falling back to a cold-start, ensuring correct controller requeuing.Thread-Safe Queue Pops with Retry Loop:
SimpleSandboxQueueandsynchronizedQueueto supportGetWithStrategyandPopWithStrategyoperations.PopWithStrategyto safely handle concurrency race conditions: if another goroutine pops the selected key during selection, the pop retries with a fresh snapshot rather than failing.Performance & API Alignment:
warmPoolNameinstead oftemplateHashto follow the v1beta1 API model. Partitioning the queues per warm pool prevents evaluating candidates from other warm pools.r.Getto save expensive Kubernetes API round-trips.extensions/api/v1beta1/sandboxtemplate_types.go, regenerateddocs/api.md, and updated all generated deepcopy files and CRDs.Unit Testing:
TestSmartSelectorinextensions/controllers/sandboxclaim_controller_test.gocovering readiness prioritization, node spreading, namespace filtering, and queue cleaning/deletion logic.Which issue(s) this PR is related to:
Fixes #491
Release Note
Generated by Overseer (powered by the gemini model).
Summary by CodeRabbit
New Features
Behavioral Improvements
Documentation
nodeNamefield.Tests