docs: Balanced consolidation policy RFC #2942

Merged
k8s-ci-robot merged 4 commits into kubernetes-sigs:main from jamesmt-aws:balanced-consolidation-clean on Apr 22, 2026

Conversation

@jamesmt-aws
Contributor

@jamesmt-aws jamesmt-aws commented Apr 2, 2026

Summary

Adds a new consolidationPolicy: Balanced that scores each consolidation move by comparing savings and disruption as fractions of NodePool totals. Moves where disruption outweighs savings are rejected.

  • consolidationPolicy is an IntOrString field. Balanced maps to k=2. Integer values (1-3) pass k directly as an escape hatch.
  • All policies expressed through the disruption cost model: WhenEmpty approves only when no pod contributes positive disruption cost, WhenEmptyOrUnderutilized is k=+inf
  • Per-node disruption cost of 1.0 eliminates division-by-zero edge cases
  • Score-based ranking replaces disruption-only ranking when budget limits move count
  • Exhaustive verification across c7i/m7i/r7i confirms k=2 is the smallest value that makes within-family REPLACEs viable
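
The approval rule these bullets describe can be sketched as a small Go program. The function name and standalone framing are illustrative, not the actual implementation; the fractions are assumed inputs:

```go
package main

import "fmt"

// approve reports whether a consolidation move passes the scoring filter.
// savingsFrac is the move's savings as a fraction of total NodePool cost;
// disruptionFrac is its disruption cost as a fraction of the NodePool's
// total disruption cost. Balanced maps to k=2; integer policies (1-3) pass
// k directly; WhenEmptyOrUnderutilized behaves like k=+Inf (any positive
// savings passes). The per-node disruption cost floor of 1.0 guarantees
// disruptionFrac is never zero.
func approve(k, savingsFrac, disruptionFrac float64) bool {
	// A move passes when its disruption fraction is at most k times its
	// savings fraction, i.e. score = savings/disruption >= 1/k.
	return savingsFrac/disruptionFrac >= 1.0/k
}

func main() {
	// Saving 10% of NodePool cost while carrying 15% of its disruption
	// cost: score ~0.67 passes at k=2 (threshold 0.5) but not at k=1.
	fmt.Println(approve(2, 0.10, 0.15)) // true
	fmt.Println(approve(1, 0.10, 0.15)) // false
}
```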

Related issues

aws#8868, aws#8536, aws#6642, aws#7146, #2319, #1019, #735, #1851, #2705, #2883, #1440, #1686, #1430, aws#5218, aws#3577

@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 2, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 2, 2026
@k8s-ci-robot
Contributor

Hi @jamesmt-aws. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 2, 2026
@jamesmt-aws jamesmt-aws changed the title designs: Balanced consolidation policy RFC docs: Balanced consolidation policy RFC Apr 2, 2026
@jamesmt-aws jamesmt-aws force-pushed the balanced-consolidation-clean branch from 7b4dff4 to 592c2bd on April 2, 2026 16:26
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 2, 2026
@jamesmt-aws
Contributor Author

/easycla

Comment thread designs/balanced-consolidation.md Outdated

This exists in all consolidation modes. The cost threshold concentrates the remaining moves onto higher-impact candidates. The system self-corrects: a nearly-empty replacement scores as a trivial DELETE next cycle. Cascades terminate because each round has strictly fewer displaced nodes.

Configuring kube-scheduler with `MostAllocated` scoring reduces divergence. The [Workload-Aware Scheduling proposal](https://docs.google.com/document/d/1mPYqS4cFmsHPaVQDKyCz7-TKyWNJGjTaZQD3Umkvmgk) (Kepka, Feb 2026) addresses this more directly.

Btw this doc isn't accessible. Presumably it's private or possibly a bad link?

Contributor Author

let me ask around for an updated link that I can send you on Kubernetes Slack. I think there was a lot of chatter at KubeCon about the path forward, and I don't really need the link. That will come up in other ways.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 2, 2026
@coveralls

Pull Request Test Coverage Report for Build 23920136107

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 34 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 80.404%

Files with Coverage Reduction:

| File | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/node/termination/controller.go | 2 | 76.68% |
| pkg/state/cost/cost.go | 32 | 79.51% |

Totals Coverage Status:
Change from base Build 23859923955: -0.02%
Covered Lines: 12178
Relevant Lines: 15146

💛 - Coveralls

# Balanced Consolidation: Scoring Moves by Savings and Disruption
Contributor

I am super excited about this approach!

Comment thread designs/balanced-consolidation.md Outdated
spec:
  disruption:
    consolidationPolicy: Balanced
    consolidationThreshold: 2.0
Contributor

Thoughts on exposing different Enum values that codify these numbers, rather than the number itself? Also -- note that json cannot support floats in a stable way.

Contributor Author

good call on the float. the formally motivated values are all integers. k=1 is break-even (deletes only, no replaces). k=2 is where within-family replaces become viable, with at most 4 steps to stasis. k=3 adds 8 cross-family pairs with the same 4-step maximum. at k=4 churn chains appear, the maximum jumps to 9 steps, and the formal analysis starts arguing against k=4 (or any higher value). so the natural set is {1, 2, 3} and I'll restrict the input type.

I thought about named presets but Karpenter doesn't have an ordinal enum pattern today, and picking names that age well is hard. "Conservative/Balanced/Aggressive" reuses "Balanced" which is already the policy name. I think the integer is simpler, but we can do whatever you and the rest of the community want to do here.

Contributor

From an API design perspective, I am not sure that we need both knobs.

Right now, WhenEmpty is effectively k=0 and WhenEmptyOrUnderutilized is k=+INF.

I see two approaches:

  1. Expand the enum that aliases other K values
  2. Expose a new consolidationThreshold that works when consolidationPolicy: WhenEmptyOrUnderutilized and simply changes the threshold.

cc: @jmdeal, @DerekFrank curious to your thoughts.

Contributor Author

Yeah, I think your idea is better. The doc as-written assumes (maybe defensively) that we can't justify k=2 uniquely for customers. If that's right, then we need k=3 and k=4, and then we might as well just make this parameter adjust the behavior of WhenEmptyOrUnderutilized. If we can uniquely justify k=2, then we can have a new enum.

I'm leaning towards making k a parameter of WhenEmptyOrUnderutilized based purely on this RFC. @jmdeal and @DerekFrank I'm happy to do whatever you two think is sensible, I'll try to catch up with you two today.

1. **Pod deletion cost** ([`controller.kubernetes.io/pod-deletion-cost`](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#pod-deletion-cost)), divided by 2^27, range -16 to +16. Default 0. The ReplicaSet controller uses this for scale-down ordering; Karpenter reuses it as a disruption signal.
2. **Pod priority**, divided by 2^25, range -64 to +30. Default 0. Higher-priority pods increase their node's disruption cost.

With neither set, per-pod disruption cost is 1.0. `EvictionCost` clamps to [-10, 10]. The scoring path clamps negative values to 0 via `max(0, EvictionCost(pod))` in the per-node formula (see [NodePool Totals](#nodepool-totals)). Other consumers of `EvictionCost` (eviction ordering) still see negatives. Scoring range per pod: [0, 10].
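
As a sketch, the two signals and the clamping described above combine like this. Names and the standalone framing are illustrative, not Karpenter's actual code:

```go
package main

import "fmt"

// evictionCost sketches the per-pod signal described above. deletionCost is
// the controller.kubernetes.io/pod-deletion-cost annotation value and
// priority is the pod's spec.priority; both default to 0.
func evictionCost(deletionCost, priority int32) float64 {
	cost := float64(deletionCost)/(1<<27) + float64(priority)/(1<<25)
	// Clamp to [-10, 10] for all consumers of the signal.
	if cost > 10 {
		return 10
	}
	if cost < -10 {
		return -10
	}
	return cost
}

// scoringCost is what the scoring path sees: max(0, EvictionCost(pod)), so
// negative values still influence eviction ordering elsewhere but
// contribute nothing to a node's disruption cost here.
func scoringCost(deletionCost, priority int32) float64 {
	if c := evictionCost(deletionCost, priority); c > 0 {
		return c
	}
	return 0
}

func main() {
	fmt.Println(evictionCost(0, 0))     // 0: neither signal set
	fmt.Println(evictionCost(1<<30, 0)) // 8: 2^30 / 2^27
	fmt.Println(scoringCost(-1<<30, 0)) // 0: negative cost clamped out of scoring
}
```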
Contributor

How does a user communicate that disrupting a pod is free?

Contributor Author

if users want their pods to have zero disruption cost, they can set pod-deletion-cost to a large negative value. that drives EvictionCost negative, and the disruption cost in this RFC clamps the result to 0, so the pod contributes nothing to the node's total disruption cost.

the node still has a disruption cost of 1. nodes have a disruption cost independent of their pods (cordoning, draining, API calls, replacement latency). we haven't modeled this precisely and cost=1 is a placeholder. I'll make that clearer in the design. for today it eliminates a divide-by-zero that comes up if node disruption cost is zero. it could be larger if we wanted, but we don't need that yet.

so a node where every pod has negative deletion cost scores the same as an empty node. cheap to disrupt, not free. if a user wants truly zero-friction disruption at a NodePool level, they want WhenEmptyOrUnderutilized (or k=+inf)
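
A minimal sketch of the per-node arithmetic this reply describes, under the stated assumption that the node base cost is the placeholder 1.0 (names are illustrative; the WhenEmpty check reflects the disruption-cost framing the thread converges on later):

```go
package main

import "fmt"

// nodeDisruptionCost sketches the per-node formula: a fixed base cost of
// 1.0 (the placeholder for cordon/drain/replacement overhead, which also
// rules out division by zero) plus each pod's eviction cost clamped at 0.
// podCosts stands in for the EvictionCost(pod) values.
func nodeDisruptionCost(podCosts []float64) float64 {
	total := 1.0
	for _, c := range podCosts {
		if c > 0 {
			total += c
		}
	}
	return total
}

// whenEmptyApproves expresses WhenEmpty in the same model: approve only
// when no pod contributes positive disruption cost, i.e. the node's total
// equals the per-node base.
func whenEmptyApproves(podCosts []float64) bool {
	return nodeDisruptionCost(podCosts) == 1.0
}

func main() {
	fmt.Println(whenEmptyApproves(nil))                 // true: empty node
	fmt.Println(whenEmptyApproves([]float64{-8, -0.5})) // true: pods opted out via negative deletion cost
	fmt.Println(whenEmptyApproves([]float64{0.5}))      // false: a pod still costs something
}
```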

Contributor

Got it -- so in our docs we would recommend setting it to -1 if you want to treat it as free.

@jamesmt-aws jamesmt-aws force-pushed the balanced-consolidation-clean branch from 6e41cd8 to 63468c3 on April 4, 2026 00:09
A new consolidationPolicy: Balanced that scores each consolidation move
by comparing savings and disruption as fractions of NodePool totals.
Moves where disruption outweighs savings are rejected.

- consolidationThreshold (integer, 1-3, default 2): a move passes when
  its disruption fraction is at most k times its savings fraction
- Per-node disruption cost of 1.0 eliminates division-by-zero edge cases
- Score-based ranking replaces disruption-only ranking when budget limits
  move count
- Exhaustive verification across c7i/m7i/r7i confirms k=2 is the
  smallest value that makes within-family REPLACEs viable
@jamesmt-aws jamesmt-aws force-pushed the balanced-consolidation-clean branch from 63468c3 to d89e204 on April 5, 2026 23:20
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 5, 2026
@jamesmt-aws
Contributor Author

squashed to fix EasyCLA (removed Co-Authored-By trailer that the bot couldn't resolve). no content changes beyond what was already pushed.

/easycla

jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 7, 2026
Source: kubernetes-sigs#2942

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 7, 2026
@jamesmt-aws jamesmt-aws force-pushed the balanced-consolidation-clean branch from 14a675d to a403fee on April 7, 2026 16:42
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 7, 2026
jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 10, 2026
Adds a new consolidationPolicy value, Balanced, that scores each
consolidation move and rejects moves where the disruption outweighs the
savings. Gated behind --feature-gates BalancedConsolidation=true.

The scoring formula compares savings and disruption as fractions of
NodePool totals: score = savings_fraction / disruption_fraction. A move
is approved when score >= 1/consolidationThreshold (default 2). The
scoring step is a filter inserted after scheduling feasibility and price
comparison. It can only reject moves, never create them. If scoring has
a bug that incorrectly approves, the move was already feasible and
cost-saving. If it incorrectly rejects, the cluster is less optimized
but not disrupted.

API:
- consolidationPolicy: Balanced (new enum value)
- consolidationThreshold: 1-3 (default 2, requires Balanced)

Implementation:
- balanced.go: scoring formula, NodePool totals, candidate pre-filter,
  cross-NodePool move evaluation
- Feature gate, API validation (CEL + runtime), defaulting
- ShouldDisrupt accepts Balanced, sets ConsolidationPolicyUnsupported
  status condition when gate is disabled
- Score-based candidate ranking for single-node consolidation
- Events (ConsolidationApproved/Rejected on Node+NodeClaim for
  single-node, NodePool for multi-node)
- Metrics (consolidation_score histogram, consolidation_moves_total
  counter)

Tests (31 new):
- 15 unit tests covering all RFC worked examples
- 9 integration tests (NodePool totals, cross-pool, candidate price)
- 3 feature gate tests
- 5 validation + 4 defaulting tests
- 4 score-based ranking tests
- 1 status condition test

See designs/balanced-consolidation.md (PR kubernetes-sigs#2942) for the full RFC.
jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 11, 2026
jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 11, 2026
jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 11, 2026
jamesmt-aws added a commit to jamesmt-aws/karpenter that referenced this pull request Apr 11, 2026

### Candidate Filtering

Move generation is expensive (find a destination, compute replacement costs, verify scheduling). A node's best possible score is its delete ratio: a DELETE saving the full node cost with no replacement.
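
The upper-bound argument above can be sketched as a pre-filter. All names and the example totals are illustrative:

```go
package main

import "fmt"

// worthEvaluating sketches the candidate pre-filter: a node's best possible
// score is its delete ratio (the full node cost saved, with only the node's
// own disruption cost incurred and no replacement). If that upper bound
// already fails the k threshold, no REPLACE for this node can pass either,
// so expensive move generation is skipped entirely.
func worthEvaluating(k, nodeCost, poolCost, nodeDisruption, poolDisruption float64) bool {
	savingsFrac := nodeCost / poolCost
	disruptionFrac := nodeDisruption / poolDisruption
	return savingsFrac/disruptionFrac >= 1.0/k
}

func main() {
	// One of ten equally priced nodes, carrying 20% of the pool's
	// disruption cost: best score 0.1/0.2 = 0.5, exactly the k=2 bound.
	fmt.Println(worthEvaluating(2, 100, 1000, 20, 100)) // true
	fmt.Println(worthEvaluating(1, 100, 1000, 20, 100)) // false
}
```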
Contributor

The replacement cost can diverge from actual reality, so we might end up getting a node without the expected savings?

Contributor Author

That's true. Is that a problem with the balanced consolidation proposal, or is it a problem that is fundamental to consolidation and cost savings in Karpenter generally?

I'm happy to fix it if you think that's necessary, but that feels separable to me. What do you think? Are there any specific cases that we can talk through? This feels like a real problem, but the proposal here doesn't make that problem worse.

consolidationPolicy is now an IntOrString field per ellistarn's feedback.
Balanced maps to k=2. Integer values (1-3) pass k directly as an escape
hatch. Removes the separate consolidationThreshold field.
consolidationPolicy is now IntOrString. All policies expressed through
the disruption cost model:

- WhenEmpty: approve only when move disruption cost equals per-node
  disruption cost (no pod contributes positive disruption cost).
  Behavioral change from today: pods with large negative
  pod-deletion-cost no longer block consolidation.
- Balanced (k=2): scoring with default threshold
- Integer values (1-3): pass k directly as escape hatch
- WhenEmptyOrUnderutilized: k=+inf (any positive savings)

Removes separate consolidationThreshold field per ellistarn feedback.
@jamesmt-aws jamesmt-aws force-pushed the balanced-consolidation-clean branch from aa8d8f3 to 6d29106 on April 20, 2026 22:59
@grosser
Contributor

grosser commented Apr 20, 2026

FYI since I'm not sure that is addressed, but the main problems that we are trying to solve are:

  • consolidate high waste nodes first POC
  • do not consolidate when not saving at least X% POC
  • some kind of "nodes killed per hour" limit

@jamesmt-aws
Contributor Author

FYI since I'm not sure that is addressed, but the main problems that we are trying to solve are:

  • consolidate high waste nodes first POC
  • do not consolidate when not saving at least X% POC
  • some kind of "nodes killed per hour" limit

My goal here is to solve your problems with fewer user-facing controls. The basic idea is that we prioritize consolidation "moves" that have two properties. First, we should get a cost savings that looks pretty good. Second, we shouldn't disrupt a ton of pods. There's a simple estimation technique that we use to normalize these two factors to NodePool totals (more details in the RFC), so that a node with a lot of waste and small pod disruption is top priority, high waste/high disruption and low waste/low disruption are a rung down in priority, and low waste/high disruption moves are avoided entirely.

There's more detail in the RFC, and I'm happy to talk with you about particular cases on Kubernetes Slack, over a Zoom call, or over email. I think the one thing we're missing is the "nodes killed per hour" limit, but I'm hoping that can also be an implicit effect of setting the disruption budget on your NodePool.

@ellistarn
Contributor

x-posting from #2962 -- I should've posted on the RFC here:


Discussed offline.

Let's model this with consolidationPolicy: WhenEmpty | Balanced | WhenEmptyOrUnderutilized, with the option to expand that policy to support additional values in the future, with an escape hatch to pass k directly into that field.

Ready to approve once the RFC reflects this. cc: @jonathan-innis @DerekFrank @jmdeal

@grosser
Contributor

grosser commented Apr 21, 2026

prioritizing the nodes with most waste sounds about right (need some formula for gpu waste > cpu waste > mem waste, I usually use factors of 1gpu=8cpu 1cpu=4GB mem)

the killed per hour is less important if all disruptions are "worth it" (saving at least x% for example)

@jamesmt-aws
Contributor Author

Discussed offline.

Let's model this with consolidationPolicy: WhenEmpty | Balanced | WhenEmptyOrUnderutilized, with the option to expand that policy to support additional values in the future, with an escape hatch to pass k directly into that field.

Ready to approve once the RFC reflects this. cc: @jonathan-innis @DerekFrank @jmdeal

While working through some of the cases here, I thought about what it really means for this RFC to take into account the whole spectrum between WhenEmpty and WhenEmptyOrUnderutilized. There's a node disruption cost in the current RFC, defaulting to just 1 (the same as disrupting a single pod). That means WhenEmpty can be expressed as: approve only when move disruption cost equals the per-node disruption cost. No pod contributes positive disruption cost. WhenEmptyOrUnderutilized is k=+inf (any positive savings passes). The whole spectrum lives in one model.

This also surfaces a small behavioral improvement. Today's WhenEmpty checks for literally zero pods. Under the disruption cost framing, a node whose pods all have large negative pod-deletion-cost (disruption cost clamped to 0) also qualifies. Pods that declared themselves free to disrupt shouldn't block consolidation.

RFC is updated. Also making consolidationPolicy an IntOrString so integer values (1-3) pass k directly as the escape hatch.

@jamesmt-aws
Contributor Author

@grosser Yeah, I completely agree. That's a resource-weighted cost concern. The RFC currently uses Karpenter's pricing model (dollar cost per node) for the savings side, so we're comparing node costs before/after proposed consolidation moves. This already implicitly weights GPUs higher because GPU instances cost more. It doesn't have an explicit resource-type weighting for disruption cost, though; if you see a pressing need for that I'm happy to add it.

Contributor

@ellistarn ellistarn left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ellistarn, jamesmt-aws

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2026
@k8s-ci-robot k8s-ci-robot merged commit 0de9964 into kubernetes-sigs:main Apr 22, 2026
14 checks passed
| Value | Behavior |
| --- | --- |
| `WhenEmpty` | Approve only when move disruption cost equals the per-node disruption cost (no pod contributes positive disruption cost) |
| `1` | Scoring with break-even threshold (deletes only, no replaces in uniform pools) |
| `Balanced` | Scoring with k=2 (within-family replaces viable) |
| `3` | Scoring with k=3 (adds cross-family replace pairs) |
Contributor

One note. We could call this Conservative, Balanced, and Aggressive if we are only supporting 3.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants