diff --git a/designs/balanced-consolidation.md b/designs/balanced-consolidation.md new file mode 100644 index 0000000000..b2cad2432a --- /dev/null +++ b/designs/balanced-consolidation.md @@ -0,0 +1,445 @@ +# Balanced Consolidation: Scoring Moves by Savings and Disruption + +## Motivation + +Today, Karpenter's consolidation is all-or-nothing. `WhenEmptyOrUnderutilized` consolidates any node where pods can be repacked more cheaply, regardless of how little is saved or how many pods are disrupted. `WhenEmpty` consolidates only nodes with no pods. A move that saves $0.02/day by evicting a pod with a 30-minute warm-up cache is treated the same as a move that saves $5/day by evicting a stateless proxy. + +Terminating a non-empty node requires evicting running pods and starting replacements. Customers report cases where the disruption is not worth the savings. Related issues: + +- Nodes at 93-99% CPU utilization disrupted instead of lightly utilized ones ([aws#8868](https://github.com/aws/karpenter-provider-aws/issues/8868), [kubernetes-sigs#2319](https://github.com/kubernetes-sigs/karpenter/issues/2319)) +- Multi-hour consolidation loops replacing the same instance types with no net savings ([aws#8536](https://github.com/aws/karpenter-provider-aws/issues/8536), [aws#6642](https://github.com/aws/karpenter-provider-aws/issues/6642), [aws#7146](https://github.com/aws/karpenter-provider-aws/issues/7146)) +- Rapid node churn where consolidation deletes nodes that are immediately re-provisioned ([kubernetes-sigs#1019](https://github.com/kubernetes-sigs/karpenter/issues/1019), [kubernetes-sigs#735](https://github.com/kubernetes-sigs/karpenter/issues/735), [kubernetes-sigs#1851](https://github.com/kubernetes-sigs/karpenter/issues/1851)) +- `consolidateAfter` not preventing disruption of well-packed nodes 
([kubernetes-sigs#2705](https://github.com/kubernetes-sigs/karpenter/issues/2705), [aws#3577](https://github.com/aws/karpenter-provider-aws/issues/3577)) +- Direct requests for a savings threshold or utilization-based consolidation gating ([kubernetes-sigs#2883](https://github.com/kubernetes-sigs/karpenter/issues/2883), [kubernetes-sigs#1440](https://github.com/kubernetes-sigs/karpenter/issues/1440), [kubernetes-sigs#1686](https://github.com/kubernetes-sigs/karpenter/issues/1686), [kubernetes-sigs#1430](https://github.com/kubernetes-sigs/karpenter/issues/1430), [aws#5218](https://github.com/aws/karpenter-provider-aws/issues/5218)) + +This RFC calls each consolidation action a *move*. A move deletes one or more nodes, along with pod eviction and optional replacement node creation. We propose new `consolidationPolicy` values that score each move and reject moves where the disruption outweighs the savings. `Balanced` (k=2) is the recommended default. Integer values (1-3) provide direct control over the threshold. + +## Alternatives Considered + +Five approaches were considered and rejected. Each fails to account for something the scoring approach captures. + +### Cost Improvement Factor + +Require a minimum price improvement ratio (e.g., old_price / new_price >= 2). Considered for spot consolidation ([spot-consolidation.md](https://github.com/kubernetes-sigs/karpenter/blob/main/designs/spot-consolidation.md)). A move that saves 50% of a node's cost passes a 2x factor whether it disrupts 2 default-cost pods or 20 high-cost pods. The factor ignores disruption entirely. 
+ +### Absolute Dollar Threshold + +Require savings to exceed a fixed dollar amount (e.g., $1/day) ([kubernetes-sigs#2883](https://github.com/kubernetes-sigs/karpenter/issues/2883), [kubernetes-sigs#1440](https://github.com/kubernetes-sigs/karpenter/issues/1440)). Two moves that each save $1/day and disrupt 4 default-cost pods: on a $50/day NodePool with 40 pods, this saves 2% of cost for 10% of disruption. On a $5,000/day NodePool with 4000 pods, the same $1 saves 0.02% for 0.1% of disruption. A $1 threshold approves both. The threshold does not scale with NodePool size. + +### Utilization-Based Threshold + +Exclude nodes above a resource utilization percentage (e.g., 70%) from consolidation, like CA's `scale-down-utilization-threshold` ([kubernetes-sigs#1686](https://github.com/kubernetes-sigs/karpenter/issues/1686), [aws#5218](https://github.com/aws/karpenter-provider-aws/issues/5218)). This is the most frequently requested alternative. A node at 40% utilization running one pod with `pod-deletion-cost: 2147483647` (a model-serving pod with a 2-hour warm-up cache) and a node at 40% running ten stateless pods both pass a 70% threshold. The utilization threshold cannot distinguish them. + +### Selective Consolidation Type Disable + +Disable single-node consolidation (replace) while keeping multi-node and emptiness consolidation ([kubernetes-sigs#1430](https://github.com/kubernetes-sigs/karpenter/issues/1430), [kubernetes-sigs#684](https://github.com/kubernetes-sigs/karpenter/issues/684), [PR #1433](https://github.com/kubernetes-sigs/karpenter/pull/1433)). An m7i.2xlarge ($9.68/day) running 2 pods requesting 2 vCPU total could replace to an m7i.large ($2.42/day), saving $7.26/day for 2 pods of disruption. Disabling replace blocks this move along with every other replace, regardless of the savings-to-disruption ratio. 
+ +### Separate Disruption Cost Annotation + +A dedicated `karpenter.sh/disruption-cost` annotation separate from the existing `EvictionCost` inputs. This would let application developers independently control eviction ordering and consolidation gating. We prefer reusing existing parameters. `controller.kubernetes.io/pod-deletion-cost` and pod priority already express disruption cost. A separate annotation could be introduced later if eviction ordering and consolidation gating need to diverge. + +### Related Work + +- [PR #2562](https://github.com/kubernetes-sigs/karpenter/pull/2562): `ConsolidationPriceImprovementFactor` field (0.0-1.0) with operator-level default and NodePool override. Cost Improvement Factor with a different UX. +- [PR #2893](https://github.com/kubernetes-sigs/karpenter/pull/2893): Decision ratio with a configurable `DecisionRatioThreshold` (default 1.0). Same scoring approach as this RFC but exposes the threshold from day one. +- [PR #2901](https://github.com/kubernetes-sigs/karpenter/pull/2901): External health signal probes on NodePools that block disruption when a probe fails. Orthogonal and complementary. +- [PR #2894](https://github.com/kubernetes-sigs/karpenter/pull/2894): Controller that automatically manages `controller.kubernetes.io/pod-deletion-cost` based on pluggable ranking strategies. Complementary. + +## Proposal + +### Proposed Spec + +```yaml +apiVersion: karpenter.sh/v1 +kind: NodePool +metadata: + name: default +spec: + disruption: + consolidationPolicy: Balanced + consolidateAfter: 30s + budgets: + - nodes: 10% +``` + +`consolidationPolicy` is an `IntOrString` field. 
All values are expressed through the disruption cost model: + +| Value | Behavior | +|---|---| +| `WhenEmpty` | Approve only when move disruption cost equals the per-node disruption cost (no pod contributes positive disruption cost) | +| `1` | Scoring with break-even threshold (deletes only, no replaces in uniform pools) | +| `Balanced` | Scoring with k=2 (within-family replaces viable) | +| `3` | Scoring with k=3 (adds cross-family replace pairs) | +| `WhenEmptyOrUnderutilized` | Any positive savings (k=+inf) | + +`Balanced` is shorthand for k=2. An integer value uses the scoring formula directly with that k. `WhenEmptyOrUnderutilized` is equivalent to k=+inf (any move with positive savings passes). Validation rejects integers outside 1-3. + +`WhenEmpty` approves a move when its disruption cost equals the per-node disruption cost, meaning no pod on the candidate node has positive disruption cost. This is a behavioral change from today's `WhenEmpty`, which checks for literally zero pods. Under the new definition, a node whose pods all have large negative `pod-deletion-cost` (driving their disruption cost to 0 after clamping) qualifies as "empty" for consolidation purposes. This is an improvement: pods that declared themselves free to disrupt should not block consolidation. + +A move is approved when `score >= 1/k`. At the default `Balanced` (k=2), `score >= 0.5`. + +If an operator enables `BalancedConsolidation`, sets `consolidationPolicy: Balanced` (or an integer), then disables the feature gate during rollback, the controller falls back to `WhenEmptyOrUnderutilized` behavior and sets a `ConsolidationPolicyUnsupported` status condition on the NodePool. The condition message directs the operator to change the policy or re-enable the gate. This avoids reconcile failures while making the fallback visible. + +### How Scoring Works + +The score compares a move's savings and disruption as fractions of NodePool totals. 
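
Before the formula itself, the policy-to-threshold mapping in the table above can be sketched in a few lines. This is an illustrative sketch only; the function names are hypothetical, not Karpenter's implementation:

```python
import math

def policy_to_k(policy):
    # Hypothetical mapping from consolidationPolicy to k, per the table above.
    if policy == "Balanced":
        return 2                      # recommended default
    if policy == "WhenEmptyOrUnderutilized":
        return math.inf               # any positive savings passes
    if isinstance(policy, int):
        if not 1 <= policy <= 3:
            raise ValueError("integer consolidationPolicy must be in 1-3")
        return policy
    raise ValueError(f"{policy} is not scored")  # WhenEmpty is a separate check

def approval_threshold(policy):
    # A move is approved when score >= 1/k.
    return 1.0 / policy_to_k(policy)
```

At `Balanced`, `approval_threshold` returns 0.5; at `WhenEmptyOrUnderutilized` it returns 0.0, so any positive score passes (zero-savings moves are rejected separately, per the edge-case table below).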
+ +#### Per-Pod Disruption Cost + +[`EvictionCost`](../pkg/utils/disruption/disruption.go) starts at 1.0 per pod and adds two terms: + +1. **Pod deletion cost** ([`controller.kubernetes.io/pod-deletion-cost`](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#pod-deletion-cost)), divided by 2^27, range -16 to +16. Default 0. The ReplicaSet controller uses this for scale-down ordering; Karpenter reuses it as a disruption signal. +2. **Pod priority**, divided by 2^25, range -64 to +30. Default 0. Higher-priority pods increase their node's disruption cost. + +With neither set, per-pod disruption cost is 1.0. `EvictionCost` clamps to [-10, 10]. The scoring path clamps negative values to 0 via `max(0, EvictionCost(pod))` in the per-node formula (see [NodePool Totals](#nodepool-totals)). Other consumers of `EvictionCost` (eviction ordering) still see negatives. Scoring range per pod: [0, 10]. + +#### Default Behavior: Savings vs. Pod Count + +Most clusters today do not set `pod-deletion-cost`. When every pod has default disruption cost 1.0, the score reduces to savings-versus-pod-count. It still rejects marginal moves, but the ability to distinguish expensive-to-restart pods from cheap ones is inactive until operators or automation ([PR #2894](https://github.com/kubernetes-sigs/karpenter/pull/2894)) set per-pod costs. + +#### NodePool Totals + +The score normalizes savings and disruption against NodePool totals. + +``` +nodepool_cost = sum(node.price for node in nodepool.nodes) +nodepool_total_disruption_cost = sum(node.disruption_cost for node in nodepool.nodes) +``` + +Each node's disruption cost is 1.0 (per-node cost) plus `sum(max(0, EvictionCost(pod)))` for its pods. Nodes have a disruption cost independent of their pods (cordoning, draining, API calls, replacement latency). We have not modeled this precisely and 1.0 is a placeholder. 
It eliminates division-by-zero for empty nodes and gives room to refine later (e.g., if GPU nodes should carry higher inherent disruption cost than CPU nodes). A 10-pod node has disruption cost 11; the per-node cost is 9% of the total. A node whose only pod has `EvictionCost = -10` has disruption cost 1.0, same as an empty node.

For cross-NodePool moves, the source pool's totals, policy, and budget govern scoring. Cross-pool DELETEs require scheduling simulation to confirm pods fit on the destination. The destination pool's budget is not consumed (it absorbs pods; it does not lose a node). The source pool's policy governs regardless of the destination's policy; otherwise a permissive destination could override a conservative source.

NodePool totals are snapshotted once per cycle. Later moves in the same cycle use stale totals. A move that scored 1.5 against a 100-node snapshot still passes against a 98-node pool. Moves near the boundary could flip; the next cycle corrects.

#### Calculation

```
savings = sum(deleted_node.price) - sum(created_node.price)  # price = Karpenter's pricing model (on-demand, current spot, or on-demand/10M for ODCRs)
disruption_cost = sum(deleted_node.disruption_cost)          # per-node cost of 1.0 plus sum(max(0, EvictionCost(pod))) for that node's pods

savings_fraction = savings / nodepool_cost
disruption_fraction = disruption_cost / nodepool_total_disruption_cost

score = savings_fraction / disruption_fraction
```

The disruption cost counts every pod on a deleted source node, plus the per-node cost of each deleted node. Every pod is evicted regardless of where it lands.

A move is approved when `score >= 1/k`, where k comes from `consolidationPolicy` (`Balanced` = 2, or an integer value directly). At k=2, `score >= 0.5`. Both sides are dimensionless fractions, so the score is scale-invariant (see [Scale Invariance](#scale-invariance)).
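
The calculation can be sketched end-to-end in Python. This is a hedged illustration with hypothetical `Pod`/`Node` containers (not Karpenter's types) that folds in the per-pod `EvictionCost` terms and the per-node disruption cost of 1.0 described above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pod:                        # hypothetical container for illustration
    deletion_cost: int = 0        # controller.kubernetes.io/pod-deletion-cost
    priority: int = 0             # pod priority

@dataclass
class Node:                       # hypothetical container for illustration
    price: float                  # Karpenter's pricing model, $/day
    pods: List[Pod] = field(default_factory=list)

def eviction_cost(pod: Pod) -> float:
    # 1.0 base plus the two scaled signals, clamped to [-10, 10].
    cost = 1.0 + pod.deletion_cost / 2**27 + pod.priority / 2**25
    return max(-10.0, min(10.0, cost))

def disruption_cost(node: Node) -> float:
    # Per-node cost of 1.0 plus non-negative per-pod costs.
    return 1.0 + sum(max(0.0, eviction_cost(p)) for p in node.pods)

def score(deleted: List[Node], created_prices: List[float], pool: List[Node]) -> float:
    savings = sum(n.price for n in deleted) - sum(created_prices)
    if savings <= 0:
        return 0.0                # zero or negative savings: never approved
    savings_fraction = savings / sum(n.price for n in pool)
    disruption_fraction = (sum(disruption_cost(n) for n in deleted)
                           / sum(disruption_cost(n) for n in pool))
    return savings_fraction / disruption_fraction

def approved(move_score: float, k: int = 2) -> bool:
    return move_score >= 1.0 / k
```

In this sketch, an empty-node DELETE in a uniform pool scores 1.0 and passes, while a same-price replace scores 0 and is rejected at any k, matching the edge cases below.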
+ +**Division-by-zero handling.** With the per-node disruption cost of 1.0, `disruption_cost` is always positive for any node, eliminating the zero-disruption special case. The remaining edge cases: + +| savings | nodepool_total_cost | nodepool_total_disruption_cost | decision | +|---|---|---|---| +| positive | positive | positive | compute score normally | +| zero | any | any | reject (no benefit) | +| negative | any | any | reject (net loss) | +| near-zero | near-zero (ODCR pool) | any | compute normally; the 1/10M divisor cancels and the score equals the on-demand price ratio | +| any | any | zero | cannot happen (per-node disruption cost ensures > 0) | + +See [Edge Cases](#edge-cases) for worked examples. + +Feasibility checks (PDBs, `karpenter.sh/do-not-disrupt`, scheduling constraints) filter which moves can be generated. `consolidateAfter` determines which nodes are candidates. Scoring evaluates which feasible moves are worth executing. Disruption budgets gate how many execute per cycle. + +### Move Score as Ranking Function + +When multiple moves pass the threshold and a disruption budget limits how many execute, the score determines execution order. Today, `WhenEmptyOrUnderutilized` ranks by disruption alone (lowest first). Score-based ranking accounts for both savings and disruption. Greedy-by-ratio is the standard knapsack heuristic, [optimal for the continuous case](https://en.wikipedia.org/wiki/Continuous_knapsack_problem#Solution) and near-optimal for the discrete case when items are small relative to the budget. + +![Ranking consolidation moves: score vs. single-dimension ranking](ranking-strategies.png) + +The graphs show REPLACE and DELETE moves from a simulated cluster: 5000 pods with log-normal CPU/memory requests, packed across c7i/m7i/r7i instances, after 10 rounds of workload churn. Each curve shows cumulative savings vs. cumulative disruption under a different ranking strategy. 
At every disruption level, score-based ranking delivers more savings than the alternatives (see [`balanced-consolidation-ranking.py`](scripts/balanced-consolidation-ranking.py)). + +### Edge Cases + +#### Empty Nodes + +An empty node has disruption cost 1.0 (per-node cost only). A DELETE saves the full node cost against a small disruption fraction. Empty DELETEs always pass. No special case needed. + +#### Single-Node Pool + +A single-node pool DELETE scores exactly 1.0 (`savings_fraction = 1.0`, `disruption_fraction = 1.0`). This passes the default threshold. Deleted pods become unschedulable and trigger a new provisioning cycle. See [Open Questions](#open-questions) for whether the formula should have a floor on pool size. + +#### Near-Zero-Cost Nodes (ODCRs, Reserved Capacity) + +Karpenter prices ODCRs and reserved instances at on-demand price divided by a large constant, keeping the most expensive ODCR cheaper than the cheapest spot node (see the provider's pricing implementation for the exact divisor). Within an ODCR-only pool, the divisor cancels in the score (it divides both savings and total cost). Scores equal the on-demand price ratios, so ODCR pools consolidate normally. This is correct: replacing an expensive reservation with a cheaper one frees capacity. The divided values are small but well within float64 precision. + +When a positive-cost source node is consolidated and its pods land on an ODCR destination node, this is a DELETE from the source pool's perspective. The score reflects the source pool's cost structure. The destination node's near-zero cost does not affect the score. + +### Candidate Filtering + +Move generation is expensive (find a destination, compute replacement costs, verify scheduling). A node's best possible score is its delete ratio: a DELETE saving the full node cost with no replacement. 
+ +``` +delete_ratio = (node.price / nodepool_cost) / (node.disruption_cost / nodepool_total_disruption_cost) +``` + +If this ratio is below 1/k, this node is not a good consolidation candidate. A DELETE saves the full node cost — a REPLACE saves strictly less because the replacement has positive cost. If the best case (DELETE) doesn't pass, nothing will. The system skips move generation for that node. + +This filter applies to single-node consolidation. A group of individually-failing nodes could produce a passing batch if their combined savings outweigh combined disruption. The filter misses these opportunities. Evaluating all multi-node combinations is exponential, so the implementation takes single-node savings first and attempts multi-node moves only when single-node opportunities are exhausted. + +The NodePool totals only need to be sensible relative to each other. The implementation may cache totals or estimate them from a subset of nodes, as long as cost and disruption are estimated from the same sample. + +### Interaction with Existing Features + +Existing feasibility checks (disruption budgets, PDBs, `consolidateAfter`, `do-not-disrupt`) are unchanged. Scoring applies only to consolidation, not to spot interruptions, expiration, or drift. + +**Consolidation cycling loops.** The scoring filter is the primary mechanism that breaks cycling loops ([aws#8536](https://github.com/aws/karpenter-provider-aws/issues/8536), [aws#6642](https://github.com/aws/karpenter-provider-aws/issues/6642), [aws#7146](https://github.com/aws/karpenter-provider-aws/issues/7146)). A same-type REPLACE that saves $0 scores 0 and is rejected at any threshold. A near-zero-savings move with meaningful disruption scores below 1/k and is also rejected. These are the moves that produce multi-hour loops under `WhenEmptyOrUnderutilized`, where any positive savings (including rounding-error savings) triggers a replace. 
The `consolidateAfter` cooldown on destination nodes (described below) provides a secondary damping effect between rounds but is not sufficient on its own to prevent cycling. + +`consolidateAfter` determines candidacy: a node whose last pod event is within the `consolidateAfter` window is not a candidate. Non-candidate nodes still contribute to the denominators, preventing the post-replacement cycle from being artificially aggressive. When a move executes, pods landing on the destination node reset its `consolidateAfter` timer, temporarily removing it from candidacy. This provides a natural cooldown between consolidation rounds. + +### Observability + +Approved and rejected moves are surfaced as events. Single-node moves emit on the NodeClaim. Multi-node moves emit on the NodePool (the score describes the move, not any single node). + +- `ConsolidationApproved`: `"score %.2f >= threshold %.2f (k: %d, savings %.1f%%, disruption %.1f%%)"` +- `ConsolidationRejected`: `"score %.2f < threshold %.2f (k: %d, savings %.1f%%, disruption %.1f%%)"` + +Scored moves are also logged at DEBUG level. + +A Prometheus histogram `karpenter_consolidation_score` records scores by decision and NodePool, with buckets {0.1, 0.25, 0.33, 0.5, 1.0, 2.0, 5.0, 10.0}. A counter `karpenter_consolidation_moves_total` by decision and NodePool tracks move volume. + +How to use these: if `moves_total{decision="rejected"}` is high and the histogram shows most rejected scores clustered just below the threshold, the operator's k is slightly too conservative — raising it by 1 would capture those moves. If `moves_total{decision="approved"}` is zero and `moves_total{decision="rejected"}` is also zero, there are no candidates (the cluster is well-packed or `consolidateAfter` hasn't elapsed). If approved moves are high but savings aren't materializing, kube-scheduler divergence may be undoing the moves (check for rapidly churning NodeClaims). 
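
As a sketch of how the event pair above could be rendered (illustrative only; the controller's actual emission code may differ):

```python
def consolidation_event(score: float, k: int,
                        savings_fraction: float, disruption_fraction: float):
    # Mirrors the ConsolidationApproved / ConsolidationRejected templates above.
    threshold = 1.0 / k
    passed = score >= threshold
    reason = "ConsolidationApproved" if passed else "ConsolidationRejected"
    op = ">=" if passed else "<"
    message = (f"score {score:.2f} {op} threshold {threshold:.2f} "
               f"(k: {k}, savings {savings_fraction * 100:.1f}%, "
               f"disruption {disruption_fraction * 100:.1f}%)")
    return reason, message
```

For an approved move with score 3.33 at k=2, this yields `ConsolidationApproved` with a message beginning `score 3.33 >= threshold 0.50`.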
+ +## Examples + +All examples use `consolidationPolicy: Balanced` (k=2). The NodePool has 10 nodes: eight m7i.xlarge (4 vCPU, 16 GiB, $4.84/day) and two m7i.2xlarge (8 vCPU, 32 GiB, $9.68/day). Total NodePool cost is $58.08/day. The NodePool runs 80 pods with total disruption cost 80. + +### Oversized Node (approved) + +One m7i.2xlarge runs 3 pods requesting 1.5 vCPU and 6 GiB total. Disruption cost is 3. These pods fit on an m7i.large at $2.42/day. Savings is $7.26. + +``` +savings_fraction = 7.26 / 58.08 = 12.5% +disruption_fraction = 3 / 80 = 3.75% +score = 0.125 / 0.0375 = 3.33 > 0.5 --> approved +``` + +### Spare Capacity Delete (approved) + +One m7i.xlarge runs 4 pods requesting 1.5 vCPU and 6 GiB. Disruption cost is 4. Another node has spare capacity. Savings is $4.84 (full node cost, no replacement needed). + +``` +savings_fraction = 4.84 / 58.08 = 8.3% +disruption_fraction = 4 / 80 = 5.0% +score = 0.083 / 0.05 = 1.67 > 0.5 --> approved +``` + +### Marginal Move (rejected) + +One m7i.xlarge runs 8 pods requesting 1.8 vCPU and 7 GiB. Disruption cost is 8. The pods fit on an m7i.large at $2.42/day. Savings is $2.42. + +``` +savings_fraction = 2.42 / 58.08 = 4.2% +disruption_fraction = 8 / 80 = 10.0% +score = 0.042 / 0.10 = 0.42 < 0.5 --> rejected +``` + +### Well-Packed Node (rejected) + +One m7i.xlarge runs 10 pods requesting 3.5 vCPU and 14 GiB. The smallest fitting replacement is another m7i.xlarge. Savings is $0. No threshold approves this move. + +### Uniform Pool Replace (approved) + +All 10 nodes are m7i.xlarge ($4.84/day each, $48.40/day total, 80 pods, disruption cost 80). One node's 8 pods fit on an m7i.large ($2.42/day). Savings is $2.42. + +``` +savings_fraction = 2.42 / 48.40 = 5.0% +disruption_fraction = 8 / 80 = 10.0% +score = 0.05 / 0.10 = 0.50 >= 0.5 --> approved +``` + +The replacement costs exactly half the original. In a uniform pool, this is the boundary at k=2: a replace that saves less than half the source node's cost is rejected. 
In a heterogeneous pool, the threshold depends on pool-level fractions, not per-node percentage. At k=1, the score simplifies to `1 - replacement_price / node_price`, which never reaches 1.0. k=2 is the smallest value that makes uniform-pool REPLACEs viable. + +### Scale Invariance + +The same oversized-node scenario on a 100-node NodePool ($580.80/day total cost, 800 total disruption cost) produces the same score: + +``` +savings_fraction = 7.26 / 580.80 = 1.25% +disruption_fraction = 3 / 800 = 0.375% +score = 0.0125 / 0.00375 = 3.33 +``` + +The threshold produces the same decision regardless of NodePool size. + +### Heterogeneous Disruption Cost + +Two m7i.xlarge nodes each run 4 pods and can be deleted (pods fit on other nodes). Both save $4.84/day. Node A runs 4 stateless proxies with default disruption cost (total 4). Node B runs 1 stateless proxy (cost 1) and 3 model-serving pods with `pod-deletion-cost: 2147483647` (cost ~10 each, total 31). The NodePool total disruption cost is 107 (76 default-cost pods + node B's 31). + +**Node A (approved):** + +``` +savings_fraction = 4.84 / 58.08 = 8.3% +disruption_fraction = 4 / 107 = 3.7% +score = 0.083 / 0.037 = 2.24 > 0.5 --> approved +``` + +**Node B (rejected):** + +``` +savings_fraction = 4.84 / 58.08 = 8.3% +disruption_fraction = 31 / 107 = 29.0% +score = 0.083 / 0.29 = 0.29 < 0.5 --> rejected +``` + +Same savings, same node count, same pod count. The score rejects node B because the model-serving pods are expensive to restart. This is the score's main advantage over alternatives that ignore disruption cost: it distinguishes nodes where disruption is cheap from nodes where it is not. + +### Cross-NodePool: On-Demand and Spot + +A cluster has two NodePools. The On-Demand pool has 10 m7i.xlarge nodes at $4.84/day each ($48.40/day total, 80 pods, total disruption cost 80). The Spot pool has 10 m7i.xlarge nodes at $1.45/day each ($14.50/day total, 80 pods, total disruption cost 80). 
+ +One node in each pool runs 3 pods requesting 1 vCPU. Disruption cost is 3. The pods can be absorbed by other nodes. This is a DELETE of the source node. + +**On-Demand pool DELETE:** + +``` +savings_fraction = 4.84 / 48.40 = 10.0% +disruption_fraction = 3 / 80 = 3.75% +score = 0.10 / 0.0375 = 2.67 > 0.5 --> approved +``` + +**Spot pool DELETE:** + +``` +savings_fraction = 1.45 / 14.50 = 10.0% +disruption_fraction = 3 / 80 = 3.75% +score = 0.10 / 0.0375 = 2.67 > 0.5 --> approved +``` + +Both moves score identically because each node represents the same fraction of its pool's cost and disrupts the same fraction of its pool's pods. + +If the Spot pool node instead runs 8 pods with disruption cost 8: + +``` +savings_fraction = 1.45 / 14.50 = 10.0% +disruption_fraction = 8 / 80 = 10.0% +score = 0.10 / 0.10 = 1.0 > 0.5 --> approved +``` + +The move is approved. To reach the 0.5 boundary, each of the 8 pods would need disruption cost 2 (disruption fraction 20%, score 0.50). + +## Why k=2 + +The scoring formula has one free parameter: k (exposed via `consolidationPolicy`). We chose k=2 as the default by exhaustive enumeration. + +### State Space + +We enumerate configurations in a bounded space using on-demand prices from three instance families in us-east-1: c7i (compute-optimized), m7i (general-purpose), and r7i (memory-optimized), medium through 4xlarge (15 price points). 1 to 6 nodes per pool, 0 to 4 pods per node, per-pod disruption cost in {1, 2, 5, 10} (max per-node disruption cost: 4 pods x 10 = 40). For each configuration, we evaluate every candidate move (Delete and Replace to every cheaper price) at every k from 1 through 5. Three families matter because cross-family replacement ratios are not power-of-2. + +Properties 1-4, 6, and 7 are algebraic properties of the ratio formula and hold for any pod count — the enumeration is a sanity check, not the proof. Properties 5 and 8 are empirical results bounded by the enumerated price structure and pod counts. 
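
The algebraic properties can be sanity-checked without the full enumeration. This sketch (a hypothetical helper, not taken from the enumeration scripts) computes the score of a single REPLACE in a uniform pool of `n` nodes at price `p` with `m` default-cost pods per node; `n` cancels, leaving `1 - r / p`:

```python
def uniform_replace_score(n: int, p: float, r: float, m: int) -> float:
    # One node (price p, m pods of cost 1.0 each, per-node cost 1.0)
    # is replaced by a node of price r; all other nodes are identical.
    savings_fraction = (p - r) / (n * p)
    disruption_fraction = (1.0 + m) / (n * (1.0 + m))  # = 1/n
    return savings_fraction / disruption_fraction       # = 1 - r/p
```

Because the score is `1 - r/p`, it reaches 1.0 only when the replacement is free, which is why no replace passes at k=1, while a half-price replace scores exactly 0.5 and passes at k=2. Pool size dropping out is the fleet-size-independence property.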
+ +### What a good scoring function does + +Eight properties define what we want from the scoring formula. The first seven are about the formula itself. The eighth is about what happens over time: a REPLACE creates a new node, and that node might itself be a consolidation candidate. An m7i.xlarge replaces to m7i.large, which replaces to c7i.medium, and so on. We call this sequence a **churn chain**. A good scoring function produces churn chains that converge (reach a stable instance type) rather than cycle. + +1. **Monotonicity in savings.** Cheaper replacement never makes approval harder. +2. **Monotonicity in disruption.** Higher disruption never makes approval easier. +3. **Empty nodes always deletable.** Zero disruption, positive savings: always approved. +4. **Zero-savings moves never approved.** Same-price replacement scores zero. +5. **Replaces work in uniform pools.** The minimum useful k is the smallest value where meaningful replaces pass. +6. **Skewed disruption differentiates.** High-disruption pods make their node harder to approve. +7. **Fleet size independence.** Pool size cancels algebraically in uniform pools. +8. **Bounded churn.** Churn chains converge and terminate quickly. + +Properties 1-4, 6, and 7 hold at all k values (structural properties of the ratio). Properties 5 and 8 select k. + +### Results + +| k | min score (1/k) | approved replace pairs | new pairs vs k-1 | max churn chain | +|---|-----------------|----------------------|-----------------|-----------------| +| 1 | 1.000 | 0 | — | 0 | +| 2 | 0.500 | 78 | 78 | 4 steps | +| 3 | 0.333 | 86 | 8 | 4 steps | +| 4 | 0.250 | 95 | 9 | 9 steps | +| 5 | 0.200 | 100 | 5 | 9 steps | + +At k=1, no replace is ever approved in a uniform pool. The score for a uniform-pool replace simplifies to `1 - replacement_price / node_price`, which requires a free replacement to reach 1.0. + +k=2 is the smallest integer where uniform-pool REPLACEs pass. 
Within a single family, prices follow power-of-2 scaling, so every replacement ratio is 0.5 or less and k>=3 adds nothing. Across families, k=3 opens 8 additional cross-family pairs (e.g., c7i.large → m7i.medium at 43% savings, score 0.43) without increasing the max churn chain. k=4 opens 9 more pairs but allows 9-step churn chains that zigzag through all three families. Max chain lengths are confirmed by exhaustive DFS over all approved replacement paths, not a greedy heuristic (see [`balanced-consolidation-properties.py`](scripts/balanced-consolidation-properties.py)). + +k=2 is the right default. It is the smallest value that makes within-family REPLACEs viable, and it captures all cross-family pairs where the replacement costs less than half the original. The 8 additional cross-family pairs at k=3 are available to operators who set `consolidationPolicy: 3` (see [`balanced-consolidation-properties.py`](scripts/balanced-consolidation-properties.py)). + +## API Choices + +### Consolidation Aggressiveness Tuning [Recommended: IntOrString consolidationPolicy] + +`consolidationPolicy` is an `IntOrString` field. `Balanced` maps to k=2. Integer values (1-3) pass k directly. This gives most users a single named value while preserving an escape hatch for operators who need a different threshold. + +The formally motivated values are all integers: k=1 (break-even, deletes only), k=2 (within-family replaces viable, 4-step max churn), k=3 (adds cross-family pairs, same 4-step churn). At k=4, churn chains jump to 9 steps and the formal analysis argues against higher values. + +Two alternatives were considered: + +**Separate `consolidationThreshold` field.** A dedicated integer field alongside `consolidationPolicy: Balanced`. This works but adds a field that only applies to one policy value. Folding k into `consolidationPolicy` keeps the API surface smaller. + +**Named presets (Low/Medium/High).** A `consolidationAggressiveness` enum mapping to k values. 
Karpenter does not have an existing ordinal enum pattern, and picking names that age well is hard. "Conservative/Balanced/Aggressive" reuses "Balanced" which is already the policy name. IntOrString is simpler. + +### Per-NodePool vs. Per-Cluster Normalization [Recommended: Per-NodePool] + +The score denominators (total cost, total disruption cost) could be computed per-NodePool or across the entire cluster. This proposal recommends per-NodePool. + +Per-NodePool normalization matches Karpenter's existing architecture (policies, budgets, `consolidateAfter` are already per-NodePool). Per-cluster normalization would dilute small pool scores: a single node representing 10% of its own pool becomes 0.1% of the cluster. + +Con: scores reflect relative efficiency within a pool, not absolute dollar impact. A score of 2.0 in a $50/hr pool and 2.0 in a $10,000/hr pool look identical. This does not affect behavior but limits cross-pool comparison. + +### consolidationPolicy Naming + +No behavior change for existing users. Migration: change `WhenEmptyOrUnderutilized` to `Balanced`. + +Con: `WhenEmpty` and `WhenEmptyOrUnderutilized` describe behavior; `Balanced` describes character. Alternatives: `Scored` (exposes implementation), `CostWeighted` (implies cost-only), `WhenWorthIt` (too informal). `Balanced` is the least-bad option. + +## Backward Compatibility + +- `WhenEmptyOrUnderutilized` and `WhenEmpty` are unchanged. Existing NodePool specs continue to work. +- Per-pod disruption cost is computed from the existing `controller.kubernetes.io/pod-deletion-cost` annotation and pod priority via `EvictionCost`. Pods without these inputs default to disruption cost 1.0, matching current behavior (all pods are equal). + +## Open Questions + +- **Is k=2 the right default?** [Why k=2](#why-k2) explains why k=2 is the smallest value that makes uniform-pool REPLACEs viable. We do not know whether k=2 works well across diverse workloads. 
The feature gate and opt-in rollout exist to answer this empirically. + +- **Should single-node pools be exempt from scoring?** A single-node pool DELETE scores 1.0 and passes, making all pods unschedulable until provisioning creates a replacement. Disruption budgets (`nodes: 0`) prevent this today. The formula could refuse by requiring pool size > 1, but that adds a special case for something disruption budgets already handle. + +- **How many customers use `pod-deletion-cost` today?** If few do, the score reduces to savings-versus-pod-count. [PR #2894](https://github.com/kubernetes-sigs/karpenter/pull/2894) would automate annotation. + +- **Move quality tracking.** Annotate moved pods with the move's score. Track re-disruption rate (pods evicted again within minutes). High re-disruption means the threshold is too aggressive. + +- **Async move generation.** Scoring enables a shift from synchronous per-cycle consolidation to a continuous pipeline: generate, score, enqueue by priority, execute as budget allows. This eliminates the per-cycle timeout problem. + + +## Frequently Asked Questions + +### What happens in a uniformly inefficient cluster where no single REPLACE clears the threshold? + +Every REPLACE scores low (e.g., 0.2 for a one-tier downsize) and is rejected. DELETEs still work: a DELETE of a node whose pods fit elsewhere scores 1.0 in a uniform pool. The system consolidates by deleting nodes, not replacing them. Operators who want all feasible moves can use `WhenEmptyOrUnderutilized`. + +### Does the score account for kube-scheduler pod placement? + +No. The score assumes pods land on the intended destination. In practice, kube-scheduler may scatter pods across existing nodes instead of packing onto the replacement. The replacement ends up nearly empty and becomes a consolidation candidate itself. + +This exists in all consolidation modes. The cost threshold concentrates the remaining moves onto higher-impact candidates. 
The system self-corrects: a nearly-empty replacement scores as a trivial DELETE next cycle. Cascades terminate because each round has strictly fewer displaced nodes. + +Configuring kube-scheduler with `MostAllocated` scoring reduces divergence. + +### Why doesn't the score account for reserved instance or ODCR opportunity cost? + +See [Near-Zero-Cost Nodes](#near-zero-cost-nodes-odcrs-reserved-capacity). The 1/10M factor cancels in the score, so ODCR pools consolidate using on-demand price ratios. Real opportunity cost (what freed capacity is worth to the organization) is a different question that requires billing API integration. This RFC defers opportunity-cost modeling. + +### Where is the score visible? + +See [Observability](#observability). DEBUG logs, NodeClaim events, and a Prometheus histogram. + +### Will this become the default consolidation policy? + +Not at launch. `Balanced` is opt-in behind a feature gate. Whether it becomes the default is a community decision deferred to GA graduation. + +### Does constraining maximum node size improve this proposal? + +Rejections are driven by high disruption fraction relative to savings fraction. A 50-pod node in a 500-pod pool has 10% disruption fraction regardless of instance type. Constraining maximum node size reduces the per-node share of pool disruption, making moves easier to approve. The formula works correctly regardless of node size. Operators can manage node size through NodePool instance-type constraints today. + +## Rollout + +Standard feature gate pattern (SpotToSpotConsolidation, NodeOverlay). + +- **Alpha**: disabled by default. `--feature-gates BalancedConsolidation=true` to opt in. +- **Beta**: enabled by default. Disable if issues arise. +- **GA**: gate removed. Whether `Balanced` becomes the default policy is a separate decision. 
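To make the decision rule concrete, here is a minimal sketch (not the controller implementation) of the scoring check as exercised by [`balanced-consolidation-properties.py`](scripts/balanced-consolidation-properties.py), using the cross-multiplied form so the comparison stays in integer arithmetic:

```python
def approved(k, savings, move_disruption, pool_cost, pool_disruption):
    """A move passes iff k * savings_fraction >= disruption_fraction.

    Cross-multiplied to avoid division:
    k * savings * pool_disruption >= move_disruption * pool_cost.
    Prices are $/hr scaled by 10000, as in the properties script.
    """
    if move_disruption == 0:
        return savings > 0  # empty node: deleting it is always a win
    return k * savings * pool_disruption >= move_disruption * pool_cost


# Single-node pool: c7i.large (892) -> m7i.medium (504), one default-cost pod.
# Savings fraction = 388/892 ~= 0.43; disruption fraction = 1.0.
print(approved(2, 892 - 504, 1, 892, 1))  # False: 2 * 0.43 < 1, rejected at k=2
print(approved(3, 892 - 504, 1, 892, 1))  # True: 3 * 0.43 >= 1, approved at k=3
```

This is the same pair cited in [Why k=2](#why-k2): a 43% cross-family savings that only clears the threshold once `consolidationPolicy: 3` is set.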
diff --git a/designs/ranking-strategies.png b/designs/ranking-strategies.png new file mode 100644 index 0000000000..0a40673087 Binary files /dev/null and b/designs/ranking-strategies.png differ diff --git a/designs/scripts/balanced-consolidation-properties.py b/designs/scripts/balanced-consolidation-properties.py new file mode 100644 index 0000000000..f792a0a81d --- /dev/null +++ b/designs/scripts/balanced-consolidation-properties.py @@ -0,0 +1,415 @@ +#!/usr/bin/env python3 +""" +Exhaustive check of consolidation scoring properties. + +Decision rule: approved iff k * savings_fraction >= disruption_fraction +Cross-multiplied: k * savings * pool_disruption >= move_disruption * pool_cost + +State space: + - 1-6 nodes + - Prices from c7i, m7i, r7i on-demand (us-east-1), scaled to integers + - 0-4 pods per node, disruption cost in {1, 2, 5, 10} + - Actions: Delete(node), Replace(node, new_price) + - k: the decision constant (1 = break-even, 2 = savings count 2x) +""" + +from itertools import product as cartesian +from collections import defaultdict + +# On-demand prices in us-east-1, $/hr × 10000 for exact integer arithmetic. +# c7i (compute-optimized) +# c7i.medium $0.0446/hr → 446 +# c7i.large $0.0892/hr → 892 +# c7i.xlarge $0.1785/hr → 1785 +# c7i.2xlarge $0.3570/hr → 3570 +# c7i.4xlarge $0.7140/hr → 7140 +# m7i (general-purpose) +# m7i.medium $0.0504/hr → 504 +# m7i.large $0.1008/hr → 1008 +# m7i.xlarge $0.2016/hr → 2016 +# m7i.2xlarge $0.4032/hr → 4032 +# m7i.4xlarge $0.8064/hr → 8064 +# r7i (memory-optimized) +# r7i.medium $0.0661/hr → 661 +# r7i.large $0.1323/hr → 1323 +# r7i.xlarge $0.2646/hr → 2646 +# r7i.2xlarge $0.5292/hr → 5292 +# r7i.4xlarge $1.0584/hr → 10584 +PRICES = sorted([ + 446, 892, 1785, 3570, 7140, # c7i + 504, 1008, 2016, 4032, 8064, # m7i + 661, 1323, 2646, 5292, 10584, # r7i +]) +DCOSTS = [1, 2, 5, 10] +K_VALUES = [1, 2, 3] + + +def approved(k, savings, move_disruption, pool_cost, pool_disruption): + """Cross-multiplied decision rule. 
Handles zero-disruption edge case.""" + if move_disruption == 0: + return savings > 0 + if pool_disruption == 0: + return savings > 0 + if pool_cost == 0: + return False + return k * savings * pool_disruption >= move_disruption * pool_cost + + +def score(savings, move_disruption, pool_cost, pool_disruption): + """Actual score (float). For reporting only, not for decisions.""" + if pool_cost == 0 or pool_disruption == 0: + return float('inf') if savings > 0 else 0.0 + sf = savings / pool_cost + df = move_disruption / pool_disruption + if df == 0: + return float('inf') if sf > 0 else 0.0 + return sf / df + + +# ─── P1: Monotonicity in savings ─── + +def check_p1(): + """If replace(n, q) is approved and p < q (more savings), then replace(n, p) is approved.""" + violations = [] + for k in K_VALUES: + for n_nodes in range(1, 5): + for node_price in PRICES: + for pod_config in pod_configs(max_pods=4): + move_disruption = sum(pod_config) + # Other nodes: try a few representative configs + for other_price in PRICES[:3]: + for other_pods in [0, 2, 4]: + other_disruption = other_pods * 1 # default cost + pool_cost = node_price + (n_nodes - 1) * other_price + pool_disruption = move_disruption + (n_nodes - 1) * other_disruption + for q in PRICES: + if q >= node_price: + continue + savings_q = node_price - q + if not approved(k, savings_q, move_disruption, pool_cost, pool_disruption): + continue + # q is approved. Check all p < q. 
+ for p in PRICES: + if p >= q: + continue + savings_p = node_price - p + if not approved(k, savings_p, move_disruption, pool_cost, pool_disruption): + violations.append((k, node_price, q, p, n_nodes)) + return violations + + +# ─── P2: Monotonicity in disruption ─── + +def check_p2(): + """Higher disruption on a node should make approval at least as hard.""" + violations = [] + for k in K_VALUES: + for n_nodes in range(2, 5): + for price in PRICES: + for other_price in PRICES[:3]: + for other_pods in [0, 2, 4]: + other_disruption = other_pods * 1 + base_pool_cost = price + (n_nodes - 1) * other_price + base_other_disruption = (n_nodes - 1) * other_disruption + for d_high in range(1, 41): + for d_low in range(0, d_high): + pool_d_high = d_high + base_other_disruption + pool_d_low = d_low + base_other_disruption + # Delete savings = price for both + app_high = approved(k, price, d_high, base_pool_cost, pool_d_high) + app_low = approved(k, price, d_low, base_pool_cost, pool_d_low) + if app_high and not app_low: + violations.append((k, price, d_high, d_low, n_nodes)) + return violations + + +# ─── P3: Empty nodes always approved for delete ─── + +def check_p3(): + """Node with 0 pods, positive price: delete always approved.""" + violations = [] + for k in K_VALUES: + for n_nodes in range(1, 7): + for price in PRICES: + for other_price in PRICES: + for other_pods in range(0, 5): + other_disruption = other_pods * 1 + pool_cost = price + (n_nodes - 1) * other_price + pool_disruption = 0 + (n_nodes - 1) * other_disruption + if not approved(k, price, 0, pool_cost, pool_disruption): + violations.append((k, price, n_nodes)) + return violations + + +# ─── P4: Zero-savings moves never approved ─── + +def check_p4(): + """Replace with same price: savings = 0, never approved.""" + violations = [] + for k in K_VALUES: + for price in PRICES: + for pod_config in pod_configs(max_pods=4): + move_disruption = sum(pod_config) + for n_nodes in range(1, 5): + pool_cost = n_nodes * price 
+ pool_disruption = n_nodes * move_disruption + if approved(k, 0, move_disruption, pool_cost, pool_disruption): + violations.append((k, price, pod_config)) + return violations + + +# ─── P5: Single-node pool replaces ─── + +def check_p5(): + """For each k, find all approved replaces in single-node pools.""" + results = defaultdict(list) + for k in K_VALUES: + for price in PRICES: + for pod_config in pod_configs(max_pods=4): + if not pod_config: + continue # empty node, trivial + move_disruption = sum(pod_config) + pool_cost = price + pool_disruption = move_disruption + for np in PRICES: + if np >= price: + continue + savings = price - np + if approved(k, savings, move_disruption, pool_cost, pool_disruption): + s = score(savings, move_disruption, pool_cost, pool_disruption) + results[k].append({ + 'price': price, 'replacement': np, + 'pods': pod_config, 'disruption': move_disruption, + 'savings_pct': 100 * savings / price, + 'score': s, + }) + return results + + +# ─── P6: Skewed disruption cost ─── + +def check_p6(): + """Find pools where high-disruption node is rejected but low-disruption is approved.""" + examples = [] + for k in [1, 2, 3]: + for n_nodes in range(2, 5): + for price in PRICES: + for d_high_pods in pod_configs(max_pods=4): + for d_low_pods in pod_configs(max_pods=4): + d_high = sum(d_high_pods) + d_low = sum(d_low_pods) + if d_high <= d_low or d_low == 0: + continue + other_disruption = (n_nodes - 1) * 4 * 1 # other nodes: 4 default pods + pool_cost = n_nodes * price + # Both nodes are in the pool + pool_disruption = d_high + d_low + (n_nodes - 2) * 4 * 1 if n_nodes > 2 else d_high + d_low + app_high = approved(k, price, d_high, pool_cost, pool_disruption) + app_low = approved(k, price, d_low, pool_cost, pool_disruption) + if app_low and not app_high: + examples.append({ + 'k': k, 'price': price, 'n_nodes': n_nodes, + 'high_pods': d_high_pods, 'low_pods': d_low_pods, + 'd_high': d_high, 'd_low': d_low, + }) + if len(examples) >= 3: + return 
examples + return examples + + +# ─── P7: Fleet size independence ─── + +def check_p7(): + """For uniform pools, check that the decision doesn't change with pool size.""" + violations = [] + for k in K_VALUES: + for price in PRICES: + for pod_config in pod_configs(max_pods=4): + if not pod_config: + continue + move_disruption = sum(pod_config) + for np in PRICES: + if np >= price: + continue + savings = price - np + decisions = {} + for n_nodes in range(1, 7): + pool_cost = n_nodes * price + pool_disruption = n_nodes * move_disruption + decisions[n_nodes] = approved(k, savings, move_disruption, pool_cost, pool_disruption) + vals = set(decisions.values()) + if len(vals) > 1: + violations.append((k, price, np, pod_config, decisions)) + return violations + + +# ─── P8: Churn chains ─── + +def check_p8(): + """Find the longest replace chain via exhaustive DFS over all approved next-steps. + + At each step, every cheaper price that the scoring formula approves is a valid + next node in the chain. 
DFS explores all of them and returns the true maximum + chain length — not just the greedy "least aggressive step" path, which could + miss longer cross-family zigzag paths.""" + results = defaultdict(list) + for k in K_VALUES: + for pod_config in pod_configs(max_pods=4): + if not pod_config: + continue + move_disruption = sum(pod_config) + for n_nodes in range(1, 5): + for start_price in PRICES: + other_cost = (n_nodes - 1) * start_price + other_disruption = (n_nodes - 1) * move_disruption + longest_chain = _dfs_longest_chain( + k, start_price, [start_price], + move_disruption, other_cost, other_disruption, + ) + if len(longest_chain) > 1: + results[k].append({ + 'n_nodes': n_nodes, 'pods': pod_config, + 'chain': longest_chain, 'steps': len(longest_chain) - 1, + }) + # Keep only the longest chains per k + for k in results: + results[k].sort(key=lambda x: -x['steps']) + results[k] = results[k][:5] + return results + + +def _dfs_longest_chain(k, current_price, chain, move_disruption, other_cost, other_disruption): + """Return the longest chain reachable from current_price via any sequence of approved replaces.""" + best = list(chain) + for np in PRICES: + if np >= current_price: + continue + savings = current_price - np + pool_cost = current_price + other_cost + pool_disruption = move_disruption + other_disruption + if approved(k, savings, move_disruption, pool_cost, pool_disruption): + candidate = _dfs_longest_chain( + k, np, chain + [np], + move_disruption, other_cost, other_disruption, + ) + if len(candidate) > len(best): + best = candidate + return best + + +# ─── Utility ─── + +def pod_configs(max_pods=4): + """Generate representative pod configurations (tuples of disruption costs).""" + configs = [()] # empty + for n in range(1, max_pods + 1): + # Don't enumerate all combos — use representative configs + for d in DCOSTS: + configs.append(tuple([d] * n)) # uniform + if n >= 2: + configs.append((1, 10)) # mixed + configs.append((1, 1, 10)) # mostly low, one high 
+ configs.append((10, 10, 1)) # mostly high, one low + if n >= 3: + configs.append((1, 5, 10)) # spread + if n == 4: + configs.append((1, 1, 1, 10)) + configs.append((1, 2, 5, 10)) + configs.append((10, 10, 10, 1)) + return configs + + +# ─── Main ─── + +def main(): + print("=" * 70) + print("Consolidation Scoring Properties — Exhaustive Check") + print("=" * 70) + + print("\n── P1: Monotonicity in savings ──") + v = check_p1() + if v: + print(f" FAIL: {len(v)} violations found!") + for x in v[:3]: + print(f" k={x[0]} price={x[1]} approved_at={x[2]} rejected_at={x[3]} nodes={x[4]}") + else: + print(" PASS: no violations") + + print("\n── P2: Monotonicity in disruption ──") + v = check_p2() + if v: + print(f" FAIL: {len(v)} violations found!") + for x in v[:3]: + print(f" k={x[0]} price={x[1]} d_high={x[2]} d_low={x[3]} nodes={x[4]}") + else: + print(" PASS: no violations") + + print("\n── P3: Empty nodes always approved for delete ──") + v = check_p3() + if v: + print(f" FAIL: {len(v)} violations found!") + for x in v[:3]: + print(f" k={x[0]} price={x[1]} nodes={x[2]}") + else: + print(" PASS: no violations") + + print("\n── P4: Zero-savings moves never approved ──") + v = check_p4() + if v: + print(f" FAIL: {len(v)} violations found!") + for x in v[:3]: + print(f" k={x[0]} price={x[1]} pods={x[2]}") + else: + print(" PASS: no violations") + + print("\n── P5: Single-node pool replaces ──") + results = check_p5() + for k in K_VALUES: + moves = results.get(k, []) + pairs = set((m['price'], m['replacement']) for m in moves) + print(f" k={k}: {len(pairs)} unique price pairs, {len(moves)} configurations") + for m in moves[:5]: + print(f" {m['price']} → {m['replacement']} " + f"saves {m['savings_pct']:.0f}% " + f"disruption={m['disruption']} " + f"score={m['score']:.2f}") + + print("\n── P6: Skewed disruption cost ──") + examples = check_p6() + if examples: + for ex in examples[:3]: + print(f" k={ex['k']} price={ex['price']}¢ nodes={ex['n_nodes']}") + print(f" 
high-disruption pods {ex['high_pods']} (d={ex['d_high']}): REJECTED") + print(f" low-disruption pods {ex['low_pods']} (d={ex['d_low']}): APPROVED") + else: + print(" No examples found") + + print("\n── P7: Fleet size independence (uniform pools) ──") + v = check_p7() + if v: + print(f" FAIL: {len(v)} violations!") + for x in v[:3]: + print(f" k={x[0]} price={x[1]} replacement={x[2]} pods={x[3]}") + print(f" decisions by pool size: {x[4]}") + else: + print(" PASS: decision is identical across pool sizes 1-6") + + print("\n── P8: Churn chains (with actual post-replace pool updates) ──") + results = check_p8() + for k in K_VALUES: + chains = results.get(k, []) + if chains: + longest = chains[0] + print(f" k={k}: longest chain = {longest['steps']} steps") + for c in chains[:3]: + arrow = " → ".join(f"{p}¢" for p in c['chain']) + print(f" nodes={c['n_nodes']} pods={c['pods']} chain: {arrow}") + else: + print(f" k={k}: no churn (no replaces approved)") + + print("\n" + "=" * 70) + print("Done.") + + +if __name__ == "__main__": + main() diff --git a/designs/scripts/balanced-consolidation-ranking.py b/designs/scripts/balanced-consolidation-ranking.py new file mode 100644 index 0000000000..d868064717 --- /dev/null +++ b/designs/scripts/balanced-consolidation-ranking.py @@ -0,0 +1,475 @@ +#!/usr/bin/env python3 +""" +Generate ranking-strategies charts for the consolidation cost threshold RFC. + +Simulates a realistic cluster by: +1. Sampling pod workloads from AWS Fargate task sizes (vCPU, memory pairs) +2. Bin-packing pods onto EC2 instance types (c7i, m7i, r7i families) +3. Killing a variable fraction of pods per node to simulate scale-down +4. Generating REPLACE moves (reprovision remaining pods on cheapest instance) +5. Generating DELETE moves (scatter pods onto existing spare capacity) + +Produces two charts showing cumulative savings vs. cumulative disruption +under four ranking strategies: score, savings-only, disruption-only, random. 
+
+Usage:
+    python3 designs/scripts/balanced-consolidation-ranking.py
+
+Output:
+    designs/ranking-strategies.png
+    designs/scripts/ranking-strategies-{all,replace,delete}.csv
+"""
+
+import numpy as np
+import matplotlib.pyplot as plt
+from dataclasses import dataclass, field
+from pathlib import Path
+
+SEED = 42
+OUTPUT_DIR = Path(__file__).parent.parent
+
+# ---------------------------------------------------------------------------
+# EC2 instance types: c7i, m7i, r7i families (us-east-1, Linux, on-demand)
+# Source: https://instances.vantage.sh
+# ---------------------------------------------------------------------------
+EC2_INSTANCES = [
+    # c7i: compute optimized (2:1 memory:vCPU)
+    ("c7i.large", 2, 4, 0.0893),
+    ("c7i.xlarge", 4, 8, 0.1785),
+    ("c7i.2xlarge", 8, 16, 0.3570),
+    ("c7i.4xlarge", 16, 32, 0.7140),
+    ("c7i.8xlarge", 32, 64, 1.4280),
+    ("c7i.12xlarge", 48, 96, 2.1420),
+    ("c7i.16xlarge", 64, 128, 2.8560),
+    ("c7i.24xlarge", 96, 192, 4.2840),
+    # m7i: general purpose (4:1 memory:vCPU)
+    ("m7i.large", 2, 8, 0.1008),
+    ("m7i.xlarge", 4, 16, 0.2016),
+    ("m7i.2xlarge", 8, 32, 0.4032),
+    ("m7i.4xlarge", 16, 64, 0.8064),
+    ("m7i.8xlarge", 32, 128, 1.6128),
+    ("m7i.12xlarge", 48, 192, 2.4190),
+    ("m7i.16xlarge", 64, 256, 3.2260),
+    ("m7i.24xlarge", 96, 384, 4.8384),
+    ("m7i.48xlarge", 192, 768, 9.6768),
+    # r7i: memory optimized (8:1 memory:vCPU)
+    ("r7i.large", 2, 16, 0.1323),
+    ("r7i.xlarge", 4, 32, 0.2646),
+    ("r7i.2xlarge", 8, 64, 0.5292),
+    ("r7i.4xlarge", 16, 128, 1.0584),
+    ("r7i.8xlarge", 32, 256, 2.1168),
+    ("r7i.12xlarge", 48, 384, 3.1750),
+    ("r7i.16xlarge", 64, 512, 4.2336),
+    ("r7i.24xlarge", 96, 768, 6.3500),
+    ("r7i.48xlarge", 192, 1536, 12.7008),
+]
+
+# Fixed overhead of consolidating any node (independent of pod count)
+NODE_BASELINE_DISRUPTION = 1.0
+
+
+@dataclass
+class Pod:
+    cpu: float
+    mem: float
+    disruption_cost: float = 1.0
+
+
+@dataclass
+class Node:
+    name: str
+    cpu_cap: float
+    mem_cap: float
+    cost_per_hr: float
+    pods: list = field(default_factory=list)
+
+    @property
+    def cpu_used(self):
+        return sum(p.cpu for p in self.pods)
+
+    @property
+    def mem_used(self):
+        return sum(p.mem for p in self.pods)
+
+    def fits(self, pod):
+        return (pod.cpu <= self.cpu_cap - self.cpu_used and
+                pod.mem <= self.mem_cap - self.mem_used)
+
+    @property
+    def disruption_cost(self):
+        return NODE_BASELINE_DISRUPTION + sum(
+            max(0, p.disruption_cost) for p in self.pods)
+
+
+def find_cheapest_fitting(cpu_needed, mem_needed):
+    """Find the cheapest EC2 instance that fits the given resources."""
+    best = None
+    for name, vcpu, mem, cost in EC2_INSTANCES:
+        if vcpu >= cpu_needed and mem >= mem_needed:
+            if best is None or cost < best[3]:
+                best = (name, vcpu, mem, cost)
+    return best
+
+
+def cheapest_instance_for(pods):
+    """Find the cheapest instance type that fits all pods."""
+    total_cpu = sum(p.cpu for p in pods)
+    total_mem = sum(p.mem for p in pods)
+    return find_cheapest_fitting(total_cpu, total_mem)
+
+
+def build_cluster(rng, n_pods=20000):
+    """
+    Build a cluster by bin-packing pods onto EC2 instances.
+
+    Uses first-fit-decreasing: sort all pods by resource footprint (cpu * mem)
+    descending, then greedily place each pod onto the first existing node that
+    fits. When no node fits, provision the cheapest instance that can hold
+    the pod. This mirrors Karpenter's provisioning behavior and produces a
+    realistic instance type distribution.
+    """
+    # Generate pods with random resource requests. CPU and memory are drawn
+    # independently from log-normal distributions, producing a realistic mix
+    # of compute-heavy, memory-heavy, and balanced pods. This naturally
+    # spreads pods across different instance families (c7i for compute-heavy,
+    # r7i for memory-heavy, m7i for balanced), creating cost diversity.
+ cpu_raw = rng.lognormal(mean=0.0, sigma=1.0, size=n_pods) + mem_raw = rng.lognormal(mean=1.5, sigma=1.2, size=n_pods) + + # Clamp to reasonable ranges and round to Fargate-like granularity + all_pods = [] + for c, m in zip(cpu_raw, mem_raw): + cpu = float(np.clip(round(c * 2) / 2, 0.25, 16)) # 0.25 to 16, step 0.5 + mem = float(np.clip(round(m), 1, 120)) # 1 to 120 GiB, step 1 + dc = float(rng.choice([0, 1, 1, 1, 1, 1, 10])) + all_pods.append(Pod(cpu=cpu, mem=mem, disruption_cost=dc)) + + # First-fit-decreasing bin packing: sort all pods by resource footprint + # descending, then greedily place each onto the first existing node that + # fits. When no node fits, provision the cheapest instance that can hold + # the pod. This mirrors Karpenter's provisioning behavior. + all_pods.sort(key=lambda p: p.cpu * p.mem, reverse=True) + + nodes = [] + for pod in all_pods: + placed = False + for node in nodes: + if node.fits(pod): + node.pods.append(pod) + placed = True + break + if not placed: + chosen = find_cheapest_fitting(pod.cpu, pod.mem) + if chosen is None: + raise ValueError( + f"No instance fits pod ({pod.cpu}, {pod.mem})") + new_node = Node( + name=chosen[0], + cpu_cap=chosen[1], mem_cap=chosen[2], + cost_per_hr=chosen[3], + ) + new_node.pods.append(pod) + nodes.append(new_node) + + return nodes + + +def churn_pods(rng, nodes): + """Simulate workload churn: kill some pods, add some new ones. + + For each node, kill 0-80% of pods (simulating scale-down or + redeployment), then add 0-3 new randomly-sized pods if they fit + (simulating new deployments landing on nodes with spare capacity). + This changes the resource mix on each node over time, creating + mismatches between the node's instance type and its actual workload + -- exactly the situation that consolidation is designed to fix. 
+ """ + for node in nodes: + # Kill phase: remove a random fraction of pods + if len(node.pods) > 1: + kill_fraction = rng.uniform(0.0, 0.8) + n_kill = int(len(node.pods) * kill_fraction) + if n_kill > 0: + kill_indices = set( + rng.choice(len(node.pods), size=n_kill, replace=False)) + node.pods = [p for i, p in enumerate(node.pods) + if i not in kill_indices] + + # Add phase: place 0-3 new pods if they fit + n_add = rng.integers(0, 4) + for _ in range(n_add): + cpu = float(np.clip(round(rng.lognormal(0.0, 1.0) * 2) / 2, 0.25, 16)) + mem = float(np.clip(round(rng.lognormal(1.5, 1.2)), 1, 120)) + dc = float(rng.choice([0, 1, 1, 1, 1, 1, 10])) + pod = Pod(cpu=cpu, mem=mem, disruption_cost=dc) + if node.fits(pod): + node.pods.append(pod) + + +def generate_replace_moves(rng, nodes): + """ + For each node, find the cheapest instance that fits its current pods. + If cheaper than the current node, that's a REPLACE move. + """ + moves = [] + nodepool_cost = sum(n.cost_per_hr for n in nodes) + nodepool_disruption = sum(n.disruption_cost for n in nodes) + + for node in nodes: + if len(node.pods) == 0: + continue + + result = cheapest_instance_for(node.pods) + if result is None: + continue + + _, _, _, new_cost = result + savings = node.cost_per_hr - new_cost + disruption = node.disruption_cost + + if savings <= 0 or disruption <= 0: + continue + if nodepool_cost <= 0 or nodepool_disruption <= 0: + continue + + savings_frac = savings / nodepool_cost + disruption_frac = disruption / nodepool_disruption + score = savings_frac / disruption_frac + + moves.append({ + "savings": savings, + "disruption": disruption, + "savings_frac": savings_frac, + "disruption_frac": disruption_frac, + "score": score, + }) + + return moves + + +def generate_delete_moves(rng, nodes): + """ + For each node, check if its pods could fit on other nodes' spare capacity. + If so, generate a DELETE move (full node cost saved, all pods disrupted). 
+ """ + moves = [] + nodepool_cost = sum(n.cost_per_hr for n in nodes) + nodepool_disruption = sum(n.disruption_cost for n in nodes) + + for i, node in enumerate(nodes): + if len(node.pods) == 0: + continue + + # Check if all pods fit on spare capacity of other nodes + sorted_pods = sorted(node.pods, key=lambda p: (p.cpu, p.mem), + reverse=True) + + remaining = {} + for j in range(len(nodes)): + if j != i: + remaining[j] = ( + nodes[j].cpu_cap - nodes[j].cpu_used, + nodes[j].mem_cap - nodes[j].mem_used, + ) + + can_fit = True + for pod in sorted_pods: + best_j = None + best_waste = float("inf") + for j, (cpu_f, mem_f) in remaining.items(): + if cpu_f >= pod.cpu and mem_f >= pod.mem: + waste = (cpu_f - pod.cpu) + (mem_f - pod.mem) + if waste < best_waste: + best_waste = waste + best_j = j + if best_j is not None: + cpu_f, mem_f = remaining[best_j] + remaining[best_j] = (cpu_f - pod.cpu, mem_f - pod.mem) + else: + can_fit = False + break + + if not can_fit: + continue + + savings = node.cost_per_hr + disruption = node.disruption_cost + + if savings <= 0 or disruption <= 0: + continue + if nodepool_cost <= 0 or nodepool_disruption <= 0: + continue + + savings_frac = savings / nodepool_cost + disruption_frac = disruption / nodepool_disruption + score = savings_frac / disruption_frac + + moves.append({ + "savings": savings, + "disruption": disruption, + "savings_frac": savings_frac, + "disruption_frac": disruption_frac, + "score": score, + }) + + return moves + + +def plot_ranking(moves, title, output_path): + """Plot cumulative savings vs disruption for four ranking strategies.""" + if not moves: + print(f" No moves for {title}, skipping") + return + + rng = np.random.default_rng(SEED + 1) + n = len(moves) + + savings = np.array([m["savings_frac"] for m in moves]) + disruption = np.array([m["disruption_frac"] for m in moves]) + scores = np.array([m["score"] for m in moves]) + + strategies = { + "Score (benefit/cost)": np.argsort(-scores), + "Savings only": 
np.argsort(-savings), + "Disruption only (asc)": np.argsort(disruption), + "Random": rng.permutation(n), + } + + fig, ax = plt.subplots(figsize=(8, 6)) + + for label, order in strategies.items(): + cum_disruption = np.cumsum(disruption[order]) + cum_savings = np.cumsum(savings[order]) + if cum_disruption[-1] > 0: + cum_disruption = cum_disruption / cum_disruption[-1] + if cum_savings[-1] > 0: + cum_savings = cum_savings / cum_savings[-1] + ax.plot(cum_disruption, cum_savings, label=label, linewidth=2) + + ax.set_xlabel("Cumulative disruption (fraction of total)") + ax.set_ylabel("Cumulative savings (fraction of total)") + ax.set_title(title) + ax.legend(loc="lower right") + ax.set_xlim(0, 1) + ax.set_ylim(0, 1) + ax.set_aspect("equal") + ax.grid(True, alpha=0.3) + + fig.tight_layout() + fig.savefig(output_path, dpi=150) + print(f" Saved {output_path}") + + +def _plot_panel(ax, moves, title): + """Plot one panel of cumulative savings vs disruption.""" + rng = np.random.default_rng(SEED + 1) + n = len(moves) + + savings = np.array([m["savings_frac"] for m in moves]) + disruption = np.array([m["disruption_frac"] for m in moves]) + scores = np.array([m["score"] for m in moves]) + + strategies = { + "Score (benefit/cost)": np.argsort(-scores), + "Savings only": np.argsort(-savings), + "Disruption only (asc)": np.argsort(disruption), + "Random": rng.permutation(n), + } + + for label, order in strategies.items(): + cum_disruption = np.cumsum(disruption[order]) + cum_savings = np.cumsum(savings[order]) + if cum_disruption[-1] > 0: + cum_disruption = cum_disruption / cum_disruption[-1] + if cum_savings[-1] > 0: + cum_savings = cum_savings / cum_savings[-1] + ax.plot(cum_disruption, cum_savings, label=label, linewidth=2) + + ax.set_xlabel("Cumulative disruption (fraction of total)") + ax.set_ylabel("Cumulative savings (fraction of total)") + ax.set_title(title) + ax.legend(loc="lower right", fontsize=8) + ax.set_xlim(0, 1) + ax.set_ylim(0, 1) + ax.set_aspect("equal") + 
ax.grid(True, alpha=0.3) + + +def plot_ranking_sidebyside(replace_moves, delete_moves, output_path): + """Plot REPLACE and DELETE ranking charts side by side.""" + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6)) + + if replace_moves: + _plot_panel(ax1, replace_moves, "REPLACE moves") + if delete_moves: + _plot_panel(ax2, delete_moves, "DELETE moves") + + fig.tight_layout() + fig.savefig(output_path, dpi=150) + print(f" Saved {output_path}") + + +def main(): + rng = np.random.default_rng(SEED) + + print("Building cluster...") + nodes = build_cluster(rng, n_pods=5000) + total_pods = sum(len(n.pods) for n in nodes) + print(f" {len(nodes)} nodes, {total_pods} pods, " + f"{total_pods/len(nodes):.1f} pods/node") + print(f" NodePool cost: ${sum(n.cost_per_hr for n in nodes):.2f}/hr") + + # Run multiple rounds of churn to accumulate consolidation candidates. + # Each round kills some pods and adds new ones, then we collect moves. + # This simulates a cluster that has been running for a while with + # ongoing deployments and scale-downs. 
+ all_replace_moves = [] + all_delete_moves = [] + n_rounds = 10 + print(f"Simulating {n_rounds} rounds of workload churn...") + for round_num in range(n_rounds): + churn_pods(rng, nodes) + nodes = [n for n in nodes if len(n.pods) > 0] + replace_moves = generate_replace_moves(rng, nodes) + delete_moves = generate_delete_moves(rng, nodes) + all_replace_moves.extend(replace_moves) + all_delete_moves.extend(delete_moves) + if round_num == 0 or round_num == n_rounds - 1: + total_pods = sum(len(n.pods) for n in nodes) + print(f" Round {round_num+1}: {len(nodes)} nodes, {total_pods} pods, " + f"{len(replace_moves)} REPLACE, {len(delete_moves)} DELETE") + + replace_moves = all_replace_moves + delete_moves = all_delete_moves + print(f" Total: {len(replace_moves)} REPLACE, {len(delete_moves)} DELETE") + + # Combine all moves -- this is what the consolidation controller evaluates + all_moves = replace_moves + delete_moves + print(f" Combined: {len(all_moves)} moves") + + print("Plotting...") + plot_ranking_sidebyside( + replace_moves, delete_moves, + OUTPUT_DIR / "ranking-strategies.png", + ) + + # Write CSVs for inspection + for label, moves in [("all", all_moves), + ("replace", replace_moves), + ("delete", delete_moves)]: + csv_path = Path(__file__).parent / f"ranking-strategies-{label}.csv" + with open(csv_path, "w") as f: + f.write("savings_per_hr,disruption,savings_frac,disruption_frac,score\n") + for m in sorted(moves, key=lambda m: -m["score"]): + f.write(f"{m['savings']:.4f},{m['disruption']:.1f}," + f"{m['savings_frac']:.6f},{m['disruption_frac']:.6f}," + f"{m['score']:.4f}\n") + print(f" Saved {csv_path}") + + print("Done.") + + +if __name__ == "__main__": + main()