Workload Scheduling Constraints (Workload-Level) and Preference-Aware MultiKueue Dispatching #8729

**What would you like to be added**:

Kueue today supports preference-based placement via flavor fungibility, but these preferences are **soft-only** and cannot be expressed as **hard scheduling constraints**. In addition, MultiKueue dispatching strategies (`AllAtOnce`, `Incremental`) are **race-based**, meaning the first worker cluster that admits a workload wins, regardless of whether that placement is optimal.

This issue proposes extending Kueue with **workload-level scheduling constraints** and updating MultiKueue dispatching to be **preference-aware** rather than timing-driven.


**Scheduling constraints must be specified per workload, at the workload level.**

They should **not** be global defaults and should **not** be ClusterQueue-wide policies.
The intent is to allow different workloads sharing the same ClusterQueue to express different scheduling requirements.

This mirrors Kubernetes design patterns, where constraints are typically attached to the object being scheduled (e.g., Pods), not to the scheduler or queue globally.

**Why is this needed**:

## Problem

### Single-cluster limitations

Currently, users cannot express strict workload-specific guarantees such as:

* “This workload must not preempt other workloads.”
* “This workload must not borrow quota from a cohort.”
* “If these conditions cannot be met, keep this workload pending.”

If borrowing or preemption is enabled at the ClusterQueue level, Kueue may use them for *any* workload in that queue, even when a specific workload would rather wait.

As a result, users can express only queue-wide soft preferences, not **per-workload hard guarantees**.

This limits Kueue’s usefulness for:

* SLA-sensitive workloads
* Fairness- or isolation-critical workloads
* Budget- or quota-bound workloads
* Mixed workloads sharing the same ClusterQueue

### MultiKueue limitations

MultiKueue dispatching modes (`AllAtOnce`, `Incremental`) are fundamentally race-based:

* Workloads are dispatched to multiple worker clusters.
* The first cluster to admit the workload wins.
* No comparison of placement quality is performed.

This can result in:

* A cluster that admits a workload using borrowing winning over a cluster that could admit the same workload without borrowing
* A cluster that admits a workload using preemption winning over a cluster that could admit it without requiring preemption
* Non-deterministic placement driven by control-plane timing rather than placement quality
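* Unnecessary preemption on a cluster that loses the race: workloads are evicted there even though the dispatched workload ultimately runs on the cluster MultiKueue nominated as the winner

These semantics break the flavor fungibility mental model across clusters: preferences that Kueue honors within a single cluster are lost once MultiKueue dispatches to several.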

### Example

Assume a workload with **no borrowing, no preemption** constraints is dispatched to three clusters:

| Cluster | Admission Result                     |
| ------- | ------------------------------------ |
| A       | Fits without borrowing or preemption |
| B       | Fits with borrowing                  |
| C       | Fits with preemption                 |

Today, B or C may win simply because it admits the workload first.

Moreover, cluster C will preempt workloads **_irrespective_** of the final placement, even if the workload ultimately runs on a different cluster.

Desired behavior:

* The workload should only be admitted on **A**
* If A is unavailable, the workload should remain pending

## Proposed Direction

### 1. Add workload-level constraint-aware scheduling to Kueue

Extend the Workload API to support **hard placement constraints**, evaluated per workload.

Illustrative API examples:

```yaml
spec:
  admissionConstraints:
    requireNoBorrowing: true
    requireNoPreemption: true
```

Or a more expressive form:

```yaml
spec:
  placementPolicy:
    borrowing: Forbidden    # or Allowed
    preemption: Forbidden   # or Allowed
```

Key properties:

* Constraints are evaluated **per workload**
* Constraints override queue-wide capabilities
* If constraints are not satisfied, the workload remains pending
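A minimal Go sketch of how such a per-workload policy could be evaluated. It assumes a hypothetical `PlacementPolicy` type mirroring the second YAML form and a hypothetical `Assessment` summarizing what the existing flavor-assignment step found; none of these names exist in Kueue today. It also assumes one reading of "override": the workload can only narrow what the ClusterQueue allows, and an unset field defers to the queue.

```go
package main

import "fmt"

// Mode is a hypothetical enum mirroring `Forbidden` / `Allowed` in the YAML sketch.
type Mode string

const (
	Forbidden Mode = "Forbidden"
	Allowed   Mode = "Allowed"
)

// PlacementPolicy is a hypothetical per-workload policy (spec.placementPolicy).
// An empty Mode means "not set": the ClusterQueue's capability applies unchanged.
type PlacementPolicy struct {
	Borrowing  Mode
	Preemption Mode
}

// QueueCapabilities captures what the ClusterQueue permits for all of its workloads.
type QueueCapabilities struct {
	BorrowingEnabled  bool
	PreemptionEnabled bool
}

// Assessment is a hypothetical summary of what admitting the workload would require.
type Assessment struct {
	Fits            bool // the workload fits at all, possibly via borrowing or preemption
	NeedsBorrowing  bool
	NeedsPreemption bool
}

// Admissible reports whether the workload may be admitted. A capability is usable
// only if the ClusterQueue enables it AND the workload does not forbid it.
func Admissible(p PlacementPolicy, q QueueCapabilities, a Assessment) bool {
	if !a.Fits {
		return false
	}
	canBorrow := q.BorrowingEnabled && p.Borrowing != Forbidden
	canPreempt := q.PreemptionEnabled && p.Preemption != Forbidden
	if a.NeedsBorrowing && !canBorrow {
		return false // keep pending: borrowing required but not permitted
	}
	if a.NeedsPreemption && !canPreempt {
		return false // keep pending: preemption required but not permitted
	}
	return true
}

func main() {
	queue := QueueCapabilities{BorrowingEnabled: true, PreemptionEnabled: true}
	strict := PlacementPolicy{Borrowing: Forbidden, Preemption: Forbidden}
	needsBorrow := Assessment{Fits: true, NeedsBorrowing: true}

	// The queue allows borrowing, but this workload forbids it, so it stays pending.
	fmt.Println(Admissible(strict, queue, needsBorrow)) // false
}
```

Two workloads in the same ClusterQueue can therefore get different answers for the same `Assessment`, which is the per-workload behavior this issue asks for.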

### 2. Surface reasoned admission rejections

Instead of only reporting admitted / not admitted, Kueue should surface structured rejection reasons, for example "Unsatisfied Admission Constraint due to:" followed by one or more of:

* Requires borrowing
* Requires preemption
This enables higher-level scheduling logic and MultiKueue dispatching to reason about failures.
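One way these reasons could be represented is sketched below in Go. `RejectionReason`, `ConstraintCheck`, `Reasons`, and `Message` are hypothetical names for illustration only, not an existing Kueue API; the message prefix is the one proposed above. The sketch reports every violated constraint rather than just the first, so a dispatcher can tell "needs borrowing" apart from "needs preemption".

```go
package main

import (
	"fmt"
	"strings"
)

// RejectionReason is a hypothetical machine-readable reason for an unsatisfied constraint.
type RejectionReason string

const (
	RequiresBorrowing  RejectionReason = "RequiresBorrowing"
	RequiresPreemption RejectionReason = "RequiresPreemption"
)

// ConstraintCheck is a hypothetical input: what admission would need,
// and what the workload's constraints forbid.
type ConstraintCheck struct {
	NeedsBorrowing    bool
	NeedsPreemption   bool
	ForbidsBorrowing  bool
	ForbidsPreemption bool
}

// Reasons lists every violated constraint; an empty result means no constraint blocks admission.
func Reasons(c ConstraintCheck) []RejectionReason {
	var out []RejectionReason
	if c.NeedsBorrowing && c.ForbidsBorrowing {
		out = append(out, RequiresBorrowing)
	}
	if c.NeedsPreemption && c.ForbidsPreemption {
		out = append(out, RequiresPreemption)
	}
	return out
}

// Message renders the reasons in the human-readable form suggested in this issue.
func Message(rs []RejectionReason) string {
	parts := make([]string, len(rs))
	for i, r := range rs {
		parts[i] = string(r)
	}
	return "Unsatisfied Admission Constraint due to: " + strings.Join(parts, ", ")
}

func main() {
	c := ConstraintCheck{NeedsBorrowing: true, NeedsPreemption: true, ForbidsBorrowing: true, ForbidsPreemption: true}
	fmt.Println(Message(Reasons(c)))
	// Unsatisfied Admission Constraint due to: RequiresBorrowing, RequiresPreemption
}
```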

### 3. Make MultiKueue dispatching preference-aware

Once workload-level constraints exist, MultiKueue dispatching can move away from races.

Instead of “first admission wins”, MultiKueue should:

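```
for each preference tier P, in order (P1, P2, ...):
  dispatch the workload to all eligible worker clusters under P's constraints
  if any cluster can admit it under P:
    select the winner from those clusters and stop
if no tier succeeds:
  keep the workload pending
```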

Preference tiers are derived from **workload-level constraints**, not queue defaults.

This preserves flavor fungibility semantics across clusters while respecting per-workload guarantees.
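The tier loop can be sketched in Go under a few stated assumptions. It assumes, hypothetically, that each worker cluster can report an admission outcome without side effects before anything is committed; that is what would avoid the unnecessary preemption described earlier. The `Outcome` values, `TiersFor`, and `Dispatch` are illustrative names, not MultiKueue APIs, and ordering borrowing ahead of preemption is also an assumption. Clusters within a tier are tried in name order so the result does not depend on response timing.

```go
package main

import (
	"fmt"
	"sort"
)

// Outcome is a hypothetical per-cluster answer from a side-effect-free admission check.
type Outcome int

const (
	NoFit              Outcome = iota
	FitsClean                  // fits without borrowing or preemption
	FitsWithBorrowing          // fits only by borrowing from the cohort
	FitsWithPreemption         // fits only by preempting other workloads
)

// TiersFor derives preference tiers from the workload's own constraints.
// The clean fit is always tier P1; later tiers exist only if the workload allows them.
func TiersFor(allowBorrowing, allowPreemption bool) []Outcome {
	tiers := []Outcome{FitsClean}
	if allowBorrowing {
		tiers = append(tiers, FitsWithBorrowing)
	}
	if allowPreemption {
		tiers = append(tiers, FitsWithPreemption)
	}
	return tiers
}

// Dispatch walks the tiers in order and returns the first cluster (in name order)
// whose outcome matches the current tier. ok is false when no tier matches,
// meaning the workload stays pending.
func Dispatch(results map[string]Outcome, tiers []Outcome) (cluster string, ok bool) {
	names := make([]string, 0, len(results))
	for name := range results {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic tie-break instead of "fastest wins"
	for _, tier := range tiers {
		for _, name := range names {
			if results[name] == tier {
				return name, true
			}
		}
	}
	return "", false
}

func main() {
	// The three clusters from the example in this issue.
	results := map[string]Outcome{"A": FitsClean, "B": FitsWithBorrowing, "C": FitsWithPreemption}

	// The workload forbids borrowing and preemption, so only tier P1 applies.
	fmt.Println(Dispatch(results, TiersFor(false, false))) // A true

	delete(results, "A") // cluster A is unavailable
	cluster, ok := Dispatch(results, TiersFor(false, false))
	fmt.Printf("cluster=%q admitted=%v\n", cluster, ok) // no cluster qualifies: stays pending
}
```

Regardless of which cluster responds first, the strict workload from the example lands only on A, and it stays pending when A is unavailable.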

## Benefits

* Enables **per-workload** hard scheduling guarantees
* Allows heterogeneous workloads to safely share ClusterQueues
* Preserves flavor fungibility semantics across clusters
* Eliminates race-based placement
* Improves determinism and placement quality
* Avoids unnecessary borrowing and preemption
* Keeps a clean separation between single-cluster admission and multi-cluster dispatching

## Conclusion

This issue proposes introducing a missing scheduling primitive in Kueue: **workload-level hard scheduling constraints**, and extending MultiKueue to be **preference-aware** instead of race-based.

The guiding principle is:

> The scheduler should select the best feasible placement for a given workload, not the fastest one.

**Completion requirements**:

This enhancement requires the following artifacts:

- [x] Design doc
- [x] API change
- [x] Docs update

The artifacts should be linked in subsequent comments.
