Description
What would you like to be added:
A mechanism for taking preemption cost into account when finding and sorting candidates for preemption. The cost would be exposed as a field set by an external controller (a reference implementation of such a controller would be included as part of the feature). Kueue should consume this signal and try to minimize the preemption cost.
We need feedback on one point: should the preemption cost only be used to sort and compare workloads that share the same priority, or should it be combined with the workload priority? In the first case, if 3 workloads A, B and C all have medium workload priority, but A has a high preemption cost, B a medium one and C a low one, they should be considered for preemption in the order C, B, A. In the second case, given a low-priority workload D with a critical preemption cost and a medium-priority workload E with a low preemption cost, the preemption order should be E, D.
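To make the two options concrete, here is a minimal sketch in Go. Everything in it is hypothetical: the `Workload` struct, the `PreemptionCost` field and the numeric cost levels are illustrative stand-ins, not existing Kueue APIs. `sortTieBreak` uses the cost only among equal priorities (option 1), while `sortCombined` ranks by the sum of priority and cost (option 2, one possible way to combine them). Candidates earlier in the slice are preempted first.

```go
package main

import (
	"fmt"
	"sort"
)

// Workload is a hypothetical, simplified preemption candidate.
// PreemptionCost stands in for the field an external controller would set;
// it is NOT an existing Kueue API field.
type Workload struct {
	Name           string
	Priority       int32
	PreemptionCost int32
}

// Hypothetical cost levels, for illustration only.
const (
	CostLow      int32 = 0
	CostMedium   int32 = 50
	CostHigh     int32 = 100
	CostCritical int32 = 200
)

// sortTieBreak orders candidates by priority (lowest first) and uses the
// preemption cost only to break ties between equal priorities.
func sortTieBreak(ws []Workload) {
	sort.SliceStable(ws, func(i, j int) bool {
		if ws[i].Priority != ws[j].Priority {
			return ws[i].Priority < ws[j].Priority
		}
		return ws[i].PreemptionCost < ws[j].PreemptionCost
	})
}

// sortCombined orders candidates by priority + cost (lowest first), so an
// expensive low-priority workload can end up after a cheap higher-priority one.
func sortCombined(ws []Workload) {
	sort.SliceStable(ws, func(i, j int) bool {
		return ws[i].Priority+ws[i].PreemptionCost < ws[j].Priority+ws[j].PreemptionCost
	})
}

// names returns the workload names in their current order.
func names(ws []Workload) []string {
	out := make([]string, len(ws))
	for i, w := range ws {
		out[i] = w.Name
	}
	return out
}

func main() {
	const low, medium int32 = 100, 200 // hypothetical priority values

	same := []Workload{
		{"A", medium, CostHigh},
		{"B", medium, CostMedium},
		{"C", medium, CostLow},
	}
	sortTieBreak(same)
	fmt.Println(names(same)) // [C B A]

	mixed := []Workload{
		{"D", low, CostCritical},
		{"E", medium, CostLow},
	}
	sortCombined(mixed)
	fmt.Println(names(mixed)) // [E D]
}
```

Note that with `sortTieBreak` the D/E pair would come out as D, E, because priority always wins. The combined variant's outcome depends entirely on how cost values are scaled relative to priorities, which is part of what needs feedback.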
Why is this needed:
The cost of preemption is not equal across pods. Pods with a relatively long initialization time (minutes) are cheap to preempt before initialization finishes but expensive to preempt once they have started doing useful work. The same applies to checkpointing: preempting a pod that has already done some work but has not yet created a checkpoint is costly, because that work would be wasted, whereas a pod that has just finished checkpointing is relatively cheap to preempt.
Since access to accelerators is so expensive and accelerators are so scarce, I think Kueue should take the preemption cost into account and use additional signals when finding and sorting candidates for preemption.
Also, I don't think Kueue itself should define policies for calculating the cost of preemption. Instead, it should allow external controllers to populate this information and consider it when identifying and sorting preemption candidates. We could offer a reference implementation of such a controller, which would be a good example for folks implementing their own according to their organization's policies and business goals. I think a good reference example would be a controller that increases the preemption cost with every preemption that has already happened. We already store that number in the schedulingStats (thanks @PBundyra !), so it should be easy to read this field and compute the reference preemption cost from it.
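The cost calculation of such a reference controller could look roughly like the Go sketch below. It is a minimal sketch under stated assumptions: `EvictionStat` is a hypothetical, simplified stand-in for the eviction counters in schedulingStats (the real field layout may differ), the `"Preempted"` reason string is assumed, and `referenceCost`, `perPreemption` and `maxCost` are illustrative names. Writing the result back to the workload is left out.

```go
package main

import "fmt"

// EvictionStat is a hypothetical, simplified stand-in for one entry of the
// eviction counters kept in a Workload's schedulingStats.
type EvictionStat struct {
	Reason string
	Count  int32
}

// Assumed reason string for preemption-driven evictions (hypothetical).
const reasonPreempted = "Preempted"

// referenceCost returns a preemption cost that grows by perPreemption for
// every past preemption, capped at maxCost so it cannot grow without bound.
func referenceCost(stats []EvictionStat, perPreemption, maxCost int32) int32 {
	var preemptions int32
	for _, s := range stats {
		if s.Reason == reasonPreempted {
			preemptions += s.Count
		}
	}
	cost := preemptions * perPreemption
	if cost > maxCost {
		return maxCost
	}
	return cost
}

func main() {
	stats := []EvictionStat{
		{Reason: reasonPreempted, Count: 3},
		{Reason: "OtherReason", Count: 1}, // ignored: not a preemption
	}
	fmt.Println(referenceCost(stats, 10, 100)) // 30
	fmt.Println(referenceCost(stats, 50, 100)) // 100 (capped)
}
```

The cap is a design choice for the sketch: without it, a workload preempted many times could become effectively unpreemptable, which an organization may or may not want.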
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.