KEP: Add support for mutable pod resources in suspended jobs
Introduce a new KEP to allow updating container resource
specifications (CPU, memory, GPU, and other extended resources) for suspended jobs.
Key features:
- Allow resource updates only while a job is suspended
- Support CPU, memory, and GPU resource mutations
- Include extended resources (nvidia.com/gpu, amd.com/gpu, tpu-v4, etc.)
- Allow queue controllers to optimize resource allocation based on
cluster conditions
- Feature gate: MutableJobPodResourcesForSuspendedJobs
- Focus on batch workload optimization scenarios
This proposal enables better cluster utilization and cost optimization
by allowing queue controllers to adjust job resource requirements before
execution based on real-time cluster capacity and resource availability.
This is particularly valuable for expensive GPU and specialized hardware resources.
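
The rule described above (resource changes are accepted only when the feature gate is on and the job is suspended) could be sketched roughly as follows. This is an illustrative model only, not the Kubernetes API or the KEP's actual validation code: the `Job` class, the `validate_resource_update` function, and the error strings are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Feature gate name taken from the commit message; the lookup below is a
# hypothetical stand-in for however the gate would really be checked.
FEATURE_GATE = "MutableJobPodResourcesForSuspendedJobs"


@dataclass
class Job:
    """Simplified, hypothetical model of a Job; not the Kubernetes API type."""
    suspend: bool
    # Container resource requests, e.g. {"cpu": "4", "nvidia.com/gpu": "2"}.
    requests: Dict[str, str] = field(default_factory=dict)


def validate_resource_update(old: Job, new: Job, gates: Dict[str, bool]) -> List[str]:
    """Return a list of error strings; an empty list means the update is allowed."""
    if old.requests == new.requests:
        return []  # Nothing changed, so there is nothing to validate.
    if not gates.get(FEATURE_GATE, False):
        return ["resource requests are immutable (feature gate disabled)"]
    if not old.suspend:
        return ["resource requests may only change while the job is suspended"]
    return []
```

Under these assumptions, a queue controller could lower a suspended job's GPU request from 2 to 1 before resuming it, while the same change against a running job would be rejected.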