Support Workload API for TrainJob Scheduling #3015

@andreyvelich

What you would like to be added?

Kubernetes recently introduced support for the Workload API: kubernetes/enhancements#4671
In v1.35, this API allows a group of Pods to be scheduled together (e.g. gang scheduling).

We should create a dedicated PodGroupPolicy plugin that creates the Workload object.

API Changes

We can design an initial API as follows:

apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: mpi-runtime
spec:
  podGroupPolicy:
    workload: {}

If this plugin is enabled, it should create the Workload and PodGroup objects:

apiVersion: scheduling.k8s.io/v1alpha2
kind: Workload
metadata:
  name: <job-name>
  ownerReferences:
  - apiVersion: trainer.kubeflow.org/v1alpha1
    kind: TrainJob
    name: <job-name>
spec:
  controllerRef:
    apiVersion: trainer.kubeflow.org/v1alpha1
    kind: TrainJob
    name: <job-name>
  podGroupTemplate:
  - name: trainer
    schedulingPolicy:
      gang:
        minCount: 8  # Equal to trainJob.spec.trainer.numNodes
---
apiVersion: scheduling.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: <job-name>-<podGroup-template-name>-<hash>
  ownerReferences:
  - apiVersion: trainer.kubeflow.org/v1alpha1
    kind: TrainJob
    name: <job-name>
  - apiVersion: scheduling.k8s.io/v1alpha2
    kind: Workload
    name: <job-name>
spec:
  podGroupTemplateRef:
    workloadName: <job-name>
    podGroupTemplateName: trainer
  schedulingPolicy:
    gang:
      minCount: 8  # Equal to trainJob.spec.trainer.numNodes
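
To make the field mapping concrete, here is a minimal sketch in Python that derives the Workload manifest above from a TrainJob. It is illustrative only: `build_workload` is a hypothetical helper, not part of any existing controller, the manifests are plain dicts, and the field names are copied from the proposal, so they may change as the Workload API evolves.

```python
TRAINER_API_VERSION = "trainer.kubeflow.org/v1alpha1"
# Copied from the proposed manifest above; an assumption, not a confirmed API version.
WORKLOAD_API_VERSION = "scheduling.k8s.io/v1alpha2"


def build_workload(train_job: dict, template_name: str = "trainer") -> dict:
    """Return a Workload manifest (plain dict) for a TrainJob manifest (plain dict).

    Hypothetical helper: it mirrors the YAML in this proposal and nothing more.
    """
    job_name = train_job["metadata"]["name"]
    # Gang size equals trainJob.spec.trainer.numNodes, as noted in the manifest.
    num_nodes = train_job["spec"]["trainer"]["numNodes"]

    # Reference back to the owning TrainJob. A real ownerReference also
    # requires the owner's uid, which the proposal omits for brevity.
    owner = {
        "apiVersion": TRAINER_API_VERSION,
        "kind": "TrainJob",
        "name": job_name,
    }

    return {
        "apiVersion": WORKLOAD_API_VERSION,
        "kind": "Workload",
        "metadata": {"name": job_name, "ownerReferences": [dict(owner)]},
        "spec": {
            "controllerRef": dict(owner),
            "podGroupTemplate": [
                {
                    "name": template_name,
                    "schedulingPolicy": {"gang": {"minCount": num_nodes}},
                }
            ],
        },
    }


if __name__ == "__main__":
    job = {"metadata": {"name": "demo"}, "spec": {"trainer": {"numNodes": 8}}}
    print(build_workload(job)["spec"]["podGroupTemplate"])
```

The Workload name is taken directly from the TrainJob name, so the one-to-one relationship in the proposal is kept without any extra lookup.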

It should also set schedulingGroup in the Pod spec:

spec:
  schedulingGroup:
    podGroupName: <workload-name>-<podGroup-template-name>-<hash>
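
As a rough illustration of this step, the sketch below sets the field on a Pod spec represented as a plain dict. `set_scheduling_group` is a hypothetical helper, and the PodGroup name is passed in as-is because the proposal does not specify how the `<hash>` suffix is computed.

```python
import copy


def set_scheduling_group(pod_spec: dict, pod_group_name: str) -> dict:
    """Return a copy of pod_spec with schedulingGroup pointing at the PodGroup.

    Hypothetical helper: the field names follow the proposal above and are
    not taken from a released Kubernetes API.
    """
    patched = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    patched["schedulingGroup"] = {"podGroupName": pod_group_name}
    return patched


if __name__ == "__main__":
    spec = {"containers": [{"name": "node", "image": "example/image"}]}
    # The PodGroup name here is a made-up placeholder value.
    print(set_scheduling_group(spec, "demo-trainer-abc12"))
```

Returning a copy rather than mutating in place keeps the original template reusable for Pods that belong to a different group.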

JobSet Integrations

@imreddy13 has an open KEP to support Workload creation in the JobSet controller as well: kubernetes-sigs/jobset#969

We should discuss whether to reuse that API once it is available, or whether the TrainJob controller should always be responsible for creating the Workload object, since we plan to support non-JobSet-based jobs such as Flux MiniCluster: #2909 (cc @vsoch)

cc @kannon92 @kubeflow/kubeflow-trainer-team @macsko @wojtek-t @erictune

/area api
/area controller

Why is this needed?

Enable gang-scheduling in TrainJob using the Workload API.

Love this feature?

Give it a 👍. We prioritize the features with the most 👍.
