
Reclaim Enhancement: the enqueue action may block the reclaim action #569

@sivanzcw

Description


/kind feature

If one queue occupies most of the cluster's resources and pods in a new queue need to be scheduled, the reclaim action may be blocked. The enqueue action cannot move the job of the new pods to the Inqueue status, so the reclaim action never acts on that job.
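The blocking dependency described above can be modeled as two gates: enqueue only admits a Pending job when its minimum resources fit into the idle resources, and reclaim only acts on jobs that are already Inqueue. The following is a simplified, hypothetical model (the names `Phase`, `Resources`, `Job`, `enqueue` and `reclaim_candidates` are illustrative and are not Volcano's actual API). The usage values are jobb's request (2c2g) and the idle resources left after joba runs (1c5.5g), both taken from the reproduction below.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    # Simplified podgroup phases; only the two relevant here are modeled.
    PENDING = "Pending"
    INQUEUE = "Inqueue"


@dataclass
class Resources:
    milli_cpu: int  # CPU in millicores
    memory_mi: int  # memory in MiB

    def fits_in(self, other: Resources) -> bool:
        # True if every dimension of self is <= the same dimension of other.
        return self.milli_cpu <= other.milli_cpu and self.memory_mi <= other.memory_mi


@dataclass
class Job:
    name: str
    min_resources: Resources
    phase: Phase = Phase.PENDING


def enqueue(jobs: list[Job], idle: Resources) -> None:
    # Hypothetical enqueue gate: admit a Pending job only if its minimum
    # resources fit into the currently idle cluster resources.
    for job in jobs:
        if job.phase is Phase.PENDING and job.min_resources.fits_in(idle):
            job.phase = Phase.INQUEUE
            idle = Resources(idle.milli_cpu - job.min_resources.milli_cpu,
                             idle.memory_mi - job.min_resources.memory_mi)


def reclaim_candidates(jobs: list[Job]) -> list[Job]:
    # Hypothetical reclaim filter: Pending jobs are skipped, so only
    # Inqueue jobs may evict pods belonging to other queues.
    return [job for job in jobs if job.phase is Phase.INQUEUE]


# jobb (2c2g) against the idle resources left by joba (1c5.5g = 1000m / 5632Mi).
jobb = Job("jobb", Resources(milli_cpu=2000, memory_mi=2048))
enqueue([jobb], idle=Resources(milli_cpu=1000, memory_mi=5632))
```

In this model jobb never leaves Pending, so it is never returned by `reclaim_candidates`, even though evicting pods from an overused queue is exactly what would free enough room for it.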

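  • Cluster resources (c = CPU cores, g = GiB of memory):

| serial | node name | resource |
|---|---|---|
| 1 | node1 | 4c8g |
| 2 | node2 | 4c8g |

  • Queue status:

| serial | queue name | weight | quota | status |
|---|---|---|---|---|
| 1 | queue2 | 1 | 4c8g | overused |
| 2 | queue3 | 1 | 4c8g | active |
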
  • Create joba with 7 pods (1 worker, 2 servers, 4 schedulers), each requesting 1c1.5g. The minAvailable of joba is 1, and joba is placed in queue2:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mxnet-job-queue1
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: zjh-higher
  queue: queue2
  policies:
  - event: PodEvicted
    action: RestartJob
  - event: PodFailed
    action: RestartJob
  plugins:
    svc: []
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          args:
          - --kv-store=dist_sync
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "worker"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
  - replicas: 2
    name: server
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "server"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
  - replicas: 4
    name: scheduler
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "scheduler"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure

  • All pods of joba are Running (7c10.5g in total, so only 1c5.5g of the cluster remains idle).

  • Create jobb with one pod requesting 2c2g, placed in queue3:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mxnet-job-default
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: zjh-higher
  queue: queue3
  plugins:
    svc: []
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          args:
          - --kv-store=dist_sync
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "worker"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
  • Expected: the pod of jobb evicts two pods of joba. Actual: no eviction happens and jobb stays Pending. Because the idle cluster resources cannot satisfy jobb's resource requirement (only 1c is idle, while jobb requests 2c), the podgroup of jobb remains Pending, and a job whose podgroup is Pending cannot evict pods of other jobs.
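The resource arithmetic behind this outcome can be checked with a short sketch. It assumes each queue's deserved share is the cluster total split by queue weight (a simplification of the proportion logic; with equal weights it matches the 4c8g quota in the queue table) and uses millicores and MiB so that all values stay integers. All variable and function names are hypothetical.

```python
# Units: CPU in millicores, memory in MiB (1.5Gi = 1536Mi).
NODES = {"node1": (4000, 8192), "node2": (4000, 8192)}
QUEUE_WEIGHTS = {"queue2": 1, "queue3": 1}

total_cpu = sum(cpu for cpu, _ in NODES.values())
total_mem = sum(mem for _, mem in NODES.values())

# joba: 7 running pods in queue2, each requesting 1c / 1.5Gi.
joba_cpu, joba_mem = 7 * 1000, 7 * 1536
# jobb: 1 pod in queue3 requesting 2c / 2Gi.
jobb_cpu, jobb_mem = 2000, 2048

# Idle resources once every pod of joba is Running.
idle_cpu, idle_mem = total_cpu - joba_cpu, total_mem - joba_mem


def deserved(queue: str) -> tuple[int, int]:
    # Assumed share: cluster total split proportionally to queue weight.
    share = QUEUE_WEIGHTS[queue] / sum(QUEUE_WEIGHTS.values())
    return int(total_cpu * share), int(total_mem * share)


# Enqueue-style check: does jobb fit into what is idle right now?
jobb_fits_idle = jobb_cpu <= idle_cpu and jobb_mem <= idle_mem

# Reclaim-style check: is queue2 using more than its deserved share?
q2_cpu, q2_mem = deserved("queue2")
queue2_overused = joba_cpu > q2_cpu or joba_mem > q2_mem
```

Both conditions hold at once: jobb does not fit into the idle resources (1000m CPU idle versus 2000m requested), so it is not enqueued, while queue2 exceeds its deserved 4c8g, so reclaiming from it would be legitimate.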

Metadata

Assignees: No one assigned

Labels: kind/bug (categorizes issue or PR as related to a bug), priority/important-soon (must be staffed and worked on either currently, or very soon, ideally in time for the next release)
