Reclaim Enhancement: Enqueue action may block the process of `reclaim` action #569

 /kind feature

If one queue has occupied most of the cluster resources and pods then need to be scheduled in a new queue, the `reclaim` action may be blocked, because the job of the new pod cannot be moved to `inqueue` status by the `enqueue` action.

* cluster resources

| serial | node name | resource |
|--------|-----------|----------|
| 1      | node1     | 4c8g     |
| 2      | node2     | 4c8g     |

* queue status

| serial | queue name | weight | quota | status   |
|--------|------------|--------|-------|----------|
| 1      | queue2     | 1      | 4c8g  | overused |
| 2      | queue3     | 1      | 4c8g  | active   |
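
The per-queue quota in the table above is consistent with splitting the 8c16g cluster evenly between two queues of equal weight. A minimal sketch of that arithmetic, assuming a plain weight-proportional split. The helper `deserved_share` is hypothetical and is not Volcano code; the real scheduler may also bound each share by what the queue actually requests.

```python
def deserved_share(total, weights):
    """Split `total` (resource name -> amount) across queues in proportion to weight."""
    weight_sum = sum(weights.values())
    return {
        queue: {res: amount * w / weight_sum for res, amount in total.items()}
        for queue, w in weights.items()
    }


# Two nodes of 4c8g each, as listed in the cluster resources table.
cluster = {"cpu": 8, "memory_g": 16}

# Both queues have weight 1, as listed in the queue status table.
shares = deserved_share(cluster, {"queue2": 1, "queue3": 1})
print(shares)
```

With equal weights, each queue's share is half of the cluster, which is why `queue2` counts as overused once it holds more than 4c.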

* Create joba with 7 pods, each requesting 1c1.5g. The `minAvailable` of joba is 1, and joba is placed in `queue2`.

```
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mxnet-job-queue1
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: zjh-higher
  queue: queue2
  policies:
  - event: PodEvicted
    action: RestartJob
  - event: PodFailed
    action: RestartJob
  plugins:
    svc: []
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          args:
          - --kv-store=dist_sync
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "worker"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
  - replicas: 2
    name: server
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "server"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
  - replicas: 4
    name: scheduler
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "1"
              memory: "1.5Gi"
            requests:
              cpu: "1"
              memory: "1.5Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "scheduler"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure

```
* All pods of joba reach the `Running` state.

* Create jobb with one pod that requests 2c2g. jobb is placed in `queue3`.

```
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mxnet-job-default
spec:
  minAvailable: 1
  schedulerName: volcano
  priorityClassName: zjh-higher
  queue: queue3
  plugins:
    svc: []
  tasks:
  - replicas: 1
    name: worker
    template:
      spec:
        imagePullSecrets:
        - name: default-secret
        containers:
        - image: volcanosh/mxnet-train-mnist-cpu:v1
          args:
          - --kv-store=dist_sync
          imagePullPolicy: IfNotPresent
          name: mxnet
          resources:
            limits:
              cpu: "2"
              memory: "2Gi"
            requests:
              cpu: "2"
              memory: "2Gi"
          env:
          - name: DMLC_PS_ROOT_PORT
            value: "9000"
          - name: DMLC_PS_ROOT_URI
            value: mxnet-job-scheduler-0.mxnet-job
          - name: DMLC_NUM_SERVER
            value: "2"
          - name: DMLC_NUM_WORKER
            value: "2"
          - name: DMLC_ROLE
            value: "worker"
          - name: DMLC_USE_KUBERNETES
            value: "1"
        restartPolicy: OnFailure
```

* Expected: the pod of jobb evicts two pods of joba. Actual: no eviction happens and jobb stays `pending`. Because the idle cluster resources cannot satisfy the resource requirement of jobb, the podgroup of jobb remains `pending`, and a job in the `pending` phase cannot evict pods of other jobs.
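
The deadlock described above can be reproduced with a small model. This is only a sketch of the behaviour reported in this issue, not Volcano's implementation: `Resource`, `enqueue`, and `reclaim_considers` are hypothetical names, and the two gates are simplified to a single comparison each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    cpu: float     # cores
    memory: float  # g, same unit as the tables above

    def __sub__(self, other):
        return Resource(self.cpu - other.cpu, self.memory - other.memory)

    def fits_in(self, other):
        """True if this request is covered by `other` in every dimension."""
        return self.cpu <= other.cpu and self.memory <= other.memory


def scale(r, n):
    """Total resources of n identical pods."""
    return Resource(r.cpu * n, r.memory * n)


def enqueue(request, idle):
    """Assumed gate: a podgroup leaves `pending` only if idle resources cover it."""
    return "inqueue" if request.fits_in(idle) else "pending"


def reclaim_considers(phase):
    """Assumed gate: reclaim skips jobs whose podgroup is still `pending`."""
    return phase != "pending"


cluster = Resource(cpu=8, memory=16)       # node1 + node2, 4c8g each
joba_used = scale(Resource(1, 1.5), 7)     # 7 running pods of 1c1.5g
idle = cluster - joba_used                 # what the enqueue gate sees as free
jobb_request = Resource(2, 2)              # one pod of 2c2g

phase = enqueue(jobb_request, idle)
print(idle, phase, reclaim_considers(phase))
```

Under these assumptions the cluster has 1c5.5g idle, which is less than the 2c that jobb needs, so jobb never reaches `inqueue` and `reclaim` never gets the chance to evict pods of joba on its behalf, even though `queue2` is overused.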


 

