When a pod is in CrashLoopBackOff, the job status should be Failed #436

**Is this a BUG REPORT or FEATURE REQUEST?**:

> Uncomment only one, leave it on its own line:
>
/kind bug
> /kind feature


**What happened**:
I tested creating the tf-sample job and got the following pod status:
```
tensorflow-benchmark-ps-0       0/1     CrashLoopBackOff   5          3m46s
tensorflow-benchmark-worker-0   1/1     Running            0          3m46s
tensorflow-benchmark-worker-1   1/1     Running            0          3m46s

```
However, the job status is still reported as Running (`Running: 3`, `Phase: Running`):



```
[root@node1 tf-sample]# kubectl describe jobs.batch.volcano.sh tensorflow-benchmark
Name:         tensorflow-benchmark
Namespace:    default
Labels:       volcano.sh/job-type=Tensorflow
Annotations:  <none>
API Version:  batch.volcano.sh/v1alpha1
Kind:         Job
Metadata:
  Creation Timestamp:  2019-09-05T03:24:41Z
  Generation:          1
  Resource Version:    17317553
  Self Link:           /apis/batch.volcano.sh/v1alpha1/namespaces/default/jobs/tensorflow-benchmark
  UID:                 b3189fdd-cf8c-11e9-84e3-6c92bf8b7a92
Spec:
  Min Available:  3
  Plugins:
    Env:
    Svc:
  Policies:
    Action:        RestartJob
    Event:         PodEvicted
  Queue:           default
  Scheduler Name:  volcano
  Tasks:
    Name:      ps
    Replicas:  1
    Template:
      Spec:
        Containers:
          Command:
            sh
            -c
            PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | tr "\n" ","`;
WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | tr "\n" ","`;
python tf_cnn_benchmarks1.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=1 --local_parameter_device=cpu --device=cpu --data_format=NHWC --job_name=ps --task_index=${VK_TASK_INDEX} --ps_hosts=${PS_HOST} --worker_hosts=${WORKER_HOST}

          Image:  volcanosh/example-tf:0.0.1
          Name:   tensorflow
          Ports:
            Container Port:  2222
            Name:            tfjob-port
          Resources:
            Limits:
              Cpu:     1000m
              Memory:  2048Mi
            Requests:
              Cpu:      1000m
              Memory:   2048Mi
          Working Dir:  /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        Image Pull Secrets:
          Name:          default-secret
        Restart Policy:  OnFailure
    Name:                worker
    Policies:
      Action:  CompleteJob
      Event:   TaskCompleted
    Replicas:  2
    Template:
      Spec:
        Containers:
          Command:
            sh
            -c
            PS_HOST=`cat /etc/volcano/ps.host | sed 's/$/&:2222/g' | tr "\n" ","`;
WORKER_HOST=`cat /etc/volcano/worker.host | sed 's/$/&:2222/g' | tr "\n" ","`;
python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=1 --local_parameter_device=cpu --device=cpu --data_format=NHWC --job_name=worker --task_index=${VK_TASK_INDEX} --ps_hosts=${PS_HOST} --worker_hosts=${WORKER_HOST}

          Image:  volcanosh/example-tf:0.0.1
          Name:   tensorflow
          Ports:
            Container Port:  2222
            Name:            tfjob-port
          Resources:
            Limits:
              Cpu:     2000m
              Memory:  4096Mi
            Requests:
              Cpu:      2000m
              Memory:   2048Mi
          Working Dir:  /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
        Image Pull Secrets:
          Name:          default-secret
        Restart Policy:  OnFailure
Status:
  Controlled Resources:
    Plugin - Env:  env
    Plugin - Svc:  svc
  Min Available:   3
  Running:         3
  State:
    Last Transition Time:  2019-09-05T03:24:44Z
    Phase:                 Running
Events:                    <none>

```
**What you expected to happen**:

When one task fails (here the `ps` pod is stuck in CrashLoopBackOff), the job should move to the Failed status instead of staying in Running.

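
To make the expectation concrete, here is a minimal Python sketch of the behaviour I would expect. It is only an illustration, not Volcano's actual controller logic: `PodRow` and `expected_job_phase` are hypothetical names, the input mirrors the `kubectl get pods` output above, and the `Pending` fallback is my own assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PodRow:
    """Hypothetical model of one row of `kubectl get pods` output (not a Volcano type)."""
    name: str
    status: str    # e.g. "Running" or "CrashLoopBackOff"
    restarts: int


def expected_job_phase(pods: Iterable[PodRow]) -> str:
    """Return the job phase this report expects, not what Volcano currently reports."""
    rows = list(pods)
    # Expectation from this issue: one crash-looping pod should fail the whole job.
    if any(p.status == "CrashLoopBackOff" for p in rows):
        return "Failed"
    # All pods healthy: the job is genuinely running.
    if rows and all(p.status == "Running" for p in rows):
        return "Running"
    # Assumed fallback for any other mix of states.
    return "Pending"


pods = [
    PodRow("tensorflow-benchmark-ps-0", "CrashLoopBackOff", 5),
    PodRow("tensorflow-benchmark-worker-0", "Running", 0),
    PodRow("tensorflow-benchmark-worker-1", "Running", 0),
]
print(expected_job_phase(pods))
```

With the three pods from this report the sketch yields `Failed`, whereas the job above shows `Phase: Running`.
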
**How to reproduce it (as minimally and precisely as possible)**:


**Anything else we need to know?**:

**Environment**:
- Volcano Version:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:
