Commit 8604f16

andreyvelich authored and alexxfan committed
fix(runtimes): Set numProcPerNode: 1 in DeepSpeed Runtime (kubeflow#2774)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
1 parent ac01cd0, commit 8604f16

1 file changed

Lines changed: 1 addition & 3 deletions

manifests/base/runtimes/deepspeed_distributed.yaml

@@ -8,9 +8,7 @@ spec:
   mlPolicy:
     numNodes: 1
     mpi:
-      # TODO (andreyvelich): Change num proc to 1 and remove container resources after we
-      # allow to override it via TrainJob APIs.
-      numProcPerNode: 4
+      numProcPerNode: 1
       mpiImplementation: OpenMPI
       sshAuthMountPath: /home/mpiuser/.ssh
       runLauncherAsNode: true
