Commit 9011ad7

fix(runtimes): Set numProcPerNode: 1 in DeepSpeed Runtime (#2774)
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
1 parent c38fe6d commit 9011ad7

1 file changed

Lines changed: 1 addition & 3 deletions

manifests/base/runtimes/deepspeed_distributed.yaml

@@ -8,9 +8,7 @@ spec:
   mlPolicy:
     numNodes: 1
     mpi:
-      # TODO (andreyvelich): Change num proc to 1 and remove container resources after we
-      # allow to override it via TrainJob APIs.
-      numProcPerNode: 4
+      numProcPerNode: 1
       mpiImplementation: OpenMPI
       sshAuthMountPath: /home/mpiuser/.ssh
       runLauncherAsNode: true
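
The removed TODO anticipated setting the process count per job through the TrainJob APIs instead of hard-coding it in the runtime. A minimal sketch of such an override follows. It assumes the Kubeflow Trainer `TrainJob` API (`trainer.kubeflow.org/v1alpha1`) accepts `numProcPerNode` and `resourcesPerNode` under `spec.trainer`, and that this runtime is registered as `deepspeed-distributed`. The job name, node count, and GPU count are hypothetical.

```yaml
# Hypothetical TrainJob that overrides the runtime default of numProcPerNode: 1.
# Field names are assumed from the Kubeflow Trainer v1alpha1 API; verify them
# against the CRD installed in your cluster.
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: deepspeed-example        # hypothetical job name
spec:
  runtimeRef:
    name: deepspeed-distributed  # assumed name of the runtime in this manifest
  trainer:
    numNodes: 2                  # hypothetical: two MPI nodes
    numProcPerNode: 4            # hypothetical: four processes per node
    resourcesPerNode:
      limits:
        nvidia.com/gpu: 4        # hypothetical: one GPU per process
```

Under these assumptions the job would launch 2 × 4 = 8 MPI processes, while jobs that omit `numProcPerNode` fall back to the runtime default of 1.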
