1 file changed: +4 -4 lines changed
@@ -267,19 +267,19 @@ Training with ``MPI`` is configured by specifying following fields in ``distribu
     command executed by SageMaker to launch distributed horovod training.
 
 
-In the below example we create an estimator to launch Horovod distributed training with 2 processes on one host:
+In the below example we create an estimator to launch Horovod distributed training with 4 processes on one host:
 
 .. code:: python
 
     from sagemaker.tensorflow import TensorFlow
 
     tf_estimator = TensorFlow(entry_point='tf-train.py', role='SageMakerRole',
-                              train_instance_count=1, train_instance_type='ml.p2.xlarge',
-                              framework_version='1.12', py_version='py3',
+                              train_instance_count=1, train_instance_type='ml.p3.8xlarge',
+                              framework_version='2.1.0', py_version='py3',
                               distributions={
                                   'mpi': {
                                       'enabled': True,
-                                      'processes_per_host': 2,
+                                      'processes_per_host': 4,
                                       'custom_mpi_options': '--NCCL_DEBUG INFO'
                                   }
                               })
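
The change above raises ``processes_per_host`` from 2 to 4 together with the move to ``ml.p3.8xlarge``. One plausible reading is that the example now runs one Horovod process per GPU. The sketch below illustrates that sizing rule only. It does not call the SageMaker SDK: ``GPUS_PER_INSTANCE``, ``mpi_distribution`` and ``total_processes`` are hypothetical names introduced here, and the GPU counts in the table are assumptions that should be checked against the current instance specifications.

```python
# Hypothetical helpers for illustration; none of these names exist in the SageMaker SDK.
# Assumed GPU counts per instance type (verify against current instance specs).
GPUS_PER_INSTANCE = {"ml.p2.xlarge": 1, "ml.p3.8xlarge": 4}


def mpi_distribution(instance_type, custom_mpi_options=""):
    """Build a ``distributions``-style dict with one MPI process per assumed GPU."""
    return {
        "mpi": {
            "enabled": True,
            "processes_per_host": GPUS_PER_INSTANCE[instance_type],
            "custom_mpi_options": custom_mpi_options,
        }
    }


def total_processes(instance_count, distributions):
    """Total MPI processes across all hosts of the training job."""
    return instance_count * distributions["mpi"]["processes_per_host"]


# Same shape as the dict passed to the estimator in the diff above.
dist = mpi_distribution("ml.p3.8xlarge", "--NCCL_DEBUG INFO")
print(dist["mpi"]["processes_per_host"], total_processes(1, dist))
```

Under these assumptions the resulting dict has the same keys as the ``distributions`` argument in the diff, so it could be passed to the estimator in its place. With more than one host, the total process count scales with the instance count.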