
documentation: add TF 2.4.1 support to sm distributed data parallel docs and other updates #2179


Merged
merged 8 commits into from Feb 27, 2021
@@ -155,7 +155,7 @@ PyTorch API

**Supported versions:**

- PyTorch 1.6
- PyTorch 1.6.0


.. function:: smdistributed.dataparallel.torch.distributed.is_available()
@@ -414,7 +414,7 @@ TensorFlow API

.. function:: smdistributed.dataparallel.tensorflow.DistributedOptimizer

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3).
Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).
Construct a new ``DistributedOptimizer``, which uses a TensorFlow
optimizer under the hood for computing single-process gradient values
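
For reference, a minimal sketch of how this wrapper is typically applied inside a ``tf.estimator`` ``model_fn``; the layer, loss, learning-rate scaling, and the ``sdp`` import alias below are illustrative assumptions, not part of this API documentation:

```python
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()

def model_fn(features, labels, mode):
    # Illustrative single-layer model; replace with your own network.
    logits = tf.keras.layers.Dense(10)(features)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    # Scaling the learning rate by the number of workers is a common convention.
    opt = tf.compat.v1.train.AdamOptimizer(0.001 * sdp.size())
    # Wrap the single-process optimizer so gradients are averaged across workers.
    opt = sdp.DistributedOptimizer(opt)
    train_op = opt.minimize(
        loss, global_step=tf.compat.v1.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```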
@@ -489,7 +489,7 @@ TensorFlow API

.. function:: smdistributed.dataparallel.tensorflow.BroadcastGlobalVariablesHook

Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3).
Applicable if you use the ``tf.estimator`` API in TensorFlow 2.x (2.3.1).


``SessionRunHook`` that will broadcast all global variables from root
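
Continuing the estimator sketch above, the hook is typically passed to ``Estimator.train`` so every worker starts from the root rank's variable values; ``model_fn`` is assumed from the previous sketch and the input function here is a stand-in:

```python
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()

def train_input_fn():
    # Tiny illustrative dataset; replace with your real input pipeline.
    features = tf.random.uniform([64, 32])
    labels = tf.random.uniform([64], maxval=10, dtype=tf.int64)
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)

estimator = tf.estimator.Estimator(model_fn=model_fn)  # model_fn as sketched above
hooks = [sdp.BroadcastGlobalVariablesHook(0)]          # broadcast from root rank 0
estimator.train(input_fn=train_input_fn, steps=1000, hooks=hooks)
```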
@@ -8,14 +8,18 @@

### PyTorch

#### Add support for PyTorch 1.7
#### Add support for PyTorch 1.7.1

- Adds support for `gradient_as_bucket_view` (PyTorch 1.7 only), `find_unused_parameters` (PyTorch 1.7 only) and `broadcast_buffers` options to `smp.DistributedModel`. These options behave the same as the corresponding options (with the same names) in
- Adds support for `gradient_as_bucket_view` (PyTorch 1.7.1 only), `find_unused_parameters` (PyTorch 1.7.1 only) and `broadcast_buffers` options to `smp.DistributedModel`. These options behave the same as the corresponding options (with the same names) in
`torch.DistributedDataParallel` API. Please refer to the [SageMaker distributed model parallel API documentation](https://sagemaker.readthedocs.io/en/stable/api/training/smd_model_parallel_pytorch.html#smp.DistributedModel) for more information.

- Adds support for `join` (PyTorch 1.7 only) context manager, which is to be used in conjunction with an instance of `smp.DistributedModel` to be able to train with uneven inputs across participating processes.
- Adds support for `join` (PyTorch 1.7.1 only) context manager, which is to be used in conjunction with an instance of `smp.DistributedModel` to be able to train with uneven inputs across participating processes.

- Adds support for `_register_comm_hook` (PyTorch 1.7 only) which will register the callable as a communication hook for DDP. NOTE: Like in DDP, this is an experimental API and subject to change.
- Adds support for `_register_comm_hook` (PyTorch 1.7.1 only) which will register the callable as a communication hook for DDP. NOTE: Like in DDP, this is an experimental API and subject to change.
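
A minimal sketch of how the options listed above might be passed to `smp.DistributedModel`; the module, the training-loop helper, and the omitted SMP step/optimizer setup are illustrative assumptions, and the authoritative signatures are in the API documentation linked above:

```python
import torch
import smdistributed.modelparallel.torch as smp

smp.init()

# Illustrative module; the real model and the surrounding SMP training step
# (smp.step, optimizer, data loader) are assumed and omitted here.
model = smp.DistributedModel(
    torch.nn.Linear(128, 10),
    broadcast_buffers=True,
    find_unused_parameters=False,   # PyTorch 1.7.1 only
    gradient_as_bucket_view=False,  # PyTorch 1.7.1 only; takes effect with ddp=True
)

# join() (PyTorch 1.7.1 only) lets training proceed even when participating
# processes receive uneven numbers of input batches.
with model.join():
    train_one_epoch(model)          # assumed user-defined training loop
```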

### TensorFlow

- Adds support for TensorFlow 2.4.1

## Bug Fixes

@@ -32,7 +36,7 @@ regular dicts.

### PyTorch

- A performance regression was observed when training on SMP with PyTorch 1.7.1 compared to 1.6. The root cause was found to be the slowdown in performance of `.grad` method calls in PyTorch 1.7.1 compared to 1.6. Please see the related discussion: https://github.com/pytorch/pytorch/issues/50636.
- A performance regression was observed when training on SMP with PyTorch 1.7.1 compared to 1.6.0. The root cause was found to be the slowdown in performance of `.grad` method calls in PyTorch 1.7.1 compared to 1.6.0. Please see the related discussion: https://github.com/pytorch/pytorch/issues/50636.


# Sagemaker Distributed Model Parallel 1.1.0 Release Notes
@@ -1,7 +1,7 @@
TensorFlow API
==============

**Supported version: 2.3**
**Supported version: 2.3.1**

**Important**: This API document assumes you use the following import statement in your training scripts.
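
The import statement itself is collapsed in this diff view; for the SMP TensorFlow API it is presumably:

```python
# Assumed import, consistent with the smp.partition calls shown below.
import smdistributed.modelparallel.tensorflow as smp
```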

@@ -81,7 +81,7 @@ TensorFlow API
[...]
x = tf.constant(1.2)                     # placed in partition 0
with smp.partition(1):
    y = tf.add(x, tf.constant(2.3))      # placed in partition 1
    y = tf.add(x, tf.constant(2.3.1))      # placed in partition 1
Contributor


I don't think we want to change it here.

Contributor Author


I removed this update.

    with smp.partition(3):
        z = tf.reduce_sum(y)             # placed in partition 3

@@ -6,7 +6,7 @@
PyTorch API
===========

**Supported versions: 1.7.1, 1.6**
**Supported versions: 1.7.1, 1.6.0**

This API document assumes you use the following import statements in your training scripts.
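
The import block is collapsed in this diff view; for the SMP PyTorch API it is presumably:

```python
# Assumed imports, consistent with the smp.DistributedModel usage documented here.
import torch
import smdistributed.modelparallel.torch as smp
```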

@@ -159,7 +159,7 @@ This API document assumes you use the following import statements in your traini
This parameter is forwarded to the underlying ``DistributedDataParallel`` wrapper.
Please see: `broadcast_buffers <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel>`__.

- ``gradient_as_bucket_view (PyTorch 1.7 only)`` (default: False): To be
- ``gradient_as_bucket_view (PyTorch 1.7.1 only)`` (default: False): To be
used with ``ddp=True``. This parameter is forwarded to the underlying
``DistributedDataParallel`` wrapper. Please see `gradient_as_bucket_view <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel>`__.

@@ -257,7 +257,7 @@ This API document assumes you use the following import statements in your traini

.. function:: join( )

**Available for PyTorch 1.7 only**
**Available for PyTorch 1.7.1 only**

A context manager to be used in conjunction with an instance of
``smp.DistributedModel`` to be able to train with uneven inputs across
@@ -1,7 +1,7 @@
TensorFlow API
==============

**Supported version: 2.3**
**Supported versions: 2.4.1, 2.3.1**

**Important**: This API document assumes you use the following import statement in your training scripts.

@@ -79,7 +79,7 @@ TensorFlow API
[...]
x = tf.constant(1.2)                     # placed in partition 0
with smp.partition(1):
    y = tf.add(x, tf.constant(2.3))      # placed in partition 1
    y = tf.add(x, tf.constant(2.3.1))      # placed in partition 1
    with smp.partition(3):
        z = tf.reduce_sum(y)             # placed in partition 3
