Improve idist for gather using nccl and reduce for gloo + gpu #2260

## 🚀 Feature

Consider the following piece of code:
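```python
import torch
import ignite.distributed as idist


def write_preds_to_file(predictions, filename):
    prediction_tensor = torch.tensor(predictions)
    # Every rank receives the predictions of all ranks, although only rank 0 uses them
    prediction_tensor = idist.all_gather(prediction_tensor)

    # Only rank 0 writes the collected predictions to disk
    if idist.get_rank() == 0:
        torch.save(prediction_tensor, filename)
```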

Here `idist.all_gather()` sends the tensor from every process to every process, even though only rank 0 needs the result. `gather()` would avoid this, but the `nccl` backend does not support it. See [here](https://pytorch.org/docs/stable/distributed.html).
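For comparison, on a backend that does implement `gather()` (for example `gloo`), the function above could call `torch.distributed.gather` directly, so that only rank 0 allocates receive buffers. This is a minimal sketch, assuming the process group is already initialized, its backend is not `nccl`, and every rank holds the same number of predictions (`gather` requires equal shapes). The name `write_preds_to_file_gather` is hypothetical.

```python
import torch
import torch.distributed as dist


def write_preds_to_file_gather(predictions, filename, dst=0):
    # Hypothetical variant of write_preds_to_file using gather instead of all_gather.
    # Assumes an initialized process group whose backend supports gather (not nccl)
    # and the same number of predictions on every rank.
    prediction_tensor = torch.tensor(predictions)
    rank = dist.get_rank()
    # Only the destination rank provides receive buffers; other ranks must pass None
    gather_list = (
        [torch.empty_like(prediction_tensor) for _ in range(dist.get_world_size())]
        if rank == dst
        else None
    )
    dist.gather(prediction_tensor, gather_list=gather_list, dst=dst)
    if rank == dst:
        # Concatenate the per-rank tensors along dim 0 into a single tensor
        torch.save(torch.cat(gather_list), filename)
```

With `nccl` this call fails, which is why the fallback proposed below is needed.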

The idea is to implement a `gather()` method in `idist` that uses `all_gather()` for `nccl` and `gather()` for other backends. Similarly, `reduce()` for `gloo` on GPU could be implemented with `all_reduce()`, since `gloo` supports `all_reduce()` but not `reduce()` for CUDA tensors.
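A minimal sketch of both fallbacks, written against `torch.distributed` rather than the actual `idist` internals. The helper names `gather` and `reduce` and their signatures are hypothetical; the real `idist` API may differ. It assumes an initialized process group and, for `gather`, equal shapes and dtypes on all ranks.

```python
import torch
import torch.distributed as dist


def gather(tensor: torch.Tensor, dst: int = 0):
    """Hypothetical idist-style gather.

    Returns a list with one tensor per rank on `dst` and None on every other rank.
    """
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    if dist.get_backend() == "nccl":
        # nccl has no gather: every rank receives everything, non-dst ranks discard it
        gathered = [torch.empty_like(tensor) for _ in range(world_size)]
        dist.all_gather(gathered, tensor)
        return gathered if rank == dst else None
    # Other backends support gather natively: only dst allocates receive buffers
    gather_list = [torch.empty_like(tensor) for _ in range(world_size)] if rank == dst else None
    dist.gather(tensor, gather_list=gather_list, dst=dst)
    return gather_list


def reduce(tensor: torch.Tensor, dst: int = 0, op=dist.ReduceOp.SUM) -> torch.Tensor:
    """Hypothetical idist-style in-place reduce; the result is valid at least on `dst`."""
    if dist.get_backend() == "gloo" and tensor.is_cuda:
        # gloo cannot reduce CUDA tensors: all_reduce gives every rank the result,
        # which includes dst
        dist.all_reduce(tensor, op=op)
    else:
        dist.reduce(tensor, dst=dst, op=op)
    return tensor
```

The fallbacks trade extra communication for correctness: non-destination ranks receive data they then ignore, so the result on `dst` matches what a native `gather()` or `reduce()` would produce.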

This needs tests and docs.
