If I have a PyTorch CUDA tensor and am using torch-xla on GPU, will the following code round-trip through the CPU?

```python
t = torch.randn(2, 2, device="cuda")
t2 = t.to(device=xm.xla_device())
```
Similarly, with:

```python
t = torch.randn(2, 2, device=xm.xla_device())
t = t.square()
t2 = t.cuda()
```

does `t.cuda()` (1) trigger graph execution/synchronization, and (2) involve a round-trip through the CPU?
(Context: we run models for inference in a setting where the input data is provided already on the GPU and is wrapped in a torch tensor via `torch::from_blob`; it would be wasteful, and would likely eliminate any performance gains from XLA, if we had to round-trip the input data through the CPU.)
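For concreteness, our input path looks roughly like this (a minimal sketch; `wrap_gpu_input` and `gpu_data` are illustrative names, and the allocation/stream handling is elided):

```cpp
#include <torch/torch.h>

// Sketch only: `gpu_data` is a device pointer produced upstream.
torch::Tensor wrap_gpu_input(float* gpu_data, int64_t rows, int64_t cols) {
  auto options = torch::TensorOptions()
                     .dtype(torch::kFloat32)
                     .device(torch::kCUDA);
  // from_blob does not copy; the tensor aliases the existing GPU buffer,
  // so `gpu_data` must stay alive for the tensor's lifetime.
  return torch::from_blob(gpu_data, {rows, cols}, options);
}
```

The question is whether moving such a tensor to `xm.xla_device()` can stay on the device, or whether it necessarily goes through host memory.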
Thanks!