If I have a PyTorch CUDA tensor and am using torch-xla on GPU, will the following code round-trip through the CPU?

```python
t = torch.randn(2, 2, device="cuda")
t2 = t.to(device=xm.xla_device())
```
Similarly, with:

```python
t = torch.randn(2, 2, device=xm.xla_device())
t = t.square()
t2 = t.cuda()
```

does `t.cuda()` (1) trigger graph execution/synchronization, and (2) involve a round-trip through the CPU?
(Context: we run models for inference in a setting where the input data is provided already on the GPU and is wrapped in a torch tensor via `torch::from_blob`; it would be wasteful, and would likely eliminate any performance gains from XLA, if we had to round-trip the input data through the CPU.)
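For concreteness, our input path looks roughly like this (a minimal sketch; `wrap_gpu_input` and `gpu_data` are illustrative names, and the allocation/stream handling is elided):

```cpp
#include <torch/torch.h>

// Sketch only: `gpu_data` is a device pointer produced upstream.
torch::Tensor wrap_gpu_input(float* gpu_data, int64_t rows, int64_t cols) {
  auto options = torch::TensorOptions()
                     .dtype(torch::kFloat32)
                     .device(torch::kCUDA);
  // from_blob does not copy; the tensor aliases the existing GPU buffer,
  // so `gpu_data` must stay alive for the tensor's lifetime.
  return torch::from_blob(gpu_data, {rows, cols}, options);
}
```

The question is whether moving such a tensor to `xm.xla_device()` can stay on the device, or whether it necessarily goes through host memory.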
Thanks!