
Commit df346f1

fix float8 training TP+SP integration tests
Summary:

These tests do not run in CI, and they broke some time ago. The issue was that each tensor was created on "cuda:0" instead of on the GPU corresponding to the local rank. For now, the fix is to set the device manually from the rank. There is probably a better way to do this, since the rank is supposed to be set automatically, but that is left for a future PR, as is adding these tests to CI.

Test Plan:

```bash
./test/float8/test_dtensor.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 39bd880
ghstack-comment-id: 2991778315
Pull Request resolved: #2414
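For context, here is a minimal sketch of the fixed `setup_distributed` helper, assuming a single-node run where `torch.distributed.get_rank()` matches the local GPU index (the same assumption the commit message flags as worth revisiting); reading `WORLD_SIZE` from the environment is an assumption about how the launcher (e.g. torchrun) provides it, not code from the PR:

```python
import os

import torch
from torch.distributed.device_mesh import init_device_mesh


def setup_distributed():
    # WORLD_SIZE is set by the launcher (e.g. torchrun); hypothetical here.
    world_size = int(os.environ["WORLD_SIZE"])
    device_mesh = init_device_mesh("cuda", (world_size,))
    # seed must be the same in all processes
    torch.manual_seed(1)
    # Without this, bare "cuda" resolves to "cuda:0" on every rank, so all
    # processes allocate tensors on the same GPU. Pinning the current device
    # to the (single-node) rank gives each process its own GPU.
    local_rank = torch.distributed.get_rank()
    torch.cuda.set_device(local_rank)
    return device_mesh
```

With `torch.cuda.set_device` in place, subsequent allocations on the bare `"cuda"` device land on the per-rank GPU, which is what the TP+SP DTensor tests rely on.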

2 files changed: +4 -0 lines changed

test/float8/test_dtensor.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -67,6 +67,8 @@ def setup_distributed():
     device_mesh = init_device_mesh("cuda", (world_size,))
     # seed must be the same in all processes
     torch.manual_seed(1)
+    local_rank = torch.distributed.get_rank()
+    torch.cuda.set_device(local_rank)
     return device_mesh
 
 
```
test/float8/test_fsdp2_tp.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -46,6 +46,8 @@ def setup_distributed():
     )
     # seed must be the same in all processes
     torch.manual_seed(1)
+    local_rank = torch.distributed.get_rank()
+    torch.cuda.set_device(local_rank)
     return device_mesh
 
 
```