Commit b7d52b8

[RFC] Generalize pytorch content for non-native device execution

1 parent 87f4656

RFC-0039-generalize-pytorch-ut.md (52 additions, 0 deletions)
# [RFC] Generalization of PyTorch framework UTs for non-CUDA device execution
**Authors:**
* @ankurneog
## **Summary**
Modify the PyTorch framework UTs so that non-CUDA devices such as Intel Gaudi and Intel XPU are able to harness the content and improve quality.

## **Motivation**
The PyTorch framework UTs are a good indicator of device stack health; however, they are mostly written for CPU and CUDA devices, which restricts their use on non-CUDA devices.

We propose to modify the content wherever possible to make it available for non-CUDA device execution.

This will also ensure greater participation in content enhancement.

## **Proposed Implementation**
Since the content is large, we propose a staggered approach to the implementation.

Steps:

* Remove the restriction imposed through `@onlyNativeDevices` in core content and replace it with hooks so that supported devices can enable their content selectively.
  These should be flexible enough to support both in-tree and out-of-tree devices.
* Dtypes for a device should be dynamically loaded per op from a common dictionary, instead of using different variables per device, e.g. `dtypesIfCUDA`.
* Miscellaneous decorators such as `@skipIfCuda` should be generalized, e.g. to `@skipIfDevice`.
* Extend the use of `instantiate_device_type_tests` to all content, so that developers are forced to use generalized device code rather than hard-coding "cuda" or "cpu".
* Generalize common distributed content so that it can be extended to non-NCCL backends such as Intel's HCCL and CCL.
* Generalize the Dynamo content for specific backends so that other devices can verify it with the existing content.
#### Metrics
Other devices can track their pass percentage and become part of the CI if the coverage and pass percentage are good.

#### Additional Context
Toward adding support for Intel Gaudi devices, we have already made a couple of changes in this regard:
* Removing `onlyNativeDevice`: https://github.com/pytorch/pytorch/pull/128584
* Changing Dynamo content: https://github.com/pytorch/pytorch/pull/130714
* Generalizing distributed content: https://github.com/pytorch/pytorch/pull/131758
* Generalizing FSDP content: https://github.com/pytorch/pytorch/pull/133209

More to follow.

### Next Steps
As part of introducing support for Intel Gaudi, which is an out-of-tree device, we are already introducing changes to support it in a manner that can be used by other devices as well.
