Commit 1131acd

cuda graphs.
1 parent e26d5c6 commit 1131acd

1 file changed: +4 -1 lines changed
examples/profiling/README.md

Lines changed: 4 additions & 1 deletion
@@ -312,4 +312,7 @@ kernels, so this tiny sync balloons to 2.3s.
 * As mentioned above, we profiled with regional compilation so it's possible that
   there are still some gaps outside the compiled regions. A full compilation
   will likely mitigate it. In case it doesn't, the above observations could
-  be useful to mitigate that.
+  be useful to mitigate that.
+* CUDA Graphs can also help mitigate CPU-overhead issues. Using the
+  "reduce-overhead" or "max-autotune" mode in `torch.compile` triggers the
+  use of CUDA Graphs.
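
The added bullet could be exercised with something like the sketch below. It is a minimal illustration, not code from this repository: the helper `compile_with_cuda_graphs`, the `demo` function, and the toy `torch.nn.Linear` model are hypothetical names chosen for the example. The only assumption it relies on is the one stated in the bullet, that the "reduce-overhead" and "max-autotune" modes of `torch.compile` enable CUDA Graphs.

```python
# Modes of `torch.compile` that, per the README note, trigger CUDA Graphs.
CUDA_GRAPH_MODES = ("reduce-overhead", "max-autotune")


def compile_with_cuda_graphs(module, mode="reduce-overhead"):
    """Wrap `module` with torch.compile using a CUDA-Graphs-enabling mode.

    Hypothetical helper for illustration; it only validates `mode` and
    forwards to `torch.compile`.
    """
    if mode not in CUDA_GRAPH_MODES:
        raise ValueError(f"mode must be one of {CUDA_GRAPH_MODES}, got {mode!r}")
    import torch  # imported lazily so the mode check works without torch installed

    return torch.compile(module, mode=mode)


def demo():
    """Run a toy model a few times; requires PyTorch and a CUDA device."""
    import torch

    model = torch.nn.Linear(64, 64).cuda()  # toy stand-in for a real model
    compiled = compile_with_cuda_graphs(model)
    x = torch.randn(8, 64, device="cuda")
    with torch.no_grad():
        # Early calls compile and record; later calls are expected to replay
        # the recorded graph, which is where CPU launch overhead drops.
        for _ in range(5):
            out = compiled(x)
    return out.shape
```

On a machine with a CUDA device, calling `demo()` and profiling the later iterations should show whether the CPU-side gaps between kernels shrink. CUDA Graphs work best with static input shapes, so inputs whose shapes change between calls may cause re-recording.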
