
Commit 9911c7b

Merge branch 'main' into dev/pd-deepep
2 parents e54a97a + 85ec044 commit 9911c7b

2 files changed (+3 -1 lines)

docs/references/benchmark_and_profiling.md

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@
 
 This command sets the number of prompts to 2 with `--num-prompts` argument and limits the length of output sequences to 100 with `--sharegpt-output-len` argument, which can generate a small trace file for browser to open smoothly.
 
+Additionally, if you want to locate the SGLang Python source code through the cuda kernel in Trace, you need to disable CUDA Graph when starting the service. This can be done by using the `--disable-cuda-graph` parameter in the command to start the service.
+
 ## Profile with Nsight
 
 [Nsight systems](https://docs.nvidia.com/nsight-systems/) is an advanced tool that exposes more profiling details, such as register and shared memory usage, annotated code regions and low-level CUDA APIs and events.
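The added note changes how the profiling server is started, so the server and client commands are worth seeing side by side. Below is a minimal Python sketch that only builds and prints the two argument lists. Only `--num-prompts`, `--sharegpt-output-len` and `--disable-cuda-graph` come from this diff. The `sglang.launch_server` / `sglang.bench_serving` entry points, `--model-path`, `--backend`, `--profile` and the `SGLANG_TORCH_PROFILER_DIR` variable are assumptions based on SGLang's CLI as we understand it. The helper names, model id and trace directory are hypothetical placeholders. Check every flag against `--help` for your installed version.

```python
import os
import shlex

# Hypothetical placeholders: substitute your own model and trace directory.
MODEL_PATH = "meta-llama/Llama-3.1-8B-Instruct"
TRACE_DIR = "/tmp/sglang_traces"


def server_command(model_path: str, disable_cuda_graph: bool = True) -> list[str]:
    """Argv for starting the SGLang server for a profiling run (hypothetical helper)."""
    cmd = ["python", "-m", "sglang.launch_server", "--model-path", model_path]
    if disable_cuda_graph:
        # Without CUDA Graph replay, kernels are launched eagerly from Python,
        # so kernels in the trace appear under the Python call that launched them.
        cmd.append("--disable-cuda-graph")
    return cmd


def server_env(trace_dir: str) -> dict[str, str]:
    """Server environment; assumes SGLANG_TORCH_PROFILER_DIR sets the trace output dir."""
    return {**os.environ, "SGLANG_TORCH_PROFILER_DIR": trace_dir}


def bench_command(num_prompts: int = 2, output_len: int = 100) -> list[str]:
    """Argv for a small profiling client run, keeping the trace small enough for a browser."""
    return [
        "python", "-m", "sglang.bench_serving",
        "--backend", "sglang",
        "--num-prompts", str(num_prompts),
        "--sharegpt-output-len", str(output_len),
        "--profile",  # assumed flag that asks the server to record a torch profiler trace
    ]


if __name__ == "__main__":
    print(shlex.join(server_command(MODEL_PATH)))
    print(shlex.join(bench_command()))
```

To use it, start the server with `env=server_env(TRACE_DIR)` (for example via `subprocess.Popen`). Wait until it is ready, then run the bench command. Running without CUDA Graph typically slows decoding, so it is best kept to the profiling run only.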

sgl-kernel/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ include(FetchContent)
 FetchContent_Declare(
 repo-cutlass
 GIT_REPOSITORY https://github.com/NVIDIA/cutlass
-GIT_TAG df8a550d3917b0e97f416b2ed8c2d786f7f686a3
+GIT_TAG 5e497243f7ad13a2aa842143f9b10bbb23d98292
 GIT_SHALLOW OFF
 )
 FetchContent_Populate(repo-cutlass)
