[Bug] TP worker cuda graph capture NCCL error #5770

@jokerwyt

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Traced to #5728.
cc @Edenzzzz @merrymercy
Possibly related to CUDA_VISIBLE_DEVICES.
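
For context on the CUDA_VISIBLE_DEVICES suspicion, here is a minimal sketch of how that variable remaps device ordinals inside a process. It handles integer entries only, not UUIDs. The helper names `parse_visible_devices` and `physical_ordinal` are hypothetical and not part of sglang, and whether such a remapping is actually behind this crash is unconfirmed.

```python
from typing import List, Optional


def parse_visible_devices(value: Optional[str]) -> Optional[List[int]]:
    """Return physical GPU ordinals in logical order.

    None means the variable is unset, so no remapping takes place.
    Sketch only: parsing stops at the first non-integer entry, so
    UUID-style entries are not handled.
    """
    if value is None:
        return None
    ids: List[int] = []
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit():
            break
        ids.append(int(token))
    return ids


def physical_ordinal(logical: int, value: Optional[str]) -> int:
    """Map a logical device index (what the process sees) to a physical one."""
    ids = parse_visible_devices(value)
    if ids is None:
        return logical
    if logical < 0 or logical >= len(ids):
        raise IndexError(f"logical device {logical} is not visible")
    return ids[logical]


if __name__ == "__main__":
    # With CUDA_VISIBLE_DEVICES="4,5,6,7", logical device 0 is physical GPU 4.
    for rank in range(4):
        print(rank, "->", physical_ordinal(rank, "4,5,6,7"))
```

If each TP worker were launched with a different value of the variable, the same logical index would refer to different physical GPUs across ranks, which is one thing worth checking when reproducing.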

Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.99it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.98it/s]

[2025-04-26 20:55:30 TP3] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=90.00 GB, mem usage=0.94 GB.
[2025-04-26 20:55:30 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=89.06 GB, mem usage=0.94 GB.
[2025-04-26 20:55:30 TP1] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=88.82 GB, mem usage=0.94 GB.
[2025-04-26 20:55:30 TP2] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=88.82 GB, mem usage=0.94 GB.
[2025-04-26 20:55:30 TP2] KV Cache is allocated. #tokens: 4635840, K size: 30.95 GB, V size: 30.95 GB
[2025-04-26 20:55:30 TP0] KV Cache is allocated. #tokens: 4635840, K size: 30.95 GB, V size: 30.95 GB
[2025-04-26 20:55:30 TP0] Memory pool end. avail mem=27.14 GB
[2025-04-26 20:55:30 TP2] Memory pool end. avail mem=26.90 GB
[2025-04-26 20:55:30 TP3] KV Cache is allocated. #tokens: 4635840, K size: 30.95 GB, V size: 30.95 GB
[2025-04-26 20:55:30 TP3] Memory pool end. avail mem=28.07 GB
[2025-04-26 20:55:30 TP1] KV Cache is allocated. #tokens: 4635840, K size: 30.95 GB, V size: 30.95 GB
[2025-04-26 20:55:30 TP1] Memory pool end. avail mem=26.90 GB
[2025-04-26 20:55:30 TP0] Capture cuda graph begin. This can take up to several minutes. avail mem=27.04 GB
[2025-04-26 20:55:30 TP3] Capture cuda graph begin. This can take up to several minutes. avail mem=27.98 GB
[2025-04-26 20:55:30 TP2] Capture cuda graph begin. This can take up to several minutes. avail mem=26.81 GB
[2025-04-26 20:55:30 TP1] Capture cuda graph begin. This can take up to several minutes. avail mem=26.81 GB
Capturing batches (avail_mem=27.02 GB):   0%|          | 0/8 [00:00<?, ?it/s][TENCENT64:293198:0:293198] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x78ba1)
[TENCENT64:293195:0:293195] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x78ba1)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
[TENCENT64:293197:0:293197] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x78ba1)
[TENCENT64:293196:0:293196] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x78ba1)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x5d4f30 vs 0x436a28)
==== backtrace (tid: 293195) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000005bb28 ncclGroupCommJoin()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/group.h:113
 2 0x000000000005bb28 taskAppend()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2152
 3 0x000000000005bb28 ncclEnqueueCheck()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2224
 4 0x000000000004e991 ncclAllGather()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/collectives.cc:88
 5 0x0000000000007e2e ffi_prep_go_closure()  ???:0
 6 0x0000000000004493 ???()  /lib/x86_64-linux-gnu/libffi.so.8:0
 7 0x000000000000a3e9 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 8 0x0000000000013302 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 9 0x000000000018139b _PyObject_MakeTpCall()  ???:0
10 0x000000000017aa97 _PyEval_EvalFrameDefault()  ???:0
11 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
12 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
13 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
14 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
15 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
16 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
17 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
18 0x0000000000a6d3a7 pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, pybind11::detail::args_proxy, pybind11::detail::kwargs_proxy>()  :0
19 0x0000000000d96640 torch::impl::dispatch::PythonKernelHolder::operator()()  :0
20 0x00000000058bc27b c10::OperatorHandle::redispatchBoxed()  :0
21 0x00000000058b9af9 torch::autograd::basicAutogradNotImplementedFallbackImpl()  autograd_not_implemented_fallback.cpp:0
22 0x0000000001aca9f8 c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::autograd_fallback>()  VariableFallbackKernel.cpp:0
23 0x0000000000da1457 c10::Dispatcher::callBoxed()  ???:0
24 0x0000000000b2c2e6 torch::jit::invokeOperatorFromPython()  ???:0
25 0x0000000000b2c647 torch::jit::_get_operation_for_overload_or_packet()  ???:0
26 0x0000000000a1b592 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN()  init.cpp:0
27 0x0000000000518d37 pybind11::cpp_function::dispatcher()  :0
28 0x000000000018ab32 PyObject_CallFunctionObjArgs()  ???:0
29 0x000000000019910b PyObject_Call()  ???:0
30 0x000000000017b6ef _PyEval_EvalFrameDefault()  ???:0
31 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
32 0x000000000018061d _PyObject_FastCallDictTstate()  ???:0
33 0x000000000019562c _PyObject_Call_Prepend()  ???:0
34 0x000000000029d464 PyInit__datetime()  ???:0
35 0x000000000018139b _PyObject_MakeTpCall()  ???:0
36 0x000000000017b99e _PyEval_EvalFrameDefault()  ???:0
37 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
38 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
39 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
40 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
41 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
42 0x0000000000175790 _PyEval_EvalFrameDefault()  ???:0
43 0x00000000001984d1 PyMethod_New()  ???:0
44 0x000000000017a702 _PyEval_EvalFrameDefault()  ???:0
45 0x000000000019861e PyMethod_New()  ???:0
46 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
47 0x000000000019861e PyMethod_New()  ???:0
48 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
49 0x0000000000180574 _PyObject_FastCallDictTstate()  ???:0
50 0x000000000019562c _PyObject_Call_Prepend()  ???:0
51 0x000000000029d464 PyInit__datetime()  ???:0
52 0x000000000018139b _PyObject_MakeTpCall()  ???:0
53 0x000000000017b009 _PyEval_EvalFrameDefault()  ???:0
54 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
55 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
56 0x00000000001984d1 PyMethod_New()  ???:0
=================================
Fatal Python error: Segmentation fault

Thread 0x00007f6f51fff640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x00007f6fb0a80640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f76a8c0b740 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl_wrapper.py", line 413 in ncclAllGather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl.py", line 162 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 460 in _all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 149 in reg_all_gather_into_tensor
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 1123 in __call__
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 470 in all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 513 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/communication_op.py", line 20 in tensor_model_parallel_all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 445 in _get_logits
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 311 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 385 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445 in run_once
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452 in capture_one_batch_size
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360 in capture
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 965 in init_cuda_graphs
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 219 in initialize
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 181 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 75 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 261 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2012 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, zmq.backend.cython._zmq, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, markupsafe._speedups, PIL._imagingft, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, 
scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, 
pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sentencepiece._sentencepiece, msgspec._core, _cffi_backend, msgpack._cmsgpack, google._upb._message, ray._raylet, cuda_utils (total: 197)
==== backtrace (tid: 293198) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000005bb28 ncclGroupCommJoin()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/group.h:113
 2 0x000000000005bb28 taskAppend()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2152
 3 0x000000000005bb28 ncclEnqueueCheck()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2224
 4 0x000000000004e991 ncclAllGather()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/collectives.cc:88
 5 0x0000000000007e2e ffi_prep_go_closure()  ???:0
 6 0x0000000000004493 ???()  /lib/x86_64-linux-gnu/libffi.so.8:0
 7 0x000000000000a3e9 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 8 0x0000000000013302 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 9 0x000000000018139b _PyObject_MakeTpCall()  ???:0
10 0x000000000017aa97 _PyEval_EvalFrameDefault()  ???:0
11 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
12 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
13 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
14 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
15 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
16 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
17 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
18 0x0000000000a6d3a7 pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, pybind11::detail::args_proxy, pybind11::detail::kwargs_proxy>()  :0
19 0x0000000000d96640 torch::impl::dispatch::PythonKernelHolder::operator()()  :0
20 0x00000000058bc27b c10::OperatorHandle::redispatchBoxed()  :0
21 0x00000000058b9af9 torch::autograd::basicAutogradNotImplementedFallbackImpl()  autograd_not_implemented_fallback.cpp:0
22 0x0000000001aca9f8 c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::autograd_fallback>()  VariableFallbackKernel.cpp:0
23 0x0000000000da1457 c10::Dispatcher::callBoxed()  ???:0
24 0x0000000000b2c2e6 torch::jit::invokeOperatorFromPython()  ???:0
25 0x0000000000b2c647 torch::jit::_get_operation_for_overload_or_packet()  ???:0
26 0x0000000000a1b592 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN()  init.cpp:0
27 0x0000000000518d37 pybind11::cpp_function::dispatcher()  :0
28 0x000000000018ab32 PyObject_CallFunctionObjArgs()  ???:0
29 0x000000000019910b PyObject_Call()  ???:0
30 0x000000000017b6ef _PyEval_EvalFrameDefault()  ???:0
31 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
32 0x000000000018061d _PyObject_FastCallDictTstate()  ???:0
33 0x000000000019562c _PyObject_Call_Prepend()  ???:0
34 0x000000000029d464 PyInit__datetime()  ???:0
35 0x000000000018139b _PyObject_MakeTpCall()  ???:0
36 0x000000000017b99e _PyEval_EvalFrameDefault()  ???:0
37 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
38 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
39 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
40 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
41 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
42 0x0000000000175790 _PyEval_EvalFrameDefault()  ???:0
43 0x00000000001984d1 PyMethod_New()  ???:0
44 0x000000000017a702 _PyEval_EvalFrameDefault()  ???:0
45 0x000000000019861e PyMethod_New()  ???:0
46 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
47 0x000000000019861e PyMethod_New()  ???:0
48 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
49 0x0000000000180574 _PyObject_FastCallDictTstate()  ???:0
50 0x000000000019562c _PyObject_Call_Prepend()  ???:0
51 0x000000000029d464 PyInit__datetime()  ???:0
52 0x000000000018139b _PyObject_MakeTpCall()  ???:0
53 0x000000000017b009 _PyEval_EvalFrameDefault()  ???:0
54 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
55 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
56 0x00000000001984d1 PyMethod_New()  ???:0
=================================
Fatal Python error: Segmentation fault

Thread 0x00007fd7a70e2640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007fdea6275740 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl_wrapper.py", line 413 in ncclAllGather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl.py", line 162 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 460 in _all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 149 in reg_all_gather_into_tensor
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 1123 in __call__
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 470 in all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 513 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/communication_op.py", line 20 in tensor_model_parallel_all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 445 in _get_logits
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 311 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 385 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445 in run_once
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452 in capture_one_batch_size
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360 in capture
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 965 in init_cuda_graphs
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 219 in initialize
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 181 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 75 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 261 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2012 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, zmq.backend.cython._zmq, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, markupsafe._speedups, PIL._imagingft, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, 
scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, 
pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sentencepiece._sentencepiece, msgspec._core, _cffi_backend, msgpack._cmsgpack, google._upb._message, ray._raylet, cuda_utils (total: 197)
==== backtrace (tid: 293197) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000005bb28 ncclGroupCommJoin()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/group.h:113
 2 0x000000000005bb28 taskAppend()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2152
 3 0x000000000005bb28 ncclEnqueueCheck()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2224
 4 0x000000000004e991 ncclAllGather()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/collectives.cc:88
 5 0x0000000000007e2e ffi_prep_go_closure()  ???:0
 6 0x0000000000004493 ???()  /lib/x86_64-linux-gnu/libffi.so.8:0
 7 0x000000000000a3e9 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 8 0x0000000000013302 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 9 0x000000000018139b _PyObject_MakeTpCall()  ???:0
10 0x000000000017aa97 _PyEval_EvalFrameDefault()  ???:0
11 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
12 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
13 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
14 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
15 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
16 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
17 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
18 0x0000000000a6d3a7 pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, pybind11::detail::args_proxy, pybind11::detail::kwargs_proxy>()  :0
19 0x0000000000d96640 torch::impl::dispatch::PythonKernelHolder::operator()()  :0
20 0x00000000058bc27b c10::OperatorHandle::redispatchBoxed()  :0
21 0x00000000058b9af9 torch::autograd::basicAutogradNotImplementedFallbackImpl()  autograd_not_implemented_fallback.cpp:0
22 0x0000000001aca9f8 c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::autograd_fallback>()  VariableFallbackKernel.cpp:0
23 0x0000000000da1457 c10::Dispatcher::callBoxed()  ???:0
24 0x0000000000b2c2e6 torch::jit::invokeOperatorFromPython()  ???:0
25 0x0000000000b2c647 torch::jit::_get_operation_for_overload_or_packet()  ???:0
26 0x0000000000a1b592 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN()  init.cpp:0
27 0x0000000000518d37 pybind11::cpp_function::dispatcher()  :0
28 0x000000000018ab32 PyObject_CallFunctionObjArgs()  ???:0
29 0x000000000019910b PyObject_Call()  ???:0
30 0x000000000017b6ef _PyEval_EvalFrameDefault()  ???:0
31 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
32 0x000000000018061d _PyObject_FastCallDictTstate()  ???:0
33 0x000000000019562c _PyObject_Call_Prepend()  ???:0
34 0x000000000029d464 PyInit__datetime()  ???:0
35 0x000000000018139b _PyObject_MakeTpCall()  ???:0
36 0x000000000017b99e _PyEval_EvalFrameDefault()  ???:0
37 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
38 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
39 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
40 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
41 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
42 0x0000000000175790 _PyEval_EvalFrameDefault()  ???:0
43 0x00000000001984d1 PyMethod_New()  ???:0
44 0x000000000017a702 _PyEval_EvalFrameDefault()  ???:0
45 0x000000000019861e PyMethod_New()  ???:0
46 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
47 0x000000000019861e PyMethod_New()  ???:0
48 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
49 0x0000000000180574 _PyObject_FastCallDictTstate()  ???:0
50 0x000000000019562c _PyObject_Call_Prepend()  ???:0
51 0x000000000029d464 PyInit__datetime()  ???:0
52 0x000000000018139b _PyObject_MakeTpCall()  ???:0
53 0x000000000017b009 _PyEval_EvalFrameDefault()  ???:0
54 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
55 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
56 0x00000000001984d1 PyMethod_New()  ???:0
=================================
Fatal Python error: Segmentation fault

Thread 0x00007f0ab78df640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f1218674740 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl_wrapper.py", line 413 in ncclAllGather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl.py", line 162 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 460 in _all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 149 in reg_all_gather_into_tensor
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 1123 in __call__
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 470 in all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 513 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/communication_op.py", line 20 in tensor_model_parallel_all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 445 in _get_logits
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 311 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 385 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445 in run_once
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452 in capture_one_batch_size
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360 in capture
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 965 in init_cuda_graphs
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 219 in initialize
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 181 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 75 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 261 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2012 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, zmq.backend.cython._zmq, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, markupsafe._speedups, PIL._imagingft, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, 
scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, 
pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sentencepiece._sentencepiece, msgspec._core, _cffi_backend, msgpack._cmsgpack, google._upb._message, ray._raylet, cuda_utils (total: 197)
==== backtrace (tid: 293196) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x000000000005bb28 ncclGroupCommJoin()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/include/group.h:113
 2 0x000000000005bb28 taskAppend()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2152
 3 0x000000000005bb28 ncclEnqueueCheck()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/enqueue.cc:2224
 4 0x000000000004e991 ncclAllGather()  /dvs/p4/build/sw/gpgpu/nccl/gitfusion/stable/src/collectives.cc:88
 5 0x0000000000007e2e ffi_prep_go_closure()  ???:0
 6 0x0000000000004493 ???()  /lib/x86_64-linux-gnu/libffi.so.8:0
 7 0x000000000000a3e9 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 8 0x0000000000013302 ???()  /usr/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so:0
 9 0x000000000018139b _PyObject_MakeTpCall()  ???:0
10 0x000000000017aa97 _PyEval_EvalFrameDefault()  ???:0
11 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
12 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
13 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
14 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
15 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
16 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
17 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
18 0x0000000000a6d3a7 pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, pybind11::detail::args_proxy, pybind11::detail::kwargs_proxy>()  :0
19 0x0000000000d96640 torch::impl::dispatch::PythonKernelHolder::operator()()  :0
20 0x00000000058bc27b c10::OperatorHandle::redispatchBoxed()  :0
21 0x00000000058b9af9 torch::autograd::basicAutogradNotImplementedFallbackImpl()  autograd_not_implemented_fallback.cpp:0
22 0x0000000001aca9f8 c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::autograd_fallback>()  VariableFallbackKernel.cpp:0
23 0x0000000000da1457 c10::Dispatcher::callBoxed()  ???:0
24 0x0000000000b2c2e6 torch::jit::invokeOperatorFromPython()  ???:0
25 0x0000000000b2c647 torch::jit::_get_operation_for_overload_or_packet()  ???:0
26 0x0000000000a1b592 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&, pybind11::name, pybind11::doc>(torch::jit::initJITBindings(_object*)::{lambda(std::string const&)#217}::operator()(std::string const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&), pybind11::name const&, pybind11::doc const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN()  init.cpp:0
27 0x0000000000518d37 pybind11::cpp_function::dispatcher()  :0
28 0x000000000018ab32 PyObject_CallFunctionObjArgs()  ???:0
29 0x000000000019910b PyObject_Call()  ???:0
30 0x000000000017b6ef _PyEval_EvalFrameDefault()  ???:0
31 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
32 0x000000000018061d _PyObject_FastCallDictTstate()  ???:0
33 0x000000000019562c _PyObject_Call_Prepend()  ???:0
34 0x000000000029d464 PyInit__datetime()  ???:0
35 0x000000000018139b _PyObject_MakeTpCall()  ???:0
36 0x000000000017b99e _PyEval_EvalFrameDefault()  ???:0
37 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
38 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
39 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
40 0x000000000017597f _PyEval_EvalFrameDefault()  ???:0
41 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
42 0x0000000000175790 _PyEval_EvalFrameDefault()  ???:0
43 0x00000000001984d1 PyMethod_New()  ???:0
44 0x000000000017a702 _PyEval_EvalFrameDefault()  ???:0
45 0x000000000019861e PyMethod_New()  ???:0
46 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
47 0x000000000019861e PyMethod_New()  ???:0
48 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
49 0x0000000000180574 _PyObject_FastCallDictTstate()  ???:0
50 0x000000000019562c _PyObject_Call_Prepend()  ???:0
51 0x000000000029d464 PyInit__datetime()  ???:0
52 0x000000000018139b _PyObject_MakeTpCall()  ???:0
53 0x000000000017b009 _PyEval_EvalFrameDefault()  ???:0
54 0x000000000018b38c _PyFunction_Vectorcall()  ???:0
55 0x0000000000177c30 _PyEval_EvalFrameDefault()  ???:0
56 0x00000000001984d1 PyMethod_New()  ???:0
=================================
Fatal Python error: Segmentation fault

Thread 0x00007f7dc08e1640 (most recent call first):
  File "/usr/lib/python3.10/threading.py", line 324 in wait
  File "/usr/lib/python3.10/threading.py", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007f84c807c740 (most recent call first):
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl_wrapper.py", line 413 in ncclAllGather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/device_communicators/pynccl.py", line 162 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 460 in _all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 149 in reg_all_gather_into_tensor
  File "/usr/local/lib/python3.10/dist-packages/torch/_ops.py", line 1123 in __call__
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 470 in all_gather_into_tensor
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/parallel_state.py", line 513 in all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/distributed/communication_op.py", line 20 in tensor_model_parallel_all_gather
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 445 in _get_logits
  File "/sgl-workspace/sglang/python/sglang/srt/layers/logits_processor.py", line 311 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1750 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1739 in _wrapped_call_impl
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 385 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116 in decorate_context
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 445 in run_once
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 452 in capture_one_batch_size
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 360 in capture
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 276 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 965 in init_cuda_graphs
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 219 in initialize
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 181 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 75 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 261 in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2012 in run_scheduler_process
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 129 in _main
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, uvloop.loop, zmq.backend.cython._zmq, PIL._imaging, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, markupsafe._speedups, PIL._imagingft, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, 
scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, 
pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sentencepiece._sentencepiece, msgspec._core, _cffi_backend, msgpack._cmsgpack, google._upb._message, ray._raylet, cuda_utils (total: 197)
[2025-04-26 20:55:31] Rank 0 scheduler is dead. Please check if there are relevant logs.
[2025-04-26 20:55:32] Child process unexpectedly failed with an exit code 11. pid=293198
[2025-04-26 20:55:32] Child process unexpectedly failed with an exit code 11. pid=293197
[2025-04-26 20:55:32] Child process unexpectedly failed with an exit code 11. pid=293196
[2025-04-26 20:55:32] Exit code: -11
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 700, in launch_server
    tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 586, in _launch_subprocesses
    data = scheduler_pipe_readers[i].recv()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
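
For anyone reading the log above: the `Exit code: -11` line is consistent with the `Fatal Python error: Segmentation fault` messages, since Python's `multiprocessing` reports a child killed by signal N as exit code -N. A minimal sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper written for this note, not part of sglang.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Name the signal behind a negative multiprocessing-style exit code.

    Hypothetical helper for reading the log; not an sglang function.
    """
    if code < 0:
        try:
            # multiprocessing convention: -N means "terminated by signal N"
            return signal.Signals(-code).name
        except ValueError:
            return f"unknown signal {-code}"
    return f"exit status {code}"


print(describe_exit_code(-11))
```

This only confirms that the scheduler processes died from a segfault inside the `ncclAllGather` call; it does not say why NCCL crashed.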

Reproduction

2025-04-26 20:54:53,631 - pdutils - INFO - runCommand remotely: ssh -o StrictHostKeyChecking=no  ytn0 "PS1=[] source ~/.bashrc  && env && ( CUDA_VISIBLE_DEVICES=0,1,2,3 UCX_TLS=rc,gdr_copy,rc_x,cuda_copy,cuda_ipc UCX_NET_DEVICES=mlx5_bond_1:1,mlx5_bond_2:1,mlx5_bond_3:1,mlx5_bond_4:1,mlx5_bond_5:1,mlx5_bond_6:1,mlx5_bond_7:1,mlx5_bond_8:1 UCX_LOG_LEVEL=info NCCL_DEBUG=WARN SGLANG_PD_NIXL_DEBUG_TRANSFER_TIME=1 SGL_ENABLE_JIT_DEEPGEMM=0 python3.10 -m sglang.launch_server --host 0.0.0.0 --nnodes 1 --node-rank 0 --dist-init-addr ytn0:7010 --model-path /home/qspace/upload/luban_cache/model/luban-llm_deepseek_r1_distill_qwen_1_5b-model_path/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --disable-radix-cache --schedule-policy fcfs --mem-fraction-static 0.70 --disable-overlap-schedule --chunked-prefill-size 32768 --allow-auto-truncate --tp 4 --log-level debug --enable-metrics --page-size 64 --disaggregation-mode prefill --disaggregation-transfer-backend nixl --disaggregation-bootstrap-port 7100 --max-running-requests 32 --port 8080 )"
2025-04-26 20:54:53,632 - pdutils - INFO - runCommand remotely: ssh -o StrictHostKeyChecking=no  ytn0 "PS1=[] source ~/.bashrc  && env && ( CUDA_VISIBLE_DEVICES=4,5,6,7 UCX_TLS=rc,gdr_copy,rc_x,cuda_copy,cuda_ipc UCX_NET_DEVICES=mlx5_bond_1:1,mlx5_bond_2:1,mlx5_bond_3:1,mlx5_bond_4:1,mlx5_bond_5:1,mlx5_bond_6:1,mlx5_bond_7:1,mlx5_bond_8:1 UCX_LOG_LEVEL=info NCCL_DEBUG=WARN SGLANG_PD_NIXL_DEBUG_TRANSFER_TIME=1 SGL_ENABLE_JIT_DEEPGEMM=0 python3.10 -m sglang.launch_server --host 0.0.0.0 --nnodes 1 --node-rank 0 --dist-init-addr ytn0:7020 --model-path /home/qspace/upload/luban_cache/model/luban-llm_deepseek_r1_distill_qwen_1_5b-model_path/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --disable-radix-cache --schedule-policy fcfs --mem-fraction-static 0.70 --disable-overlap-schedule --chunked-prefill-size 32768 --allow-auto-truncate --tp 4 --log-level debug --enable-metrics --page-size 64 --disaggregation-mode decode --disaggregation-transfer-backend nixl --disaggregation-bootstrap-port 7100 --max-running-requests 32 --port 9080 )"
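
Since the two servers above run on the same host with disjoint `CUDA_VISIBLE_DEVICES` sets (`0,1,2,3` and `4,5,6,7`), one thing worth checking is which physical GPU each TP rank ends up on. The sketch below is only a diagnostic aid, assuming the usual CUDA behaviour that local device index `i` maps to the `i`-th entry of `CUDA_VISIBLE_DEVICES`; `physical_gpu_for_rank` is a hypothetical name and is not how sglang resolves devices internally.

```python
import os
from typing import Mapping, Optional


def physical_gpu_for_rank(
    local_rank: int, env: Optional[Mapping[str, str]] = None
) -> str:
    """Return the physical GPU id that a local rank maps to.

    Hypothetical diagnostic helper; assumes local index i corresponds to
    the i-th entry of CUDA_VISIBLE_DEVICES.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable unset: local indices are the physical indices.
        return str(local_rank)
    visible = [tok.strip() for tok in raw.split(",") if tok.strip()]
    if not 0 <= local_rank < len(visible):
        raise IndexError(
            f"rank {local_rank} has no entry in CUDA_VISIBLE_DEVICES={raw!r}"
        )
    return visible[local_rank]


# Example with the decode server's setting from the command above.
decode_env = {"CUDA_VISIBLE_DEVICES": "4,5,6,7"}
for rank in range(4):
    print(rank, "->", physical_gpu_for_rank(rank, decode_env))
```

If any component addresses GPUs by physical index while another uses the remapped local index, the two servers could disagree about device placement. Whether that is what happens here is unconfirmed.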

Environment

NCCL 2.25.1
