[rank2]:[E319 19:55:16.862637187 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fdb36372fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fdb367780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7fda7b3008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7fda7b30da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7fda7b311539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7fda7b3135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7fdb064e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7fdb4448b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7fdb44510b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 2] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fdb36372fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fdb367780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7fda7b3008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7fda7b30da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7fda7b311539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7fda7b3135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7fdb064e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7fdb4448b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7fdb44510b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fdb36372fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7fda7aa8c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7fdb064e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7fdb4448b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7fdb44510b40 in /lib64/libc.so.6)
(Worker pid=2513317) (Worker_TP0 pid=2513317) Exception in thread WorkerAsyncOutputCopy:
(Worker pid=2513317) (Worker_TP0 pid=2513317) Traceback (most recent call last):
(Worker pid=2513317) (Worker_TP0 pid=2513317) File "/home/ssm-user/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
(Worker pid=2513317) (Worker_TP0 pid=2513317) self.run()
(Worker pid=2513317) (Worker_TP0 pid=2513317) File "/home/ssm-user/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012, in run
(Worker pid=2513317) (Worker_TP0 pid=2513317) self._target(*self._args, **self._kwargs)
(Worker pid=2513317) (Worker_TP0 pid=2513317) File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 860, in async_output_busy_loop
(Worker pid=2513317) (Worker_TP0 pid=2513317) self.enqueue_output(output)
(Worker pid=2513317) (Worker_TP0 pid=2513317) File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 837, in enqueue_output
(Worker pid=2513317) (Worker_TP0 pid=2513317) output = output.get_output()
(Worker pid=2513317) (Worker_TP0 pid=2513317) ^^^^^^^^^^^^^^^^^^^
(Worker pid=2513317) (Worker_TP0 pid=2513317) File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 251, in get_output
(Worker pid=2513317) (Worker_TP0 pid=2513317) self.async_copy_ready_event.synchronize()
(Worker pid=2513317) (Worker_TP0 pid=2513317) torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2513317) (Worker_TP0 pid=2513317) Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2513317) (Worker_TP0 pid=2513317) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2513317) (Worker_TP0 pid=2513317) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2513317) (Worker_TP0 pid=2513317) Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2513317) (Worker_TP0 pid=2513317)
[rank0]:[W319 19:55:16.871470664 CUDAGuardImpl.h:122] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from currentStreamCaptureStatusMayInitCtx at /pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fe47c8defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fe47c9780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0xce4b3a (0x7fe3c16e4b3a in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x7e9d4 (0x7fe47c8c09d4 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fe47c8ba369 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x862f65 (0x7fe3f4e62f65 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x863001 (0x7fe3f4e63001 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #7: VLLM::Worker_TP0() [0x1635231]
frame #8: VLLM::Worker_TP0() [0x163537b]
frame #9: VLLM::Worker_TP0() [0x1633f63]
frame #10: VLLM::Worker_TP0() [0x1633ab3]
frame #11: VLLM::Worker_TP0() [0x1633ad6]
frame #12: VLLM::Worker_TP0() [0x1633ad6]
frame #13: VLLM::Worker_TP0() [0x1633c27]
frame #14: VLLM::Worker_TP0() [0x1635231]
frame #15: _PyEval_EvalFrameDefault + 0xe5ec (0x16200ec in VLLM::Worker_TP0)
frame #16: VLLM::Worker_TP0() [0x161122d]
frame #17: VLLM::Worker_TP0() [0x1740925]
frame #18: VLLM::Worker_TP0() [0x1740861]
frame #19: <unknown function> + 0x8b2ea (0x7fe48a88b2ea in /lib64/libc.so.6)
frame #20: <unknown function> + 0x110b40 (0x7fe48a910b40 in /lib64/libc.so.6)
[rank7]:[E319 19:55:16.880866178 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 7] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5bbb572fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f5bbef330e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f5b03f008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f5b03f0da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f5b03f11539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f5b03f135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f5b8f0e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f5bcce8b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f5bccf10b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 7] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5bbb572fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f5bbef330e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f5b03f008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f5b03f0da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f5b03f11539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f5b03f135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f5b8f0e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f5bcce8b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f5bccf10b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5bbb572fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7f5b0368c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7f5b8f0e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7f5bcce8b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7f5bccf10b40 in /lib64/libc.so.6)
[rank5]:[E319 19:55:16.897002377 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 5] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f88880defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f88881780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f87cd1008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f87cd10da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f87cd111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f87cd1135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f88582e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f889608b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f8896110b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 5] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f88880defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f88881780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f87cd1008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f87cd10da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f87cd111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f87cd1135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f88582e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f889608b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f8896110b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f88880defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7f87cc88c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7f88582e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7f889608b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7f8896110b40 in /lib64/libc.so.6)
[rank4]:[E319 19:55:16.899273792 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 4] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f196e0defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f196e1780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f18b31008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f18b310da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f18b3111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f18b31135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f193e2e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f197c08b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f197c110b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 4] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f196e0defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f196e1780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f18b31008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f18b310da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f18b3111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f18b31135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f193e2e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f197c08b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f197c110b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f196e0defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7f18b288c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7f193e2e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7f197c08b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7f197c110b40 in /lib64/libc.so.6)
[rank6]:[E319 19:55:16.904900358 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 6] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fbf066defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fbf067780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7fbe4b7008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7fbe4b70da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7fbe4b711539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7fbe4b7135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7fbed68e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7fbf1468b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7fbf14710b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 6] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fbf066defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fbf067780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7fbe4b7008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7fbe4b70da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7fbe4b711539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7fbe4b7135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7fbed68e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7fbf1468b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7fbf14710b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fbf066defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7fbe4ae8c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7fbed68e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7fbf1468b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7fbf14710b40 in /lib64/libc.so.6)
[rank1]:[E319 19:55:16.907649710 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5a900defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f5a901780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f59d51008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f59d510da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f59d5111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f59d51135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f5a602e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f5a9e08b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f5a9e110b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5a900defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7f5a901780e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7f59d51008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7f59d510da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7f59d5111539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7f59d51135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7f5a602e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7f5a9e08b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7f5a9e110b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7f5a900defdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7f59d488c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7f5a602e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7f5a9e08b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7f5a9e110b40 in /lib64/libc.so.6)
[rank3]:[E319 19:55:16.908538094 ProcessGroupNCCL.cpp:2093] [PG ID 2 PG GUID 3 Rank 3] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7ff3f7b72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7ff3fb5330e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7ff3405008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7ff34050da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7ff340511539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7ff3405135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7ff3cb6e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7ff40948b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7ff409510b40 in /lib64/libc.so.6)
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 3] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from query at /pytorch/aten/src/ATen/cuda/CUDAEvent.h:108 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7ff3f7b72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7ff3fb5330e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x7ff3405008f0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x7ff34050da68 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x949 (0x7ff340511539 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0x105 (0x7ff3405135d5 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xe77e4 (0x7ff3cb6e77e4 in /lib64/libstdc++.so.6)
frame #7: <unknown function> + 0x8b2ea (0x7ff40948b2ea in /lib64/libc.so.6)
frame #8: <unknown function> + 0x110b40 (0x7ff409510b40 in /lib64/libc.so.6)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2099 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7ff3f7b72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x68c348 (0x7ff33fc8c348 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xe77e4 (0x7ff3cb6e77e4 in /lib64/libstdc++.so.6)
frame #3: <unknown function> + 0x8b2ea (0x7ff40948b2ea in /lib64/libc.so.6)
frame #4: <unknown function> + 0x110b40 (0x7ff409510b40 in /lib64/libc.so.6)
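The trace above repeats the same watchdog failure once per rank, which makes it hard to see at a glance which ranks crashed and when. Below is a minimal sketch for summarizing that, assuming the server log has been captured as text; the helper name `failed_ranks` and the regular expression are illustrative and only modeled on the header lines shown in this report, not on any vLLM or PyTorch API.

```python
import re

# Matches the watchdog header line that starts each per-rank trace, e.g.
# "[rank1]:[E319 19:55:16.907649710 ProcessGroupNCCL.cpp:2093] [PG ID 2 ...] Process group
#  watchdog thread terminated with exception: CUDA error: ..."
# The pattern is an assumption based on the log lines in this report.
WATCHDOG_RE = re.compile(
    r"^\[rank(?P<rank>\d+)\]:\[E\d+ (?P<time>[\d:.]+) ProcessGroupNCCL\.cpp:\d+\] "
    r".*?terminated with exception: (?P<error>.+)$"
)


def failed_ranks(log_text):
    """Return {rank: (timestamp, error message)} for the first watchdog failure of each rank."""
    found = {}
    for line in log_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            # setdefault keeps the earliest occurrence if a rank logs the error twice.
            found.setdefault(int(match["rank"]), (match["time"], match["error"]))
    return found
```

Calling `failed_ranks(open("server.log").read())` (the file name is hypothetical) returns one entry per rank that logged the watchdog error, so the timestamps can be compared across ranks.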
Your current environment
The output of python collect_env.py
🐛 Describe the bug
Description:
I am experiencing a critical crash (CUDA error: an illegal memory access was encountered, cudaErrorIllegalAddress) when serving the zai-org/GLM-4.7-FP8 model with --max-num-batched-tokens set below the default value. The crash occurs immediately after the first requests. The service runs fine when --max-num-batched-tokens is not set explicitly.
Steps to Reproduce:
Start benchmark with:
Error:
Happy path:
Start the vLLM (0.17.1) server with the zai-org/GLM-4.7-FP8 model and the default --max-num-batched-tokens; everything works fine: