[Bug] Incorrect passing of ForwardBatch parameter in TpModelWorker.forward_batch_generation #5506

@u4lr451

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

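TpModelWorker.forward_batch_generation is passed the wrong kind of batch. With NEXTN speculative decoding and DP attention enabled (see the launch command under Reproduction), the scheduler crashes: forward_batch_speculative_generation in eagle_worker.py calls TpModelWorker.forward_batch_generation with an object that is already a ForwardBatch, and ForwardBatch.init_new then fails because that object has no extend_input_logprob_token_ids attribute.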

[2025-04-17 17:26:04 DP9 TP9] Prefill batch. #new-seq: 1, #new-token: 7, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-04-17 17:26:06 DP13 TP13] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2006, in run_scheduler_process
    scheduler.event_loop_normal()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 618, in event_loop_normal
    result = self.run_batch(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1366, in run_batch
    ) = self.draft_worker.forward_batch_speculative_generation(batch)
  File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 275, in forward_batch_speculative_generation
    self.target_worker.forward_batch_generation(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 175, in forward_batch_generation
    forward_batch = ForwardBatch.init_new(model_worker_batch, self.model_runner)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/forward_batch_info.py", line 262, in init_new
    if batch.extend_input_logprob_token_ids is not None:
AttributeError: 'ForwardBatch' object has no attribute 'extend_input_logprob_token_ids'. Did you mean: 'extend_input_logprob_token_ids_gpu'?
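
The failure mode in the traceback can be reproduced in isolation. The sketch below is not sglang's code: ModelWorkerBatch and ForwardBatch here are stripped-down stand-ins that model only the attribute named in the error, and init_new_checked is a hypothetical helper. It shows why handing an already-converted ForwardBatch to init_new raises the AttributeError above, and how an explicit type check would surface the mistake at the call site instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelWorkerBatch:
    # Stand-in for the scheduler-side batch; only the field from the traceback is modeled.
    input_ids: List[int]
    extend_input_logprob_token_ids: Optional[List[int]] = None


@dataclass
class ForwardBatch:
    # Stand-in for the runner-side batch; note the "_gpu" suffix on the field name.
    input_ids: List[int]
    extend_input_logprob_token_ids_gpu: Optional[List[int]] = None

    @classmethod
    def init_new(cls, batch: ModelWorkerBatch) -> "ForwardBatch":
        ret = cls(input_ids=batch.input_ids)
        # This attribute access fails if `batch` is already a ForwardBatch.
        if batch.extend_input_logprob_token_ids is not None:
            ret.extend_input_logprob_token_ids_gpu = list(
                batch.extend_input_logprob_token_ids
            )
        return ret


def init_new_checked(batch) -> ForwardBatch:
    # Hypothetical helper: reject the wrong batch type with a clear message.
    if not isinstance(batch, ModelWorkerBatch):
        raise TypeError(
            f"expected ModelWorkerBatch, got {type(batch).__name__}"
        )
    return ForwardBatch.init_new(batch)


if __name__ == "__main__":
    worker_batch = ModelWorkerBatch(
        input_ids=[1, 2, 3], extend_input_logprob_token_ids=[2, 3]
    )
    forward_batch = ForwardBatch.init_new(worker_batch)  # correct usage

    try:
        ForwardBatch.init_new(forward_batch)  # wrong type, as in the traceback
    except AttributeError as exc:
        print("unchecked:", exc)

    try:
        init_new_checked(forward_batch)
    except TypeError as exc:
        print("checked:", exc)
```

Whether the right fix is a check like this or a change to what the caller passes is a decision for the maintainers; the sketch only isolates the type mismatch.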

Reproduction

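Run the following command on each of two nodes, with ${RANK} set to that node's rank and ${INIT_ADDR} set to the address used for distributed initialization: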

python3 -m sglang.launch_server --model-path /sgl-workspace/DeepSeek-R1 --dist-init-addr ${INIT_ADDR}:20000 --nnodes 2 --node-rank ${RANK}  --trust-remote-code --served-model-name DeepSeek-R1  --tensor-parallel-size 16 --stream-output --host 0.0.0.0 --port 8080  --disable-radix-cache  --disable-overlap-schedule --attention-backend flashinfer --disable-cuda-graph-padding --mem-fraction-static 0.60 --speculative-algo NEXTN --speculative-draft /sgl-workspace/SGLang/DeepSeek-R1-NextN --speculative-num-steps 4 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-metrics --log-requests --enable-dp-attention --disable-custom-all-reduce --cuda-graph-max-bs 64 --dp-size 16 --disable-cuda-graph

Environment

Python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA H20
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.161.08
PyTorch: 2.5.1+cu124
sglang: 0.4.5.post1
sgl_kernel: 0.0.9.post1
flashinfer: Module Not Found
triton: 3.1.0
transformers: 4.51.1
torchao: 0.9.0
numpy: 2.2.4
aiohttp: 3.11.16
fastapi: 0.115.12
hf_transfer: 0.1.9
huggingface_hub: 0.30.1
interegular: 0.3.3
modelscope: 1.24.1
orjson: 3.10.16
outlines: 0.1.11
packaging: 24.2
psutil: 7.0.0
pydantic: 2.11.2
multipart: Module Not Found
zmq: Module Not Found
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: Module Not Found
xgrammar: 0.1.17
openai: 1.70.0
tiktoken: 0.9.0
anthropic: 0.49.0
litellm: 1.65.4.post1
decord: 0.6.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 NIC10 NIC11 NIC12 NIC13 NIC14 NIC15 NIC16 NIC17 NIC18 NIC19 NIC20 NIC21 NIC22 NIC23 NIC24 NIC25 NIC26 NIC27 NIC28 NIC29 NIC30 NIC31 NIC32 NIC33 NIC34 NIC35 NIC36 NIC37 NIC38 NIC39 NIC40 NIC41 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PHB PIX SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PIX PHB SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS SYS SYS 0-95,192-287 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE PIX NODE 96-191,288-383 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE PIX NODE NODE 96-191,288-383 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PHB NODE NODE PIX 96-191,288-383 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PIX NODE NODE PHB 96-191,288-383 1 N/A
NIC0 SYS SYS SYS SYS NODE NODE NODE NODE X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC1 SYS SYS SYS SYS NODE NODE NODE NODE PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC2 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC3 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC4 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC5 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC6 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC7 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC8 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC9 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC10 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC11 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC12 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC13 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC14 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC15 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC16 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC17 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC18 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC19 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC20 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC21 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC22 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC23 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC24 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC25 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC26 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC27 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC28 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC29 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC30 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC31 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC32 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X PIX SYS SYS SYS SYS NODE NODE NODE NODE
NIC33 SYS SYS SYS SYS NODE NODE NODE NODE PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX PIX X SYS SYS SYS SYS NODE NODE NODE NODE
NIC34 PIX NODE NODE NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS X NODE NODE NODE SYS SYS SYS SYS
NIC35 NODE NODE NODE PIX SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE X NODE NODE SYS SYS SYS SYS
NIC36 NODE PHB PIX NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE X PHB SYS SYS SYS SYS
NIC37 NODE PIX PHB NODE SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS SYS NODE NODE PHB X SYS SYS SYS SYS
NIC38 SYS SYS SYS SYS NODE NODE PHB PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS X NODE NODE PHB
NIC39 SYS SYS SYS SYS NODE PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE X NODE NODE
NIC40 SYS SYS SYS SYS PIX NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS NODE NODE X NODE
NIC41 SYS SYS SYS SYS NODE NODE PIX PHB NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE NODE SYS SYS SYS SYS PHB NODE NODE X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
NIC10: mlx5_10
NIC11: mlx5_11
NIC12: mlx5_12
NIC13: mlx5_13
NIC14: mlx5_14
NIC15: mlx5_15
NIC16: mlx5_16
NIC17: mlx5_17
NIC18: mlx5_18
NIC19: mlx5_19
NIC20: mlx5_20
NIC21: mlx5_21
NIC22: mlx5_22
NIC23: mlx5_23
NIC24: mlx5_24
NIC25: mlx5_25
NIC26: mlx5_26
NIC27: mlx5_27
NIC28: mlx5_28
NIC29: mlx5_29
NIC30: mlx5_30
NIC31: mlx5_31
NIC32: mlx5_32
NIC33: mlx5_33
NIC34: mlx5_bond_1
NIC35: mlx5_bond_2
NIC36: mlx5_bond_3
NIC37: mlx5_bond_4
NIC38: mlx5_bond_5
NIC39: mlx5_bond_6
NIC40: mlx5_bond_7
NIC41: mlx5_bond_8

ulimit soft: 1000000
