[Tracking] Qwen3.5-397B (G)B200 Functional Support and Optimizations #20024

@nvpohanh

Qwen3.5-397B (G)B200 Functional Support and Optimizations

Tracking issue for Qwen3.5-397B support in SGLang on (G)B200.


Progress Tracker

no-MTP+Agg

| Category | Precision | Item | Status |
|---|---|---|---|
| Functional | FP8 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | FP8 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | FP8 | TP (low-latency) works functionally | ✅ DONE |
| Functional | FP8 | TP (low-latency) has good accuracy | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) works functionally | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) has good accuracy | ✅ DONE |
| Baseline Perf | FP8 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | FP8 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Cookbook | (all) | Update SGLang cookbook | 🔄 IN PROGRESS |
| Perf Analysis | FP8 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Analysis | NVFP4 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Optimization | FP8 | Round 1 perf optimizations | 🔄 IN PROGRESS |
| Perf Optimization | NVFP4 | Round 1 perf optimizations | 🔄 IN PROGRESS |

MTP+Agg

| Category | Precision | Item | Status |
|---|---|---|---|
| Functional | FP8 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | FP8 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | FP8 | TP (low-latency) works functionally | ✅ DONE |
| Functional | FP8 | TP (low-latency) has good accuracy | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) works functionally | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) has good accuracy | ✅ DONE |
| Baseline Perf | FP8 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | FP8 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Cookbook | (all) | Update SGLang cookbook | 🔄 IN PROGRESS |
| Perf Analysis | FP8 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Analysis | NVFP4 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Optimization | FP8 | Round 1 perf optimizations | 🔄 IN PROGRESS |
| Perf Optimization | NVFP4 | Round 1 perf optimizations | 🔄 IN PROGRESS |

no-MTP+Disagg

| Category | Precision | Item | Status |
|---|---|---|---|
| Functional | FP8 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | FP8 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | FP8 | TP (low-latency) works functionally | ✅ DONE |
| Functional | FP8 | TP (low-latency) has good accuracy | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) works functionally | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) has good accuracy | ✅ DONE |
| Baseline Perf | FP8 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | FP8 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Cookbook | (all) | Update SGLang cookbook | 🔄 IN PROGRESS |
| Perf Analysis | FP8 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Analysis | NVFP4 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Optimization | FP8 | Round 1 perf optimizations | 🔄 IN PROGRESS |
| Perf Optimization | NVFP4 | Round 1 perf optimizations | 🔄 IN PROGRESS |

MTP+Disagg

| Category | Precision | Item | Status |
|---|---|---|---|
| Functional | FP8 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | FP8 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | FP8 | TP (low-latency) works functionally | ✅ DONE |
| Functional | FP8 | TP (low-latency) has good accuracy | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) works functionally | ✅ DONE |
| Functional | NVFP4 | DEP (high-throughput) has good accuracy | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) works functionally | ✅ DONE |
| Functional | NVFP4 | TP (low-latency) has good accuracy | ✅ DONE |
| Baseline Perf | FP8 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | FP8 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | DEP (high-throughput) uses correct backends/kernels | ✅ DONE |
| Baseline Perf | NVFP4 | TP (low-latency) uses correct backends/kernels | ✅ DONE |
| Cookbook | (all) | Update SGLang cookbook | 🔄 IN PROGRESS |
| Perf Analysis | FP8 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Analysis | NVFP4 | Round 1 perf analysis | 🔄 IN PROGRESS |
| Perf Optimization | FP8 | Round 1 perf optimizations | 🔄 IN PROGRESS |
| Perf Optimization | NVFP4 | Round 1 perf optimizations | 🔄 IN PROGRESS |

Weekly Progress

2026-05-08

2026-04-17

2026-04-08

2026-03-30

2026-03-17

2026-03-12

2026-03-06 (update)

  • Agg functional support (no-MTP+Agg, MTP+Agg): DONE for both FP8 and NVFP4
  • MTP enabled via merged PR: [Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 #19391
  • Disagg functional support: IN PROGRESS (debugging TP size mismatch in Prefill/Decode)
  • Agg cookbook, IBDB pipeline, perf analysis: all IN PROGRESS
  • Perf optimizations (Agg + Disagg): IN PROGRESS; GDN (gated delta net) kernel optimizations are ongoing

2026-03-06

  • Created and initialized all tasks

Related GitHub Issues
