
fix legacy deepep path for flashinfer_cutedsl #22925

Merged
ch-wan merged 7 commits into sgl-project:main from leejnau:fix-cutedsl-deepep-legacy
Apr 20, 2026


@leejnau
Collaborator

@leejnau leejnau commented Apr 16, 2026

Motivation

PR #21339 accidentally broke the combination of the existing CuteDSL MoE backend with DeepEP all-to-all (--moe-runner-backend flashinfer_cutedsl --moe-a2a-backend deepep).

This manifested here:
Recipe bug: flashinfer_cutedsl moe-runner-backend incompatible with deepep a2a-backend #39.

This PR restores the previous DeepEP behavior without changing the existing auto backend resolution or generic runner setup logic.

Modifications

Restore the legacy DeepEP low-latency flashinfer_cutedsl MoE path by skipping MoeRunner initialization when flashinfer_cutedsl is used with moe_a2a_backend=deepep. This preserves the new standard CuteDSL path for moe_a2a_backend=none while keeping the old masked DeepEP execution path unchanged.
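
A minimal sketch of the gating described above, not the actual SGLang code: the runner is left uninitialized only when both flags select the legacy combination. The names MoeConfig, uses_legacy_deepep_cutedsl, and maybe_create_runner are hypothetical and exist only for illustration. The two backend strings come from the flags in this PR description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoeConfig:
    # Hypothetical container mirroring the two CLI flags named in this PR.
    moe_runner_backend: str  # e.g. "flashinfer_cutedsl"
    moe_a2a_backend: str     # e.g. "deepep" or "none"


def uses_legacy_deepep_cutedsl(cfg: MoeConfig) -> bool:
    """Return True only for flashinfer_cutedsl combined with deepep."""
    return (
        cfg.moe_runner_backend == "flashinfer_cutedsl"
        and cfg.moe_a2a_backend == "deepep"
    )


def maybe_create_runner(cfg: MoeConfig) -> Optional[str]:
    """Return a placeholder runner, or None when initialization is skipped."""
    if uses_legacy_deepep_cutedsl(cfg):
        # Legacy path: skip runner setup so the old masked DeepEP
        # execution path is used unchanged.
        return None
    # Every other combination keeps the generic runner setup.
    # The string stands in for a real runner object (assumption).
    return f"runner({cfg.moe_runner_backend})"
```

With this shape, moe_a2a_backend=none still goes through the generic setup, so the new standard CuteDSL path is unaffected by the special case.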

Accuracy Tests

N/A

Speed Tests and Profiling

N/A

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the quant LLM Quantization label Apr 16, 2026
@leejnau leejnau marked this pull request as ready for review April 16, 2026 04:04

@b8zhong
Collaborator

b8zhong commented Apr 16, 2026

/tag-and-rerun-ci

@trevor-m
Collaborator

#21339 also changed the weight loading/processing. Can you check whether those changes interfere with the deepep path as well?

@ch-wan ch-wan merged commit b4bb036 into sgl-project:main Apr 20, 2026
77 of 92 checks passed
@leejnau leejnau deleted the fix-cutedsl-deepep-legacy branch April 21, 2026 02:43
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026

Labels

quant LLM Quantization run-ci


4 participants