Fix hybrid_linear_attn_backend crash with ngram speculation #20739

Merged
hnyls2002 merged 5 commits into sgl-project:main from he-yufeng:fix/ngram-missing-topk
Apr 8, 2026

@he-yufeng
Contributor

@he-yufeng he-yufeng commented Mar 17, 2026

Problem

hybrid_linear_attn_backend accesses spec_info.topk at runtime during target_verify mode, but NgramVerifyInput doesn't define topk, causing an AttributeError crash with --speculative-algo NGRAM.
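
The failure mode can be reproduced in isolation. The following is a minimal sketch, not SGLang code: `EagleVerifyInputSketch`, `NgramVerifyInputSketch`, `read_topk_at_runtime`, and `describe` are hypothetical stand-ins for the real spec-input classes and for the backend's runtime attribute access.

```python
from dataclasses import dataclass


@dataclass
class EagleVerifyInputSketch:
    """Hypothetical stand-in for a spec input that defines topk."""
    topk: int


@dataclass
class NgramVerifyInputSketch:
    """Hypothetical stand-in for a spec input that does not define topk."""
    draft_token_num: int  # illustrative field only


def read_topk_at_runtime(spec_info):
    # Mirrors the pre-fix pattern: assumes every spec input exposes .topk.
    return spec_info.topk


def describe(spec_info):
    # Report either the value read or the error raised by the access.
    try:
        return f"topk={read_topk_at_runtime(spec_info)}"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(describe(EagleVerifyInputSketch(topk=4)))
    print(describe(NgramVerifyInputSketch(draft_token_num=8)))
```

The second call fails because the attribute lookup is unconditional, which matches the error reported in the linked issue.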

Fix

Read topk from server_args.speculative_eagle_topk at init time instead of from spec_info at runtime. This avoids the dependency on SpecInput subtypes all defining topk, and is consistent with how the backend reads other config (pad_slot_id, device, etc.).

For ngram, speculative_eagle_topk is set to speculative_ngram_max_bfs_breadth in server_args, so tree attention branches execute correctly.
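
The shape of the fix can be sketched as follows. This is an illustration under stated assumptions rather than the actual diff: `ServerArgsSketch`, `BackendSketch`, and `uses_tree_attention` are hypothetical names, and the `__post_init__` hook is only a stand-in for wherever server_args performs the aliasing described above.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ServerArgsSketch:
    """Hypothetical stand-in for the relevant server_args fields."""
    speculative_algorithm: Optional[str] = None
    speculative_eagle_topk: Optional[int] = None
    speculative_ngram_max_bfs_breadth: int = 10

    def __post_init__(self):
        # For ngram, topk is aliased to the maximum BFS breadth.
        if self.speculative_algorithm == "NGRAM":
            self.speculative_eagle_topk = self.speculative_ngram_max_bfs_breadth


class BackendSketch:
    """Hypothetical backend that reads topk once, at init time."""

    def __init__(self, model_runner):
        # Config is read from server_args, not from spec_info at runtime.
        self.topk = model_runner.server_args.speculative_eagle_topk

    def uses_tree_attention(self):
        # Assumed condition: tree branches apply only when topk > 1.
        return self.topk is not None and self.topk > 1


if __name__ == "__main__":
    runner = SimpleNamespace(
        server_args=ServerArgsSketch(speculative_algorithm="NGRAM")
    )
    backend = BackendSketch(runner)
    print(backend.topk, backend.uses_tree_attention())
```

Because the backend no longer touches `spec_info.topk`, it does not matter whether a given spec-input type defines that attribute.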

Fixes #20721

Attention backends (hybrid_linear_attn_backend, etc.) access
spec_info.topk unconditionally during target_verify, but
NgramVerifyInput never sets it. This crashes at server startup
when using --speculative-algo NGRAM.

Add topk=1 to NgramVerifyInput since ngram speculation doesn't
use tree attention (unlike Eagle which has topk>1).

Fixes sgl-project#20721

Collaborator

@kpham-sgl kpham-sgl left a comment

The actual fix should be propagating speculative_eagle_topk to NgramVerifyInput. It's actually already set in server_args:

self.speculative_eagle_topk = self.speculative_ngram_max_bfs_breadth

Conceptually, Ngram does build a spec tree (see https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/speculative/cpp_ngram/ngram.cpp#L257 and https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/speculative/cpp_ngram/ngram.cpp#L296).
The parameters that control the tree breadth and depth are:

  # Tree breadth:
  --speculative-ngram-min-bfs-breadth (default: 1)
  --speculative-ngram-max-bfs-breadth (default: 10)

  # Match window (tree depth):
  --speculative-ngram-min-match-window-size (default: 1)
  --speculative-ngram-max-match-window-size (default: 12)

  # Other NGRAM params:
  --speculative-ngram-branch-length (default: 18)
  --speculative-ngram-match-type (BFS or PROB, default: BFS)

@hnyls2002 hnyls2002 self-assigned this Mar 22, 2026
… server_args

hybrid_linear_attn_backend was the only attention backend accessing
spec_info.topk at runtime. All other backends read topk from
server_args.speculative_eagle_topk during __init__. This makes
hybrid_linear_attn_backend consistent and removes the hardcoded
self.topk = 1 from NgramVerifyInput that was papering over the issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kpham-sgl
Collaborator

kpham-sgl commented Mar 26, 2026

To make it consistent with some of the other attention backends, the simpler fix is to read directly from server args:

self.topk = model_runner.server_args.speculative_eagle_topk or 0

The Triton backend instantiates its own self.topk, which can also be traced back to server_args.speculative_eagle_topk.
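
The `or 0` fallback in the suggested line can be illustrated in isolation. This is a small sketch with a hypothetical helper name (`resolve_topk`), and it assumes that speculative_eagle_topk is left as None when speculative decoding is not configured:

```python
from typing import Optional


def resolve_topk(speculative_eagle_topk: Optional[int]) -> int:
    # `x or 0` maps None (and 0) to 0 and leaves positive ints unchanged,
    # so later comparisons such as `topk > 1` never see None.
    return speculative_eagle_topk or 0


if __name__ == "__main__":
    for value in (None, 1, 10):
        print(value, "->", resolve_topk(value))
```

With the fallback applied at init, a later check like `self.topk > 1` stays a plain integer comparison even when no speculative algorithm is enabled.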

@he-yufeng lmk what you think

@kpham-sgl
Collaborator

/tag-and-rerun-ci

@hnyls2002
Collaborator

hnyls2002 commented Apr 8, 2026

/rerun-test test_hybrid_attn_backend.py test_ngram_speculative_decoding.py
(2 tries)

@github-actions
Contributor

github-actions Bot commented Apr 8, 2026

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

github-actions Bot commented Apr 8, 2026

1-gpu-h100 (2 tests): View workflow run

cd test/ && python3 registered/attention/test_hybrid_attn_backend.py
cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 hnyls2002 changed the title from "Fix NgramVerifyInput missing topk attribute" to "Fix hybrid_linear_attn_backend crash with ngram speculation" Apr 8, 2026
@hnyls2002 hnyls2002 merged commit c89afae into sgl-project:main Apr 8, 2026
56 of 95 checks passed
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] 'NgramVerifyInput' object has no attribute 'topk'

3 participants