restricting sampling in vllm with qwen models to avoid OOVs by uralik · Pull Request #1237 · facebookresearch/fairseq2

uralik · 2025-07-29T00:29:43Z

What does this PR do? Please describe:

Qwen models have an output projection layer that is larger than the actual tokenizer vocabulary. Because of this, the model can sample token ids that do not exist in the tokenizer (OOVs), which causes vLLM errors in the RL setup.

This PR patches that by manually setting the allowed token ids, ONLY for the policy actor (the one that samples the token ids for training), and only when a Qwen model is used as the policy model.
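
For illustration only, here is a minimal sketch of the idea, not the code from this PR. It assumes the valid tokenizer ids are the contiguous range `0..tokenizer_vocab_size-1` and that the extra rows of the projection layer are padding; the helper names (`build_allowed_token_ids`, `sample_restricted`) and the toy sizes are hypothetical. The sketch masks logits by hand in plain Python so it runs without vLLM.

```python
import math
import random


def build_allowed_token_ids(tokenizer_vocab_size: int) -> list[int]:
    # Assumption: every id below the tokenizer vocabulary size is a real token.
    return list(range(tokenizer_vocab_size))


def sample_restricted(logits: list[float], allowed_token_ids: list[int], rng: random.Random) -> int:
    # Mask every id outside the allowed set so it gets zero probability.
    allowed = set(allowed_token_ids)
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    # Softmax weights (unnormalized); exp(-inf) evaluates to 0.0.
    top = max(masked)
    weights = [math.exp(x - top) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy sizes (hypothetical): the projection layer has 12 rows, the tokenizer knows 10 ids.
projection_size = 12
tokenizer_vocab_size = 10

allowed = build_allowed_token_ids(tokenizer_vocab_size)
rng = random.Random(0)

# The two padding ids get the highest logits, so unrestricted sampling would favor them.
logits = [0.0] * tokenizer_vocab_size + [5.0] * (projection_size - tokenizer_vocab_size)
samples = [sample_restricted(logits, allowed, rng) for _ in range(1000)]

# No sampled id falls outside the tokenizer vocabulary.
assert all(s < tokenizer_vocab_size for s in samples)
```

With vLLM itself, a list like `allowed` would presumably be handed to the sampler through the `allowed_token_ids` field of `SamplingParams`; how this PR wires it into the policy actor is not shown in the conversation above.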

Co-authored-by: Ilia Kulikov <kulikov@meta.com>

restricting sampling in vllm with qwen models to avoid OOVs

2db142d

uralik requested review from jacklanchantin and swarnaHub July 29, 2025 00:29

uralik requested a review from cbalioglu as a code owner July 29, 2025 00:29

facebook-github-bot added the CLA Signed label Jul 29, 2025

swarnaHub approved these changes Jul 29, 2025

uralik merged commit 1799948 into online_training Jul 29, 2025
6 of 13 checks passed

uralik deleted the kulikov/qwen_oov_fix branch July 29, 2025 00:56

jacklanchantin pushed a commit that referenced this pull request Aug 22, 2025

restricting sampling in vllm with qwen models to avoid OOVs (#1237)

cf3e708

Co-authored-by: Ilia Kulikov <kulikov@meta.com>
