restricting sampling in vllm with qwen models to avoid OOVs#1237

Merged
uralik merged 1 commit into online_training from kulikov/qwen_oov_fix
Jul 29, 2025

Conversation

@uralik
Contributor

@uralik uralik commented Jul 29, 2025

What does this PR do? Please describe:

Qwen models have an output projection layer that is larger than the tokenizer's actual vocabulary. As a result, the model can sample token IDs that do not exist in the tokenizer, which causes vLLM errors in the RL setup.

This PR patches that by explicitly setting the allowed token IDs, ONLY for the policy actor (which samples the token IDs used for training), and only when the policy model is a Qwen model.
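
To illustrate the idea, here is a minimal, library-free sketch of what restricting sampling to the tokenizer's vocabulary means. It is not the code from this PR: the helper names and the toy sizes are hypothetical, and it assumes the valid IDs are the contiguous range below the tokenizer vocabulary size. In vLLM itself, the equivalent restriction would presumably be passed through the `allowed_token_ids` field of `SamplingParams` rather than by masking logits by hand.

```python
import math
import random


def allowed_token_ids(tokenizer_vocab_size):
    """IDs the tokenizer can decode (assumed here to be 0 .. size - 1)."""
    return list(range(tokenizer_vocab_size))


def mask_logits(logits, allowed):
    """Set the logit of every ID outside `allowed` to -inf so it is never sampled."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]


def sample(logits, rng):
    """Draw one token ID from softmax(logits)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]  # exp(-inf) evaluates to 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical toy sizes: the projection layer has 8 rows, the tokenizer knows 6 IDs.
tokenizer_vocab_size = 6
projection_size = 8

# The two out-of-vocabulary rows (IDs 6 and 7) get large logits on purpose,
# so unrestricted sampling would pick them most of the time.
logits = [0.1, 0.3, -0.2, 0.0, 0.5, 0.2, 4.0, 4.0]
assert len(logits) == projection_size

rng = random.Random(0)
masked = mask_logits(logits, allowed_token_ids(tokenizer_vocab_size))
samples = [sample(masked, rng) for _ in range(1000)]
```

Every ID in `samples` is below `tokenizer_vocab_size`, so decoding never encounters an out-of-vocabulary token. Applying the restriction only to the actor that generates training samples matches the scope described above.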

@uralik uralik requested a review from cbalioglu as a code owner July 29, 2025 00:29
@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 29, 2025
@uralik uralik merged commit 1799948 into online_training Jul 29, 2025
6 of 13 checks passed
@uralik uralik deleted the kulikov/qwen_oov_fix branch July 29, 2025 00:56
jacklanchantin pushed a commit that referenced this pull request Aug 22, 2025
Co-authored-by: Ilia Kulikov <kulikov@meta.com>

3 participants