Commit cdc0499

yanghui1-arch authored and JiantaoXu committed
[Bugfix] RoBERTa position_id accumulation in CUDA graph padding region (vllm-project#37873)
Signed-off-by: dass90 <3053034939@qq.com>
1 parent 477534f commit cdc0499

1 file changed

Lines changed: 2 additions & 0 deletions

vllm/v1/worker/gpu_model_runner.py

@@ -3084,6 +3084,8 @@ def _preprocess(
             positions = self.xdrope_positions.gpu[:, :num_input_tokens]
         else:
             positions = self.positions.gpu[:num_input_tokens]
+            if num_input_tokens > num_scheduled_tokens:
+                self.positions.gpu[num_scheduled_tokens:num_input_tokens].zero_()
 
         if is_first_rank:
             intermediate_tensors = None
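
Note: the following is a minimal pure-Python sketch of the failure mode named in the commit title, not code from vLLM. It assumes that the position buffer persists across steps, that only the scheduled region is rewritten each step, and that the model then applies a RoBERTa-style offset in place over the whole padded slice. `PADDING_IDX`, `run_step`, and `simulate` are hypothetical names introduced for illustration only.

```python
# Hypothetical stand-ins for illustration; none of these names come from vLLM.
PADDING_IDX = 1  # assumed padding index; the in-place offset is PADDING_IDX + 1


def run_step(buffer, scheduled_positions, num_input_tokens, zero_padding):
    """Simulate one step over a persistent position buffer.

    Only the first len(scheduled_positions) entries are refreshed; the
    entries up to num_input_tokens are padding (e.g. for a fixed graph size).
    """
    num_scheduled_tokens = len(scheduled_positions)
    buffer[:num_scheduled_tokens] = scheduled_positions

    # Mirrors the guard added in the diff: clear the padding region.
    if zero_padding and num_input_tokens > num_scheduled_tokens:
        for i in range(num_scheduled_tokens, num_input_tokens):
            buffer[i] = 0

    # Assumed model behaviour: offset every position in the padded slice in place.
    for i in range(num_input_tokens):
        buffer[i] += PADDING_IDX + 1

    return buffer[:num_input_tokens]


def simulate(zero_padding, steps=3):
    """Run several identical steps and return the positions seen on the last one."""
    buffer = [0] * 8
    out = []
    for _ in range(steps):
        out = run_step(buffer, [0, 1, 2], 4, zero_padding)
    return out


print(simulate(zero_padding=False))  # [2, 3, 4, 6]: the padded slot grows each step
print(simulate(zero_padding=True))   # [2, 3, 4, 2]: the padded slot stays bounded
```

Under these assumptions, the unzeroed padding slot grows by the offset on every step and could eventually index past the position embedding table, which is the kind of accumulation the two added lines appear intended to prevent.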
