Improve prefix handling, attention mask compatibility, and KV cache control #11
YangYangGirl wants to merge 2 commits into QwenLM:main from
Conversation
This PR makes four changes; a sketch of each follows below.

- Prefix mask fix: changed prefix mask values from 0 to 1 so virtual prefix tokens can properly participate in attention.
- Prefix K/V concatenation during training: added logic to concatenate prefix key/value states during training.
- 2D attention mask support: enabled attention_mask in [B, T] format in addition to the 4D [B, H, T_q, T_k] format.
- Configurable KV cache usage: added a use_cache parameter; the KV cache is disabled during training.
Prefix mask fix
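A minimal sketch of this change, assuming the usual keep-mask convention (1 = attend, 0 = masked); the function name and signature are illustrative, not the repository's actual identifiers:

```python
import torch

def build_prefix_attention_mask(attention_mask: torch.Tensor, prefix_len: int) -> torch.Tensor:
    """Prepend mask entries for the virtual prefix tokens.

    attention_mask: [B, T] keep-mask (1 = attend, 0 = padding).
    Filling the prefix positions with 0 masked the learned prefix out of
    attention entirely; filling them with 1 lets every query attend to it.
    """
    bsz = attention_mask.size(0)
    prefix_mask = torch.ones(  # was effectively torch.zeros(...), which hid the prefix
        bsz, prefix_len, dtype=attention_mask.dtype, device=attention_mask.device
    )
    return torch.cat([prefix_mask, attention_mask], dim=-1)  # [B, prefix_len + T]
```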
Prefix K/V concatenation during training
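A sketch of the training-time path, assuming [B, num_heads, seq_len, head_dim] layouts; concat_prefix_kv and its parameters are illustrative rather than the repository's API:

```python
import torch

def concat_prefix_kv(
    key_states: torch.Tensor,    # [B, H, T, D]
    value_states: torch.Tensor,  # [B, H, T, D]
    prefix_key: torch.Tensor,    # [B, H, P, D], learned prefix keys
    prefix_value: torch.Tensor,  # [B, H, P, D], learned prefix values
    training: bool,
):
    """During training there is no KV cache to carry the prefix, so the
    prefix key/value states are concatenated along the sequence axis
    before attention scores are computed."""
    if training:
        key_states = torch.cat([prefix_key, key_states], dim=2)
        value_states = torch.cat([prefix_value, value_states], dim=2)
    return key_states, value_states
```

The P extra key positions introduced here are exactly what the ones-filled prefix mask entries from the first change account for.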
2D attention mask support
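A sketch of the mask normalization, assuming the common additive convention in which 0 keeps a position and a large negative value masks it; expand_attention_mask is a hypothetical name:

```python
import torch

def expand_attention_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Accept either a 2D [B, T_k] keep-mask or a ready-made 4D
    [B, H, T_q, T_k] mask, and return an additive 4D mask."""
    if attention_mask.dim() == 2:
        # [B, T_k] -> [B, 1, 1, T_k], broadcastable over heads and query positions
        mask = attention_mask[:, None, None, :].to(dtype)
        return (1.0 - mask) * torch.finfo(dtype).min
    return attention_mask  # already 4D; assumed to be additive
```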
Configurable KV cache usage
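A sketch of the cache gating, assuming a decoder-style attention module; the class and signature below are illustrative rather than the repository's actual code:

```python
import torch
import torch.nn as nn

class CachedAttention(nn.Module):
    """Illustrative attention wrapper showing the use_cache gate."""

    def forward(self, key_states, value_states, past_key_value=None, use_cache=None):
        # Default: cache at inference time only; training disables the cache,
        # since cached K/V are memory-heavy and stale once weights update.
        use_cache = use_cache if use_cache is not None else not self.training
        if past_key_value is not None:
            key_states = torch.cat([past_key_value[0], key_states], dim=2)
            value_states = torch.cat([past_key_value[1], value_states], dim=2)
        present_key_value = (key_states, value_states) if use_cache else None
        return key_states, value_states, present_key_value
```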