Commit f927192

fix(audio-encoder): apply upstream PR QwenLM#103 block-diagonal mask for non-FA2 backends
SDPA/eager attention ignored cu_seqlens and ran full global self-attention over the whole batch, degrading transcription quality. Build the block-diagonal chunk mask via _prepare_attention_mask and pass it to the encoder layers; FA2 path is unchanged (mask is None).

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent c17a131 commit f927192

1 file changed

Lines changed: 8 additions & 0 deletions

qwen_asr/core/transformers_backend/modeling_qwen3_asr.py

@@ -724,11 +724,19 @@ def forward(
             if remainder != 0:
                 cu_chunk_lens += [remainder]
         cu_seqlens = torch.tensor(cu_chunk_lens, device=aftercnn_lens.device).cumsum(-1, dtype=torch.int32)
+        # Upstream PR #103 fix: build a block-diagonal 4D attention mask for
+        # non-FA2 backends (SDPA/eager). Without this, SDPA/eager ignore
+        # cu_seqlens and perform full global self-attention, which severely
+        # degrades transcription quality (~340 vs ~555 words on a 5min clip).
+        # _prepare_attention_mask returns None for flash_attention_2 so the
+        # FA2 path is unchanged.
+        attention_mask = self._prepare_attention_mask(hidden_states, cu_seqlens)
 
         for encoder_layer in self.layers:
             layer_outputs = encoder_layer(
                 hidden_states,
                 cu_seqlens,
+                attention_mask=attention_mask,
             )
 
             hidden_states = layer_outputs[0]
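
The diff calls `_prepare_attention_mask` but does not show its body, so the following is only a framework-free sketch of the idea behind a block-diagonal chunk mask, not the repository's implementation. The helper names `chunk_boundaries` and `block_diagonal_mask` are hypothetical, the full-window part of the loop is assumed (the hunk shows only the remainder branch), and plain Python lists stand in for the tensors used in the real code. The actual mask is described as 4D and may well be additive rather than boolean.

```python
from itertools import accumulate


def chunk_boundaries(seq_lens, window):
    """Cumulative chunk boundaries, analogous to cu_seqlens in the diff.

    Assumption: each sequence is split into full windows plus an optional
    remainder chunk; only the remainder branch is visible in the hunk.
    """
    chunk_lens = [0]
    for length in seq_lens:
        chunk_lens += [window] * (length // window)
        remainder = length % window
        if remainder != 0:
            chunk_lens += [remainder]
    # Running sum turns chunk lengths into start/end offsets.
    return list(accumulate(chunk_lens))


def block_diagonal_mask(cu_seqlens):
    """Boolean mask: True where query q and key k fall in the same chunk."""
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for q in range(start, end):
            for k in range(start, end):
                mask[q][k] = True
    return mask


# Two sequences of lengths 5 and 3, chunked with a window of 2.
cu = chunk_boundaries([5, 3], window=2)  # [0, 2, 4, 5, 7, 8]
mask = block_diagonal_mask(cu)
```

With such a mask, position 0 may attend to position 1 (same chunk) but position 1 may not attend to position 2 (next chunk). That is the restriction a backend loses when it ignores `cu_seqlens` and attends across the whole batch.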
