Fix handling of attention-bias in MHA fusion #2332
Signed-off-by: Ganesan Ramalingam <[email protected]>
Pull Request Overview
This PR enhances attention-bias (mask) handling in the MHA fusion by enforcing the mask-shape requirements of ORT's contrib ops and expanding 2D masks for broadcasting.
- Adds shape checks to ensure masks are 2D, or 4D with the first two dimensions broadcastable against (B, H)
- Tracks when mask broadcasting is needed via `_use_mask_broadcast`
- Inserts an `Expand` in `rewrite()` to reshape 2D masks to 4D for `MultiHeadAttention` (see the sketch after this list)
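As a rough, hypothetical illustration of the shape constraint described above (the helper name and the handling of symbolic dimensions below are assumptions, not the actual rewriter code), a check along these lines accepts 2D masks and 4D masks whose first two dimensions are either 1 or equal to the batch/head sizes:

```python
from typing import Sequence, Union

Dim = Union[int, str]  # a dimension may be a concrete int or a symbolic name


def mask_shape_is_supported(mask_shape: Sequence[Dim], batch: Dim, num_heads: Dim) -> bool:
    """Hypothetical check mirroring the constraint described above.

    ORT's contrib MHA/Attention expect an attention bias of shape
    (1 or B, 1 or H, S, St): broadcasting is allowed only in the first two
    dimensions. A 2D mask (S, St) can still be fused, but needs an Expand
    to 4D first. The last two dimensions are assumed to be (S, St) and are
    not validated in this sketch.
    """
    if len(mask_shape) == 2:
        return True  # (S, St): fusable after expanding to 4D
    if len(mask_shape) == 4:
        d0, d1 = mask_shape[0], mask_shape[1]
        # Each of the first two dims must be 1, or match B / H exactly
        # (a symbolic dim is accepted only if it is the same symbol).
        return (d0 == 1 or d0 == batch) and (d1 == 1 or d1 == num_heads)
    return False


# Examples:
assert mask_shape_is_supported((8, 8), batch=2, num_heads=12)            # 2D mask
assert mask_shape_is_supported((1, 1, 8, 8), batch=2, num_heads=12)      # fully broadcast
assert mask_shape_is_supported(("B", 12, 8, 8), batch="B", num_heads=12)
assert not mask_shape_is_supported((2, 3, 8, 8), batch=2, num_heads=12)  # head dim mismatch
```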
Comments suppressed due to low confidence (1)
onnxscript/rewriter/ort_fusions/mha.py:285
- [nitpick] The name `mask_dim_2` is ambiguous; consider renaming it to something more descriptive like `mask_seq_len_dim` or `mask_S_or_1` to clarify that this binding holds the S-or-1 dimension.

  mask_dim_2 = bindings.get("S_or_1")
In models generated from PyTorch, masks may have shapes that are broadcastable to (B, H, S, St): e.g., a 2D mask of shape (S, St), or even shape (1, 1, 1, St) in one example.
ONNX's opset 23 Attention op allows masks of this shape. However, ORT's contrib ops (MHA, Attention) allow a mask of shape (1 or B, 1 or H, S, St); that is, they support broadcasting only for the first two dimensions. (Even that is not supported by some earlier versions of ORT, which we do not consider here.)
So, while fusing into MHA, we should expand the mask to ensure it satisfies the constraints of MHA/Attention.
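A minimal sketch of the shape transformation involved, using numpy (whose broadcasting rules ONNX's Expand follows) and assuming the fusion targets the broadcast-compatible 4D form (1, 1, S, St) for a 2D mask:

```python
import numpy as np

# A 2D attention bias of shape (S, St), as often produced from PyTorch models.
S, St = 4, 6
mask_2d = np.zeros((S, St), dtype=np.float32)

# ONNX's Expand op uses numpy-style broadcasting, so expanding against a
# 4D target yields a mask whose first two dimensions are 1 (the only
# dimensions ORT's contrib MHA/Attention will broadcast).
mask_4d = np.broadcast_to(mask_2d, (1, 1, S, St))

print(mask_2d.shape)  # (4, 6)
print(mask_4d.shape)  # (1, 1, 4, 6)
```

Whether the rewriter builds the Expand with this exact target shape or another broadcast-compatible one is an implementation detail of the fusion; the sketch only shows the 2D-to-4D shape change that the ORT ops require.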