AMDGPU: Add new VA inline asm constraint for AV registers #152665

arsenm · 2025-08-08T08:37:38Z

Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to load and stores on gfx90a+, and srcA / srcB for MFMA instructions.

The error emitted on unsupported targets isn't ideal, it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.

I mostly want this for writing allocation tests.

Add a new constraint corresponding to the AV_* register classes for operands which can allocate AGPRs or VGPRs. This applies to load and stores on gfx90a+, and srcA / srcB for MFMA instructions. The error emitted on unsupported targets isn't ideal, it is produced by the register allocator without a rationale, but it is consistent with the existing errors. I mostly want this for writing allocation tests.

arsenm · 2025-08-08T08:37:54Z

AMDGPU: Add new VA inline asm constraint for AV registers #152665 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-08-08T08:38:10Z

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to load and stores on gfx90a+, and srcA / srcB for MFMA instructions.

The error emitted on unsupported targets isn't ideal, it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.

I mostly want this for writing allocation tests.

Full diff: https://github.com/llvm/llvm-project/pull/152665.diff

4 Files Affected:

(modified) llvm/docs/LangRef.rst (+7-3)
(modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+21-5)
(added) llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll (+27)
(added) llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll (+217)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3a3a74f323a26..b87de8c87ffd6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -5580,9 +5580,13 @@ AArch64:
 AMDGPU:
 
 - ``r``: A 32 or 64-bit integer register.
-- ``[0-9]v``: The 32-bit VGPR register, number 0-9.
-- ``[0-9]s``: The 32-bit SGPR register, number 0-9.
-- ``[0-9]a``: The 32-bit AGPR register, number 0-9.
+- ``s``: SGPR register or tuple
+- ``v``: VGPR register or tuple
+- ``a``: AGPR register or tuple. Only valid on gfx908+.
+- ``VA``: VGPR or AGPR register or tuple. Only valid on gfx90a+.
+- ``v[0-9]``: The 32-bit VGPR register, number 0-9.
+- ``s[0-9]``: The 32-bit SGPR register, number 0-9.
+- ``a[0-9]``: The 32-bit AGPR register, number 0-9.
 - ``I``: An integer inline constant in the range from -16 to 64.
 - ``J``: A 16-bit signed integer constant.
 - ``A``: An integer or a floating-point inline constant.
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5b327fb894504..e4c5ef3039dbb 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16897,13 +16897,26 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
       }
       break;
     }
-    // We actually support i128, i16 and f16 as inline parameters
-    // even if they are not reported as legal
-    if (RC && (isTypeLegal(VT) || VT.SimpleTy == MVT::i128 ||
-               VT.SimpleTy == MVT::i16 || VT.SimpleTy == MVT::f16))
-      return std::pair(0U, RC);
+  } else if (Constraint == "VA" && Subtarget->hasGFX90AInsts()) {
+    const unsigned BitWidth = VT.getSizeInBits();
+    switch (BitWidth) {
+    case 16:
+      RC = &AMDGPU::AV_32RegClass;
+      break;
+    default:
+      RC = TRI->getVectorSuperClassForBitWidth(BitWidth);
+      if (!RC)
+        return std::pair(0U, nullptr);
+      break;
+    }
   }
 
+  // We actually support i128, i16 and f16 as inline parameters
+  // even if they are not reported as legal
+  if (RC && (isTypeLegal(VT) || VT.SimpleTy == MVT::i128 ||
+             VT.SimpleTy == MVT::i16 || VT.SimpleTy == MVT::f16))
+    return std::pair(0U, RC);
+
   auto [Kind, Idx, NumRegs] = AMDGPU::parseAsmConstraintPhysReg(Constraint);
   if (Kind != '\0') {
     if (Kind == 'v') {
@@ -16988,6 +17001,9 @@ SITargetLowering::getConstraintType(StringRef Constraint) const {
     case 'a':
       return C_RegisterClass;
     }
+  } else if (Constraint.size() == 2) {
+    if (Constraint == "VA")
+      return C_RegisterClass;
   }
   if (isImmConstraint(Constraint)) {
     return C_Other;
diff --git a/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll
new file mode 100644
index 0000000000000..8b45f0de2300e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll
@@ -0,0 +1,27 @@
+; RUN: not llc -mtriple=amdgcn -mcpu=gfx90a -filetype=null %s 2>&1 | FileCheck %s
+
+; Make sure illegal type uses are correctly diagnosed
+
+; CHECK: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i8(i8 %x) {
+  call void asm sideeffect "; use $0", "^VA"(i8 %x)
+  ret void
+}
+
+; CHECK: error: couldn't allocate output register for constraint 'VA'
+define i8 @def_A_i8() {
+  %ret = call i8 asm sideeffect "; def $0", "=^VA"()
+  ret i8 %ret
+}
+
+; CHECK: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i1(i1 %x) {
+  call void asm sideeffect "; use $0", "^VA"(i1 %x)
+  ret void
+}
+
+; CHECK: error: couldn't allocate output register for constraint 'VA'
+define i1 @def_A_i1() {
+  %ret = call i1 asm sideeffect "; def $0", "=^VA"()
+  ret i1 %ret
+}
diff --git a/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll
new file mode 100644
index 0000000000000..407a802545906
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll
@@ -0,0 +1,217 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: not llc -mtriple=amdgcn -mcpu=gfx908 -filetype=null %s 2>&1 | FileCheck -check-prefix=ERR %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a < %s | FileCheck %s
+
+; FIXME: Shouldn't emit and
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i16(i16 %x) {
+; CHECK-LABEL: use_A_i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xffff, v0
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i16 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f16(half %x) {
+; CHECK-LABEL: use_A_f16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(half %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_bf16(bfloat %x) {
+; CHECK-LABEL: use_A_bf16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(bfloat %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2i16(<2 x i16> %x) {
+; CHECK-LABEL: use_A_v2i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x i16> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2f16(<2 x half> %x) {
+; CHECK-LABEL: use_A_v2f16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x half> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2bf16(<2 x bfloat> %x) {
+; CHECK-LABEL: use_A_v2bf16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x bfloat> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i32(i32 %x) {
+; CHECK-LABEL: use_A_i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i32 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f32(float %x) {
+; CHECK-LABEL: use_A_f32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(float %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i64(i64 %x) {
+; CHECK-LABEL: use_A_i64:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i64 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f64(double %x) {
+; CHECK-LABEL: use_A_f64:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(double %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_p1(ptr addrspace(1) %x) {
+; CHECK-LABEL: use_A_p1:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(ptr addrspace(1) %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v32i32(<32 x i32> %x) {
+; CHECK-LABEL: use_A_v32i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:31]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<32 x i32> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v32f32(<32 x float> %x) {
+; CHECK-LABEL: use_A_v32f32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:31]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<32 x float> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define i16 @def_A_i16() {
+; CHECK-LABEL: def_A_i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call i16 asm sideeffect "; def $0", "=^VA"()
+  ret i16 %ret
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define i32 @def_A_i32() {
+; CHECK-LABEL: def_A_i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call i32 asm sideeffect "; def $0", "=^VA"()
+  ret i32 %ret
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define ptr addrspace(1) @def_A_p1() {
+; CHECK-LABEL: def_A_p1:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call ptr addrspace(1) asm sideeffect "; def $0", "=^VA"()
+  ret ptr addrspace(1) %ret
+}

rampitec · 2025-08-08T08:50:55Z

I actually see the problem with it: the only context it is useful is MFMA. But register class is tied between C and D. If you use this new constraint there RA may allocate v for one and a for another, which does not work. That is not saying that mfma shall never be an inline asm, there is no reason. The other context is a load feeding mfma, or a store out of it. This is even more troubling to use inline asm on memory ops.

arsenm · 2025-08-08T08:59:20Z

I actually see the problem with it: the only context it is useful is MFMA. But register class is tied between C and D. If you use this new constraint there RA may allocate v for one and a for another, which does not work.

It's also useful for load and store. There is no problem, you simply cannot use this constraint for srcC and dst for an MFMA. You are always required to match the operand constraint to the actual instruction you are using, and the instruction encoding does not have an AV register in these operands.

You can use it for srcA and srcB. If you try to misuse this with the wrong operand, you will fail in the assembler (just as you would if you incorrectly specified a VGPR for an operand that requires an SGPR).

rampitec · 2025-08-08T16:33:36Z

You can use it for srcA and srcB. If you try to misuse this with the wrong operand, you will fail in the assembler (just as you would if you incorrectly specified a VGPR for an operand that requires an SGPR).

It's a little different from misuse of a constraint in another cases. Using 'a' where only 'v' is allowed is an user error and fill fail at asm. But using 'av' for C and D of MFMA is not an error, yet RA may allocate registers of different types for both and it will also fail in asm.

Why is this needed in the first place? For example we do not have a 'vs' constraint.

arsenm · 2025-08-11T02:14:12Z

It's a little different from misuse of a constraint in another cases. Using 'a' where only 'v' is allowed is a user error and fill fail at asm. But using 'av' for C and D of MFMA is not an error, yet RA may allocate registers of different types for both and it will also fail in asm.

That is invalid usage of the constraint. It is not applicable to srcC and vdst, and trying to ever use it is user error. You must always use a constraint corresponding to the asm you wrote. It's equivalently invalid to trying to use a 'v' constraint with a scalar input, it just happens that this will often work in this case. This constraint exactly matches the behavior of srcA, srcB, and load/store source and result. In no other context is it correct to use. The assembler will fail if the constraint is abused this way, which is fine. That is a user issue, all of these kinds of issues are deferred assembler errors.

Why is this needed in the first place?

Because I need to write tests relevant to AV allocation and writing one that produces a split in just the right way is impossibly difficult. I have to drop to frail MIR tests. We also probably should have operand constraints that exactly match the set of encodable values.

For example we do not have a 'vs' constraint.

That's essentially how 'r' is treated (although we should probably turn that one into an alias for 'v'). In the normal VALU instruction cases between s and v, you have to worry about violating the constant bus restrictions inside the instruction we can't possibly deal with in the compiler, and the user also cannot deal with. That hazard does not exist in the A+V operand cases, you just need to be consistent with the actual instruction usage

rampitec · 2025-08-11T19:10:30Z

That is invalid usage of the constraint.

OK, that makes sense, although opens up a way for user mistakes.

arsenm added the backend:AMDGPU label Aug 8, 2025 — with Graphite App

arsenm requested review from bcahoon, jrbyrnes, srpande, jayfoad, jwanggit86, kosarev and rampitec August 8, 2025 08:38

arsenm marked this pull request as ready for review August 8, 2025 08:39

llvmbot added the llvm:ir label Aug 8, 2025

rampitec approved these changes Aug 11, 2025

View reviewed changes

arsenm merged commit ff53086 into main Aug 12, 2025
15 checks passed

arsenm deleted the users/arsenm/amdgpu/add-VA-inline-asm-constraint-av-class branch August 12, 2025 01:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AMDGPU: Add new VA inline asm constraint for AV registers #152665

AMDGPU: Add new VA inline asm constraint for AV registers #152665

Uh oh!

arsenm commented Aug 8, 2025

Uh oh!

arsenm commented Aug 8, 2025

Uh oh!

llvmbot commented Aug 8, 2025 •

edited

Loading

Uh oh!

rampitec commented Aug 8, 2025

Uh oh!

arsenm commented Aug 8, 2025

Uh oh!

rampitec commented Aug 8, 2025

Uh oh!

arsenm commented Aug 11, 2025

Uh oh!

rampitec commented Aug 11, 2025

Uh oh!

Uh oh!

Uh oh!

AMDGPU: Add new VA inline asm constraint for AV registers #152665

AMDGPU: Add new VA inline asm constraint for AV registers #152665

Uh oh!

Conversation

arsenm commented Aug 8, 2025

Uh oh!

arsenm commented Aug 8, 2025

Uh oh!

llvmbot commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rampitec commented Aug 8, 2025

Uh oh!

arsenm commented Aug 8, 2025

Uh oh!

rampitec commented Aug 8, 2025

Uh oh!

arsenm commented Aug 11, 2025

Uh oh!

rampitec commented Aug 11, 2025

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Aug 8, 2025 •

edited

Loading