Skip to content

AMDGPU: Add new VA inline asm constraint for AV registers #152665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Aug 8, 2025

Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to load and stores on gfx90a+, and srcA / srcB for MFMA instructions.

The error emitted on unsupported targets isn't ideal, it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.

I mostly want this for writing allocation tests.

Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to load and stores on gfx90a+, and srcA / srcB for MFMA instructions.

The error emitted on unsupported targets isn't ideal, it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.

I mostly want this for writing allocation tests.
Copy link
Contributor Author

arsenm commented Aug 8, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Copy link
Member

llvmbot commented Aug 8, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Add a new constraint corresponding to the AV_* register classes
for operands which can allocate AGPRs or VGPRs. This applies
to load and stores on gfx90a+, and srcA / srcB for MFMA instructions.

The error emitted on unsupported targets isn't ideal, it is
produced by the register allocator without a rationale, but it is
consistent with the existing errors.

I mostly want this for writing allocation tests.


Full diff: https://github.com/llvm/llvm-project/pull/152665.diff

4 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+7-3)
  • (modified) llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+21-5)
  • (added) llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll (+27)
  • (added) llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll (+217)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 3a3a74f323a26..b87de8c87ffd6 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -5580,9 +5580,13 @@ AArch64:
 AMDGPU:
 
 - ``r``: A 32 or 64-bit integer register.
-- ``[0-9]v``: The 32-bit VGPR register, number 0-9.
-- ``[0-9]s``: The 32-bit SGPR register, number 0-9.
-- ``[0-9]a``: The 32-bit AGPR register, number 0-9.
+- ``s``: SGPR register or tuple
+- ``v``: VGPR register or tuple
+- ``a``: AGPR register or tuple. Only valid on gfx908+.
+- ``VA``: VGPR or AGPR register or tuple. Only valid on gfx90a+.
+- ``v[0-9]``: The 32-bit VGPR register, number 0-9.
+- ``s[0-9]``: The 32-bit SGPR register, number 0-9.
+- ``a[0-9]``: The 32-bit AGPR register, number 0-9.
 - ``I``: An integer inline constant in the range from -16 to 64.
 - ``J``: A 16-bit signed integer constant.
 - ``A``: An integer or a floating-point inline constant.
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5b327fb894504..e4c5ef3039dbb 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16897,13 +16897,26 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
       }
       break;
     }
-    // We actually support i128, i16 and f16 as inline parameters
-    // even if they are not reported as legal
-    if (RC && (isTypeLegal(VT) || VT.SimpleTy == MVT::i128 ||
-               VT.SimpleTy == MVT::i16 || VT.SimpleTy == MVT::f16))
-      return std::pair(0U, RC);
+  } else if (Constraint == "VA" && Subtarget->hasGFX90AInsts()) {
+    const unsigned BitWidth = VT.getSizeInBits();
+    switch (BitWidth) {
+    case 16:
+      RC = &AMDGPU::AV_32RegClass;
+      break;
+    default:
+      RC = TRI->getVectorSuperClassForBitWidth(BitWidth);
+      if (!RC)
+        return std::pair(0U, nullptr);
+      break;
+    }
   }
 
+  // We actually support i128, i16 and f16 as inline parameters
+  // even if they are not reported as legal
+  if (RC && (isTypeLegal(VT) || VT.SimpleTy == MVT::i128 ||
+             VT.SimpleTy == MVT::i16 || VT.SimpleTy == MVT::f16))
+    return std::pair(0U, RC);
+
   auto [Kind, Idx, NumRegs] = AMDGPU::parseAsmConstraintPhysReg(Constraint);
   if (Kind != '\0') {
     if (Kind == 'v') {
@@ -16988,6 +17001,9 @@ SITargetLowering::getConstraintType(StringRef Constraint) const {
     case 'a':
       return C_RegisterClass;
     }
+  } else if (Constraint.size() == 2) {
+    if (Constraint == "VA")
+      return C_RegisterClass;
   }
   if (isImmConstraint(Constraint)) {
     return C_Other;
diff --git a/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll
new file mode 100644
index 0000000000000..8b45f0de2300e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint-err.ll
@@ -0,0 +1,27 @@
+; RUN: not llc -mtriple=amdgcn -mcpu=gfx90a -filetype=null %s 2>&1 | FileCheck %s
+
+; Make sure illegal type uses are correctly diagnosed
+
+; CHECK: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i8(i8 %x) {
+  call void asm sideeffect "; use $0", "^VA"(i8 %x)
+  ret void
+}
+
+; CHECK: error: couldn't allocate output register for constraint 'VA'
+define i8 @def_A_i8() {
+  %ret = call i8 asm sideeffect "; def $0", "=^VA"()
+  ret i8 %ret
+}
+
+; CHECK: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i1(i1 %x) {
+  call void asm sideeffect "; use $0", "^VA"(i1 %x)
+  ret void
+}
+
+; CHECK: error: couldn't allocate output register for constraint 'VA'
+define i1 @def_A_i1() {
+  %ret = call i1 asm sideeffect "; def $0", "=^VA"()
+  ret i1 %ret
+}
diff --git a/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll
new file mode 100644
index 0000000000000..407a802545906
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/inline-asm-av-constraint.ll
@@ -0,0 +1,217 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: not llc -mtriple=amdgcn -mcpu=gfx908 -filetype=null %s 2>&1 | FileCheck -check-prefix=ERR %s
+; RUN: llc -mtriple=amdgcn -mcpu=gfx90a < %s | FileCheck %s
+
+; FIXME: Shouldn't emit and
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i16(i16 %x) {
+; CHECK-LABEL: use_A_i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    v_and_b32_e32 v0, 0xffff, v0
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i16 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f16(half %x) {
+; CHECK-LABEL: use_A_f16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(half %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_bf16(bfloat %x) {
+; CHECK-LABEL: use_A_bf16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(bfloat %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2i16(<2 x i16> %x) {
+; CHECK-LABEL: use_A_v2i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x i16> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2f16(<2 x half> %x) {
+; CHECK-LABEL: use_A_v2f16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x half> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v2bf16(<2 x bfloat> %x) {
+; CHECK-LABEL: use_A_v2bf16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<2 x bfloat> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i32(i32 %x) {
+; CHECK-LABEL: use_A_i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i32 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f32(float %x) {
+; CHECK-LABEL: use_A_f32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(float %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_i64(i64 %x) {
+; CHECK-LABEL: use_A_i64:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(i64 %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_f64(double %x) {
+; CHECK-LABEL: use_A_f64:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(double %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_p1(ptr addrspace(1) %x) {
+; CHECK-LABEL: use_A_p1:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(ptr addrspace(1) %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v32i32(<32 x i32> %x) {
+; CHECK-LABEL: use_A_v32i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:31]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<32 x i32> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate input reg for constraint 'VA'
+define void @use_A_v32f32(<32 x float> %x) {
+; CHECK-LABEL: use_A_v32f32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    buffer_load_dword v31, off, s[0:3], s32
+; CHECK-NEXT:    s_waitcnt vmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; use v[0:31]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  call void asm sideeffect "; use $0", "^VA"(<32 x float> %x)
+  ret void
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define i16 @def_A_i16() {
+; CHECK-LABEL: def_A_i16:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call i16 asm sideeffect "; def $0", "=^VA"()
+  ret i16 %ret
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define i32 @def_A_i32() {
+; CHECK-LABEL: def_A_i32:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v0
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call i32 asm sideeffect "; def $0", "=^VA"()
+  ret i32 %ret
+}
+
+; ERR: error: couldn't allocate output register for constraint 'VA'
+define ptr addrspace(1) @def_A_p1() {
+; CHECK-LABEL: def_A_p1:
+; CHECK:       ; %bb.0:
+; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    ;;#ASMSTART
+; CHECK-NEXT:    ; def v[0:1]
+; CHECK-NEXT:    ;;#ASMEND
+; CHECK-NEXT:    s_setpc_b64 s[30:31]
+  %ret = call ptr addrspace(1) asm sideeffect "; def $0", "=^VA"()
+  ret ptr addrspace(1) %ret
+}

@arsenm arsenm marked this pull request as ready for review August 8, 2025 08:39
@llvmbot llvmbot added the llvm:ir label Aug 8, 2025
@rampitec
Copy link
Collaborator

rampitec commented Aug 8, 2025

I actually see the problem with it: the only context it is useful is MFMA. But register class is tied between C and D. If you use this new constraint there RA may allocate v for one and a for another, which does not work. That is not saying that mfma shall never be an inline asm, there is no reason. The other context is a load feeding mfma, or a store out of it. This is even more troubling to use inline asm on memory ops.

@arsenm
Copy link
Contributor Author

arsenm commented Aug 8, 2025

I actually see the problem with it: the only context it is useful is MFMA. But register class is tied between C and D. If you use this new constraint there RA may allocate v for one and a for another, which does not work.

It's also useful for load and store. There is no problem, you simply cannot use this constraint for srcC and dst for an MFMA. You are always required to match the operand constraint to the actual instruction you are using, and the instruction encoding does not have an AV register in these operands.

You can use it for srcA and srcB. If you try to misuse this with the wrong operand, you will fail in the assembler (just as you would if you incorrectly specified a VGPR for an operand that requires an SGPR).

@rampitec
Copy link
Collaborator

rampitec commented Aug 8, 2025

You can use it for srcA and srcB. If you try to misuse this with the wrong operand, you will fail in the assembler (just as you would if you incorrectly specified a VGPR for an operand that requires an SGPR).

It's a little different from misuse of a constraint in another cases. Using 'a' where only 'v' is allowed is an user error and fill fail at asm. But using 'av' for C and D of MFMA is not an error, yet RA may allocate registers of different types for both and it will also fail in asm.

Why is this needed in the first place? For example we do not have a 'vs' constraint.

@arsenm
Copy link
Contributor Author

arsenm commented Aug 11, 2025

It's a little different from misuse of a constraint in another cases. Using 'a' where only 'v' is allowed is a user error and fill fail at asm. But using 'av' for C and D of MFMA is not an error, yet RA may allocate registers of different types for both and it will also fail in asm.

That is invalid usage of the constraint. It is not applicable to srcC and vdst, and trying to ever use it is user error. You must always use a constraint corresponding to the asm you wrote. It's equivalently invalid to trying to use a 'v' constraint with a scalar input, it just happens that this will often work in this case. This constraint exactly matches the behavior of srcA, srcB, and load/store source and result. In no other context is it correct to use. The assembler will fail if the constraint is abused this way, which is fine. That is a user issue, all of these kinds of issues are deferred assembler errors.

Why is this needed in the first place?

Because I need to write tests relevant to AV allocation and writing one that produces a split in just the right way is impossibly difficult. I have to drop to frail MIR tests. We also probably should have operand constraints that exactly match the set of encodable values.

For example we do not have a 'vs' constraint.

That's essentially how 'r' is treated (although we should probably turn that one into an alias for 'v'). In the normal VALU instruction cases between s and v, you have to worry about violating the constant bus restrictions inside the instruction we can't possibly deal with in the compiler, and the user also cannot deal with. That hazard does not exist in the A+V operand cases, you just need to be consistent with the actual instruction usage

@rampitec
Copy link
Collaborator

That is invalid usage of the constraint.

OK, that makes sense, although opens up a way for user mistakes.

@arsenm arsenm merged commit ff53086 into main Aug 12, 2025
15 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/add-VA-inline-asm-constraint-av-class branch August 12, 2025 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants