[ARM][TableGen][MC] Change the ARM mnemonic operands to be optional for ASM parsing #83436

Merged

Conversation

AlfieRichardsArm
Contributor

@AlfieRichardsArm AlfieRichardsArm commented Feb 29, 2024

This changes the way the assembly matcher works for AArch32 parsing. Previously there was a pile of hacks which dictated whether the CC, CCOut, and VCC operands should be present, which de facto chose whether the wide or narrow (or Thumb1/Thumb2/ARM) version of an instruction was selected.

This meant much of the TableGen machinery present for assembly matching was effectively being bypassed and worked around.

This patch makes the CC and CCOut operands optional, which allows the ASM matcher to operate as it was designed and means we can avoid some of the hacks used previously. It also adds an option for a target to prioritize the smaller instruction encodings, as is required for AArch32.
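
As a purely illustrative sketch (not part of the patch; unified syntax), the suffixes in the assembly are what become the optional operands, and the encoding width is what the matcher can now choose by preferring the smaller form:

        adds    r0, r0, r1      @ 's' suffix -> optional CCOut operand present, CC defaults to AL
        addeq   r0, r0, r1      @ 'eq' suffix -> optional CC (predicate) operand present
        add.w   r0, r0, r1      @ an explicit '.w' still forces the wide 32-bit Thumb2 encoding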

@llvmbot
Member

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-mc

Author: Alfie Richards (AlfieRichardsArm)

Changes

This changes the way the assembly matcher works for AArch32 parsing. Previously there was a pile of hacks which dictated whether the CC and CCOut operands should be present, which de facto chose whether the wide or narrow (or Thumb1/Thumb2/ARM) version of an instruction was selected.

This meant much of the TableGen machinery present for assembly matching was effectively being bypassed and worked around.

This patch makes the CC and CCOut operands optional, which allows the ASM matcher to operate as it was designed and means we can avoid some of the hacks used previously. It also adds an option for a target to prioritize the smaller instruction encodings, as is required for AArch32.


Patch is 136.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83436.diff

23 Files Affected:

  • (modified) llvm/include/llvm/Target/Target.td (+15-1)
  • (modified) llvm/lib/Target/ARM/ARM.td (+1)
  • (modified) llvm/lib/Target/ARM/ARMInstrFormats.td (+14-3)
  • (modified) llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp (+634-525)
  • (modified) llvm/test/MC/ARM/arm-branch-errors.s (+2-2)
  • (modified) llvm/test/MC/ARM/arm-reg-addr-errors.s (+8-8)
  • (modified) llvm/test/MC/ARM/arm11-hint-instr.s (+7-1)
  • (modified) llvm/test/MC/ARM/cps.s (+1-1)
  • (modified) llvm/test/MC/ARM/d16.s (+1-1)
  • (modified) llvm/test/MC/ARM/directive-arch_extension-crypto.s (+10-5)
  • (modified) llvm/test/MC/ARM/lsl-zero-errors.s (+4-4)
  • (modified) llvm/test/MC/ARM/mve-misc.s (+29-26)
  • (modified) llvm/test/MC/ARM/negative-immediates-fail.s (+5-5)
  • (modified) llvm/test/MC/ARM/not-armv4.s (+11-3)
  • (modified) llvm/test/MC/ARM/register-token-source-loc.s (+3)
  • (modified) llvm/test/MC/ARM/tMOVSr.s (+3-2)
  • (modified) llvm/test/MC/ARM/thumb-diagnostics.s (+19-14)
  • (modified) llvm/test/MC/ARM/thumb-mov.s (+12-4)
  • (modified) llvm/test/MC/ARM/thumb2-diagnostics.s (+3-1)
  • (modified) llvm/utils/TableGen/AsmMatcherEmitter.cpp (+54-11)
  • (modified) llvm/utils/TableGen/CodeGenTarget.cpp (+4)
  • (modified) llvm/utils/TableGen/CodeGenTarget.h (+5)
  • (modified) llvm/utils/TableGen/PseudoLoweringEmitter.cpp (+20-17)
diff --git a/llvm/include/llvm/Target/Target.td b/llvm/include/llvm/Target/Target.td
index 0d97a47190b196..8448b768d6e7d6 100644
--- a/llvm/include/llvm/Target/Target.td
+++ b/llvm/include/llvm/Target/Target.td
@@ -948,9 +948,18 @@ class AsmOperandClass {
   /// error will be suppressed if all of the remaining unmatched operands are
   /// marked as IsOptional.
   ///
-  /// Optional arguments must be at the end of the operand list.
+  /// Note: Optional arguments have caveats if they are not at the end of this list
+  /// when regarding custom operand parsing. See below
   bit IsOptional = false;
 
+  // Fixme: Ideally this would not be necessary however this would involve interleaving the
+  // parsing and matching processes.
+  /// Set to 1 if the parser should assume this operand will always be present
+  /// for the sake of calculating the operand index in regards to which custom operand
+  /// parser should be used.
+  /// This is only used for custom operands that are not at the end of the instruction.
+  bit OptionalShouldOffsetCustomParsers = true;
+
   /// The name of the method on the target specific asm parser that returns the
   /// default operand for this optional operand. This method is only used if
   /// IsOptional == 1. If not set, this will default to "defaultFooOperands",
@@ -1724,6 +1733,11 @@ class Target {
   // setting hasExtraDefRegAllocReq and hasExtraSrcRegAllocReq to 1
   // for all opcodes if this flag is set to 0.
   int AllowRegisterRenaming = 0;
+
+  // SortBySize = Should the assembly matcher prefer the smaller
+  // instructions. 1 if the instruction set should sort by size,
+  // 0 otherwise.
+  int SortBySize = 0;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/ARM/ARM.td b/llvm/lib/Target/ARM/ARM.td
index 877781568307dc..f380a8e40337b7 100644
--- a/llvm/lib/Target/ARM/ARM.td
+++ b/llvm/lib/Target/ARM/ARM.td
@@ -1746,4 +1746,5 @@ def ARM : Target {
   let AssemblyParsers = [ARMAsmParser];
   let AssemblyParserVariants = [ARMAsmParserVariant];
   let AllowRegisterRenaming = 1;
+  let SortBySize = 1;
 }
diff --git a/llvm/lib/Target/ARM/ARMInstrFormats.td b/llvm/lib/Target/ARM/ARMInstrFormats.td
index 14e315534570d2..6e0466f1bc11bd 100644
--- a/llvm/lib/Target/ARM/ARMInstrFormats.td
+++ b/llvm/lib/Target/ARM/ARMInstrFormats.td
@@ -155,7 +155,13 @@ def iflags_op : Operand<i32> {
 
 // ARM Predicate operand. Default to 14 = always (AL). Second part is CC
 // register whose default is 0 (no register).
-def CondCodeOperand : AsmOperandClass { let Name = "CondCode"; }
+def CondCodeOperand : AsmOperandClass {
+  let Name = "CondCode";
+  let PredicateMethod = "isCondCode";
+  let DefaultMethod = "defaultCondCodeOp";
+  let IsOptional = true;
+  let OptionalShouldOffsetCustomParsers = false;
+}
 def pred : PredicateOperand<OtherVT, (ops i32imm, i32imm),
                                      (ops (i32 14), (i32 zero_reg))> {
   let PrintMethod = "printPredicateOperand";
@@ -174,7 +180,12 @@ def cmovpred : Operand<i32>, PredicateOp,
 }
 
 // Conditional code result for instructions whose 's' bit is set, e.g. subs.
-def CCOutOperand : AsmOperandClass { let Name = "CCOut"; }
+def CCOutOperand : AsmOperandClass {
+  let Name = "CCOut";
+  let DefaultMethod = "defaultCCOutOp";
+  let IsOptional = true;
+  let OptionalShouldOffsetCustomParsers = false;
+}
 def cc_out : OptionalDefOperand<OtherVT, (ops CCR), (ops (i32 zero_reg))> {
   let EncoderMethod = "getCCOutOpValue";
   let PrintMethod = "printSBitModifierOperand";
@@ -468,7 +479,7 @@ class InstThumb<AddrMode am, int sz, IndexMode im,
 // These are aliases that require C++ handling to convert to the target
 // instruction, while InstAliases can be handled directly by tblgen.
 class AsmPseudoInst<string asm, dag iops, dag oops = (outs)>
-  : InstTemplate<AddrModeNone, 0, IndexModeNone, Pseudo, GenericDomain,
+  : InstTemplate<AddrModeNone, 4, IndexModeNone, Pseudo, GenericDomain,
                  "", NoItinerary> {
   let OutOperandList = oops;
   let InOperandList = iops;
diff --git a/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp b/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
index 37bfb76a494dee..342afae7b05f67 100644
--- a/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
+++ b/llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
@@ -18,6 +18,7 @@
 #include "llvm/ADT/APFloat.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
@@ -47,6 +48,7 @@
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Compiler.h"
+#include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/SMLoc.h"
@@ -79,6 +81,7 @@ extern const ARMInstrTable ARMDescs;
 } // end namespace llvm
 
 namespace {
+class ARMOperand;
 
 enum class ImplicitItModeTy { Always, Never, ARMOnly, ThumbOnly };
 
@@ -446,9 +449,10 @@ class ARMAsmParser : public MCTargetAsmParser {
   }
 
   bool validatetLDMRegList(const MCInst &Inst, const OperandVector &Operands,
-                           unsigned ListNo, bool IsARPop = false);
+                           unsigned MnemonicOpsEndInd, unsigned ListIndex,
+                           bool IsARPop = false);
   bool validatetSTMRegList(const MCInst &Inst, const OperandVector &Operands,
-                           unsigned ListNo);
+                           unsigned MnemonicOpsEndInd, unsigned ListIndex);
 
   int tryParseRegister();
   bool tryParseRegisterWithWriteBack(OperandVector &);
@@ -504,20 +508,28 @@ class ARMAsmParser : public MCTargetAsmParser {
   bool parseDirectiveSEHEpilogEnd(SMLoc L);
   bool parseDirectiveSEHCustom(SMLoc L);
 
+  std::unique_ptr<ARMOperand> defaultCondCodeOp();
+  std::unique_ptr<ARMOperand> defaultCCOutOp();
+
   bool isMnemonicVPTPredicable(StringRef Mnemonic, StringRef ExtraToken);
   StringRef splitMnemonic(StringRef Mnemonic, StringRef ExtraToken,
-                          unsigned &PredicationCode,
-                          unsigned &VPTPredicationCode, bool &CarrySetting,
-                          unsigned &ProcessorIMod, StringRef &ITMask);
+                          ARMCC::CondCodes &PredicationCode,
+                          ARMVCC::VPTCodes &VPTPredicationCode,
+                          bool &CarrySetting, unsigned &ProcessorIMod,
+                          StringRef &ITMask);
   void getMnemonicAcceptInfo(StringRef Mnemonic, StringRef ExtraToken,
                              StringRef FullInst, bool &CanAcceptCarrySet,
                              bool &CanAcceptPredicationCode,
                              bool &CanAcceptVPTPredicationCode);
   bool enableArchExtFeature(StringRef Name, SMLoc &ExtLoc);
 
-  void tryConvertingToTwoOperandForm(StringRef Mnemonic, bool CarrySetting,
-                                     OperandVector &Operands);
-  bool CDEConvertDualRegOperand(StringRef Mnemonic, OperandVector &Operands);
+  void tryConvertingToTwoOperandForm(StringRef Mnemonic,
+                                     ARMCC::CondCodes PredicationCode,
+                                     bool CarrySetting, OperandVector &Operands,
+                                     unsigned MnemonicOpsEndInd);
+
+  bool CDEConvertDualRegOperand(StringRef Mnemonic, OperandVector &Operands,
+                                unsigned MnemonicOpsEndInd);
 
   bool isThumb() const {
     // FIXME: Can tablegen auto-generate this?
@@ -657,15 +669,20 @@ class ARMAsmParser : public MCTargetAsmParser {
   void cvtThumbBranches(MCInst &Inst, const OperandVector &);
   void cvtMVEVMOVQtoDReg(MCInst &Inst, const OperandVector &);
 
-  bool validateInstruction(MCInst &Inst, const OperandVector &Ops);
-  bool processInstruction(MCInst &Inst, const OperandVector &Ops, MCStreamer &Out);
-  bool shouldOmitCCOutOperand(StringRef Mnemonic, OperandVector &Operands);
-  bool shouldOmitPredicateOperand(StringRef Mnemonic, OperandVector &Operands);
-  bool shouldOmitVectorPredicateOperand(StringRef Mnemonic, OperandVector &Operands);
+  bool validateInstruction(MCInst &Inst, const OperandVector &Ops,
+                           unsigned MnemonicOpsEndInd);
+  bool processInstruction(MCInst &Inst, const OperandVector &Ops,
+                          unsigned MnemonicOpsEndInd, MCStreamer &Out);
+  bool shouldOmitVectorPredicateOperand(StringRef Mnemonic,
+                                        OperandVector &Operands,
+                                        unsigned MnemonicOpsEndInd);
   bool isITBlockTerminator(MCInst &Inst) const;
-  void fixupGNULDRDAlias(StringRef Mnemonic, OperandVector &Operands);
-  bool validateLDRDSTRD(MCInst &Inst, const OperandVector &Operands,
-                        bool Load, bool ARMMode, bool Writeback);
+
+  void fixupGNULDRDAlias(StringRef Mnemonic, OperandVector &Operands,
+                         unsigned MnemonicOpsEndInd);
+  bool validateLDRDSTRD(MCInst &Inst, const OperandVector &Operands, bool Load,
+                        bool ARMMode, bool Writeback,
+                        unsigned MnemonicOpsEndInd);
 
 public:
   enum ARMMatchResultTy {
@@ -675,6 +692,7 @@ class ARMAsmParser : public MCTargetAsmParser {
     Match_RequiresThumb2,
     Match_RequiresV8,
     Match_RequiresFlagSetting,
+    Match_RequiresDestinationRegisterMatchASourceRegister,
 #define GET_OPERAND_DIAGNOSTIC_TYPES
 #include "ARMGenAsmMatcher.inc"
 
@@ -714,6 +732,9 @@ class ARMAsmParser : public MCTargetAsmParser {
   unsigned validateTargetOperandClass(MCParsedAsmOperand &Op,
                                       unsigned Kind) override;
   unsigned checkTargetMatchPredicate(MCInst &Inst) override;
+  unsigned
+  checkEarlyTargetMatchPredicate(MCInst &Inst,
+                                 const OperandVector &Operands) override;
 
   bool MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
                                OperandVector &Operands, MCStreamer &Out,
@@ -4052,6 +4073,66 @@ static MCRegister MatchRegisterName(StringRef Name);
 
 /// }
 
+static bool isDataTypeToken(StringRef Tok) {
+  return Tok == ".8" || Tok == ".16" || Tok == ".32" || Tok == ".64" ||
+         Tok == ".i8" || Tok == ".i16" || Tok == ".i32" || Tok == ".i64" ||
+         Tok == ".u8" || Tok == ".u16" || Tok == ".u32" || Tok == ".u64" ||
+         Tok == ".s8" || Tok == ".s16" || Tok == ".s32" || Tok == ".s64" ||
+         Tok == ".p8" || Tok == ".p16" || Tok == ".f32" || Tok == ".f64" ||
+         Tok == ".f" || Tok == ".d";
+}
+
+static unsigned getMnemonicOpsEndInd(const OperandVector &Operands) {
+  unsigned MnemonicOpsEndInd = 1;
+  // Special case for CPS which has a Mnemonic side token for possibly storing ie/id
+  // variant
+  if (Operands[0]->isToken() &&
+      static_cast<ARMOperand &>(*Operands[0]).getToken() == "cps") {
+    if (Operands.size() > 1 && Operands[1]->isImm() &&
+        static_cast<ARMOperand &>(*Operands[1]).getImm()->getKind() ==
+            llvm::MCExpr::Constant &&
+        (dyn_cast<MCConstantExpr>(
+             static_cast<ARMOperand &>(*Operands[1]).getImm())
+                 ->getValue() == ARM_PROC::IE ||
+         dyn_cast<MCConstantExpr>(
+             static_cast<ARMOperand &>(*Operands[1]).getImm())
+                 ->getValue() == ARM_PROC::ID))
+      ++MnemonicOpsEndInd;
+  }
+
+  // In some circumstances the cond code moves to the right
+  bool RHSCondCode = false;
+  while (MnemonicOpsEndInd < Operands.size()) {
+    auto Op = static_cast<ARMOperand &>(*Operands[MnemonicOpsEndInd]);
+    // Special case for it instructions which have a condition code on the RHS
+    if (Op.isITMask()) {
+      RHSCondCode = true;
+      MnemonicOpsEndInd++;
+    }
+    // Special case for it instructions which have a condition code on the RHS
+    else if (Op.isToken() &&
+             (
+                 // There are several special cases not covered by
+                 // isDataTypeToken
+                 Op.getToken() == ".w" || Op.getToken() == ".bf16" ||
+                 Op.getToken() == ".p64" || Op.getToken() == ".f16" ||
+                 isDataTypeToken(Op.getToken()))) {
+      // In the mnemonic operators the cond code must always precede the data type. So we
+      // can now safely assume any subsequent cond code is on the RHS.
+      // As is the case for VCMP and VPT.
+      RHSCondCode = true;
+      MnemonicOpsEndInd++;
+    }
+    // Skip all mnemonic operator types
+    else if (Op.isCCOut() || (Op.isCondCode() && !RHSCondCode) ||
+             Op.isVPTPred() || (Op.isToken() && Op.getToken() == ".w"))
+      MnemonicOpsEndInd++;
+    else
+      break;
+  }
+  return MnemonicOpsEndInd;
+}
+
 bool ARMAsmParser::parseRegister(MCRegister &Reg, SMLoc &StartLoc,
                                  SMLoc &EndLoc) {
   const AsmToken &Tok = getParser().getTok();
@@ -5597,37 +5678,86 @@ ParseStatus ARMAsmParser::parseAM3Offset(OperandVector &Operands) {
   return ParseStatus::Success;
 }
 
+// Finds the index of the first CondCode operator, if there is none returns 0
+unsigned findCondCodeInd(const OperandVector &Operands,
+                         unsigned MnemonicOpsEndInd) {
+  for (unsigned I = 1; I < MnemonicOpsEndInd; ++I) {
+    auto Op = static_cast<ARMOperand &>(*Operands[I]);
+    if (Op.isCondCode())
+      return I;
+  }
+  return 0;
+}
+
+unsigned findCCOutInd(const OperandVector &Operands,
+                      unsigned MnemonicOpsEndInd) {
+  for (unsigned I = 1; I < MnemonicOpsEndInd; ++I) {
+    auto Op = static_cast<ARMOperand &>(*Operands[I]);
+    if (Op.isCCOut())
+      return I;
+  }
+  return 0;
+}
+
 /// Convert parsed operands to MCInst.  Needed here because this instruction
 /// only has two register operands, but multiplication is commutative so
 /// assemblers should accept both "mul rD, rN, rD" and "mul rD, rD, rN".
 void ARMAsmParser::cvtThumbMultiply(MCInst &Inst,
                                     const OperandVector &Operands) {
-  ((ARMOperand &)*Operands[3]).addRegOperands(Inst, 1);
-  ((ARMOperand &)*Operands[1]).addCCOutOperands(Inst, 1);
-  // If we have a three-operand form, make sure to set Rn to be the operand
-  // that isn't the same as Rd.
-  unsigned RegOp = 4;
-  if (Operands.size() == 6 &&
-      ((ARMOperand &)*Operands[4]).getReg() ==
-          ((ARMOperand &)*Operands[3]).getReg())
-    RegOp = 5;
-  ((ARMOperand &)*Operands[RegOp]).addRegOperands(Inst, 1);
-  Inst.addOperand(Inst.getOperand(0));
-  ((ARMOperand &)*Operands[2]).addCondCodeOperands(Inst, 2);
+  unsigned MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
+  unsigned CondI = findCondCodeInd(Operands, MnemonicOpsEndInd);
+  unsigned CondOutI = findCCOutInd(Operands, MnemonicOpsEndInd);
+
+  // 2 operand form
+  unsigned RegRd = MnemonicOpsEndInd;
+  unsigned RegRn = MnemonicOpsEndInd + 1;
+  unsigned RegRm = MnemonicOpsEndInd;
+
+  if (Operands.size() == MnemonicOpsEndInd + 3) {
+    // If we have a three-operand form, make sure to set Rn to be the operand
+    // that isn't the same as Rd.
+    if (((ARMOperand &)*Operands[RegRd]).getReg() ==
+        ((ARMOperand &)*Operands[MnemonicOpsEndInd + 1]).getReg()) {
+      RegRn = MnemonicOpsEndInd + 2;
+      RegRm = MnemonicOpsEndInd + 1;
+    } else {
+      RegRn = MnemonicOpsEndInd + 1;
+      RegRm = MnemonicOpsEndInd + 2;
+    }
+  }
+
+  // Rd
+  ((ARMOperand &)*Operands[RegRd]).addRegOperands(Inst, 1);
+  // CCOut
+  if (CondOutI != 0) {
+    ((ARMOperand &)*Operands[CondOutI]).addCCOutOperands(Inst, 1);
+  } else {
+    ARMOperand Op = *ARMOperand::CreateCCOut(0, Operands[0]->getEndLoc());
+    Op.addCCOutOperands(Inst, 1);
+  }
+  // Rn
+  ((ARMOperand &)*Operands[RegRn]).addRegOperands(Inst, 1);
+  // Rm
+  ((ARMOperand &)*Operands[RegRm]).addRegOperands(Inst, 1);
+
+  // Cond code
+  if (CondI != 0) {
+    ((ARMOperand &)*Operands[CondI]).addCondCodeOperands(Inst, 2);
+  } else {
+    ARMOperand Op =
+        *ARMOperand::CreateCondCode(llvm::ARMCC::AL, Operands[0]->getEndLoc());
+    Op.addCondCodeOperands(Inst, 2);
+  }
 }
 
 void ARMAsmParser::cvtThumbBranches(MCInst &Inst,
                                     const OperandVector &Operands) {
-  int CondOp = -1, ImmOp = -1;
-  switch(Inst.getOpcode()) {
-    case ARM::tB:
-    case ARM::tBcc:  CondOp = 1; ImmOp = 2; break;
+  unsigned MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
+  unsigned CondI = findCondCodeInd(Operands, MnemonicOpsEndInd);
+  unsigned Cond =
+      (CondI == 0 ? ARMCC::AL
+                  : static_cast<ARMOperand &>(*Operands[CondI]).getCondCode());
 
-    case ARM::t2B:
-    case ARM::t2Bcc: CondOp = 1; ImmOp = 3; break;
-
-    default: llvm_unreachable("Unexpected instruction in cvtThumbBranches");
-  }
   // first decide whether or not the branch should be conditional
   // by looking at it's location relative to an IT block
   if(inITBlock()) {
@@ -5638,9 +5768,6 @@ void ARMAsmParser::cvtThumbBranches(MCInst &Inst,
       case ARM::t2Bcc: Inst.setOpcode(ARM::t2B); break;
     }
   } else {
-    // outside IT blocks we can only have unconditional branches with AL
-    // condition code or conditional branches with non-AL condition code
-    unsigned Cond = static_cast<ARMOperand &>(*Operands[CondOp]).getCondCode();
     switch(Inst.getOpcode()) {
       case ARM::tB:
       case ARM::tBcc:
@@ -5657,36 +5784,55 @@ void ARMAsmParser::cvtThumbBranches(MCInst &Inst,
   switch(Inst.getOpcode()) {
     // classify tB as either t2B or t1B based on range of immediate operand
     case ARM::tB: {
-      ARMOperand &op = static_cast<ARMOperand &>(*Operands[ImmOp]);
+      ARMOperand &op = static_cast<ARMOperand &>(*Operands[MnemonicOpsEndInd]);
       if (!op.isSignedOffset<11, 1>() && isThumb() && hasV8MBaseline())
         Inst.setOpcode(ARM::t2B);
       break;
     }
     // classify tBcc as either t2Bcc or t1Bcc based on range of immediate operand
     case ARM::tBcc: {
-      ARMOperand &op = static_cast<ARMOperand &>(*Operands[ImmOp]);
+      ARMOperand &op = static_cast<ARMOperand &>(*Operands[MnemonicOpsEndInd]);
       if (!op.isSignedOffset<8, 1>() && isThumb() && hasV8MBaseline())
         Inst.setOpcode(ARM::t2Bcc);
       break;
     }
   }
-  ((ARMOperand &)*Operands[ImmOp]).addImmOperands(Inst, 1);
-  ((ARMOperand &)*Operands[CondOp]).addCondCodeOperands(Inst, 2);
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd]).addImmOperands(Inst, 1);
+  if (CondI != 0) {
+    ((ARMOperand &)*Operands[CondI]).addCondCodeOperands(Inst, 2);
+  } else {
+    ARMOperand Op =
+        *ARMOperand::CreateCondCode(llvm::ARMCC::AL, Operands[0]->getEndLoc());
+    Op.addCondCodeOperands(Inst, 2);
+  }
 }
 
 void ARMAsmParser::cvtMVEVMOVQtoDReg(
   MCInst &Inst, const OperandVector &Operands) {
 
-  // mnemonic, condition code, Rt, Rt2, Qd, idx, Qd again, idx2
-  assert(Operands.size() == 8);
+  unsigned MnemonicOpsEndInd = getMnemonicOpsEndInd(Operands);
+  unsigned CondI = findCondCodeInd(Operands, MnemonicOpsEndInd);
 
-  ((ARMOperand &)*Operands[2]).addRegOperands(Inst, 1); // Rt
-  ((ARMOperand &)*Operands[3]).addRegOperands(Inst, 1); // Rt2
-  ((ARMOperand &)*Operands[4]).addRegOperands(Inst, 1); // Qd
-  ((ARMOperand &)*Operands[5]).addMVEPairVectorIndexOperands(Inst, 1); // idx
+  // mnemonic, condition code, Rt, Rt2, Qd, idx, Qd again, idx2
+  assert(Operands.size() == MnemonicOpsEndInd + 6);
+
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd]).addRegOperands(Inst, 1); // Rt
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd + 1])
+      .addRegOperands(Inst, 1); // Rt2
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd + 2]).addRegOperands(Inst, 1); // Qd
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd + 3])
+      .addMVEPairVectorIndexOperands(Inst, 1); // idx
   // skip second copy of Qd in Operands[6]
-  ((ARMOperand &)*Operands[7]).addMVEPairVectorIndexOperands(Inst, 1); // idx2
-  ((ARMOperand &)*Operands[1]).addCondCodeOperands(Inst, 2); // condition code
+  ((ARMOperand &)*Operands[MnemonicOpsEndInd + 5])
+      .addMVEPairVectorIndexOperands(Inst, 1); // idx2
+  if (CondI != 0) {
+    ((ARMOperand &)*Operands[CondI])
+        .addCondCodeOperands(Inst, 2); // condition code
+  } else {
+    ARMOperand Op =
+        *ARMOperand::CreateCondCode(ARMCC::AL, Operands[0]->getEndLoc());
+    Op.addCondCodeOperands(Inst, 2);
+  }
 }
 
 /// Parse an ARM memory expression, return false if...
[truncated]

@llvmbot
Member

llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-backend-arm

Author: Alfie Richards (AlfieRichardsArm)

@AlfieRichardsArm AlfieRichardsArm force-pushed the optional-mnemonic-operands-arm branch from 570ab5f to d2ad78c on February 29, 2024 15:35
@AlfieRichardsArm
Contributor Author

In case there is any interest from @llvm/issue-subscribers-backend-mips: I noticed that sorting by smallest encoding affects MIPS parsing (hence making this an opt-in option rather than the default, so as not to break tests). I don't know what the correct behavior is for MIPS.

@AlfieRichardsArm
Contributor Author

Also note #83413 is a prerequisite for this change.

@AlfieRichardsArm
Contributor Author

This change is intended to have minor or no user-facing changes. It allows for the removal of some hacks and makes the parsing of ARM more idiomatic with respect to how TableGen works.

Additionally, it is part of a larger piece of work to address the numerous flaws in how we handle ".w" and ".n" (illustrated in the snippet below).
For more context see:
#14175
https://gist.github.com/rprichard/2a601c3dd1b281f953b4e08b5a9361bb
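
For reference, the width specifiers in question look like this in unified assembly (illustrative snippet, not taken from the patch):

        add.n   r0, r0, r1      @ '.n' requests the narrow (16-bit) Thumb encoding
        add.w   r0, r0, r1      @ '.w' requests the wide (32-bit) Thumb encoding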

@AlfieRichardsArm AlfieRichardsArm changed the title from "Change the ARM mnemonic operands to be optional for ASM parsing" to "[ARM][TableGen][MC] Change the ARM mnemonic operands to be optional for ASM parsing" on Feb 29, 2024
@AlfieRichardsArm
Contributor Author

The MCA test failures seem genuine: MCA makes assumptions about optional operands that this change invalidates. I'm investigating now.

@AlfieRichardsArm AlfieRichardsArm force-pushed the optional-mnemonic-operands-arm branch from a1cd51c to 83213d7 on March 1, 2024 15:48

github-actions bot commented Mar 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AlfieRichardsArm AlfieRichardsArm force-pushed the optional-mnemonic-operands-arm branch 4 times, most recently from a46b797 to da3d160 on March 1, 2024 16:21
@AlfieRichardsArm
Contributor Author

AlfieRichardsArm commented Mar 1, 2024

Also, for quite a nice example of improved behavior from this change, see llvm/test/MC/ARM/tMOVSr.s.

In this case, the old behavior was that the de-facto wide/narrow decision meant the instruction was parsed as the wide encoding and then converted to the narrow one by a hacky bit of code that this test checked for.

This patch changes the test to check that the narrow encoding is used because the TableGen matcher selected it directly and the hack wasn't needed (I intend to go through these hacks and remove all of those that become unused once this patch is accepted).

Also see the numerous cases of improved (though more verbose) diagnostics.
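
A rough sketch of the kind of check involved (hypothetical and simplified; the real test is llvm/test/MC/ARM/tMOVSr.s, and the exact RUN/CHECK lines here are assumptions):

        @ RUN: llvm-mc -triple=thumbv7 -show-encoding %s | FileCheck %s
                movs    r0, r1
        @ CHECK: movs r0, r1    @ the narrow 16-bit tMOVSr encoding should now be selected directly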

@AlfieRichardsArm
Contributor Author

I have merged the prerequisites. As long as the tests pass and @ostannard doesn't have any changes to request, I will merge this today. Thank you @s-barannikov, @kosarev, and @statham-arm for your help with this.

Collaborator

@ostannard ostannard left a comment

This LGTM too.

@mstorsjo
Member

This change broke building FFmpeg for ARM targets. Assembling https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/arm/vp8_armv6.S#L73-L75 produces the following error:

src/libavcodec/arm/vp8_armv6.S:74:27: error: asr operand expected.
        pkhtbne r11, r11, r11, asr #16
                          ^
src/libavcodec/arm/vp8_armv6.S:75:9: error: incorrect condition in IT block; got 'al', but expected 'ne'
        ldr r8, [r0, #16] @ code_word
        ^

It can be reproduced with the following standalone test snippet:

        .syntax unified
        @.thumb

        it ne
        pkhtbne r11, r11, r11, asr #16
        nop

Built like this:

$ clang -target armv7-linux-gnueabihf -c repro.s
repro.s:5:27: error: asr operand expected.
        pkhtbne r11, r11, r11, asr #16
                          ^
repro.s:6:9: error: incorrect condition in IT block; got 'al', but expected 'ne'
        nop
        ^

Please consider reverting this if you can't come up with a fix during the day. (Or I can push a revert myself.)

Also, as a side note: from the merged git commit I see that you have your email address hidden on GitHub. Apparently the bot didn't comment about that for some reason. So, per https://github.com/llvm/llvm-project/blob/main/.github/workflows/email-check.yaml, please turn off the "Keep my email addresses private" setting in your account, and see LLVM Discourse for more information.

@AlfieRichardsArm
Contributor Author

Hi @mstorsjo,
I am looking into this now. I have also updated my email settings.

@AlfieRichardsArm
Contributor Author

@mstorsjo I have a fix I will add you to shortly. Thank you for finding this.
Do you have the compile command for the whole file? I would like to check that the file now compiles.

AlfieRichardsArm added a commit to AlfieRichardsArm/llvm-project that referenced this pull request Mar 19, 2024
This was broken by llvm#83436: with optional operands, the parsePKHImm
parser is sometimes applied to operands in other positions, which
previously produced an error.

Now this instead fails the parse. Unfortunately this means the default
parsing happens, which combines the shift and the base operator and
causes the diagnostics to be nonsense.

To fix this I had to add a special-case diagnostic, which is
unfortunate. Ideally these two methods for handling operands should be
unified, and the TableGen matching mechanisms should be used for
validating parsed shifts rather than special cases in parsing.
@mstorsjo
Member

> @mstorsjo I have a fix I will add you to shortly. Thank you for finding this. Do you have the compile command for the whole file? I would like to check that the file now compiles.

I tested: the whole file, and the rest of the project, builds fine now - thanks! (It usually requires running configure for the project to generate a suitable config.h for the target, so providing a single command for the full file isn't easy.)

@AlfieRichardsArm
Contributor Author

Great. Thanks for finding this!

mstorsjo pushed a commit that referenced this pull request Mar 19, 2024
This was broken by #83436: with optional operands, when the CC operand
is provided the `parsePKHImm` parser is applied to register operands,
which previously erroneously produced an error.
@dyung
Collaborator

dyung commented Mar 20, 2024

Hi @AlfieRichardsArm, we are seeing a case with inline asm that no longer compiles after your change. Consider the following code:

void a() {
  __asm("strexd r0, r2, [r3]");
  __asm("ldrexd r0, [r2]");
}

Prior to your change, this compiled successfully:

$ ~/src/upstream/3e6db602918435b6a5ac476f63f8b259e7e73af4-linux/bin/clang --target=armv7a-unknown-eabi -c min2.c

But after your change, the compiler is now emitting errors for the same code:

$ ~/src/upstream/295cdd5c3dbd14406bf9cce01e3dfd787fb1ddda-linux/bin/clang --target=armv7a-unknown-eabi -c min2.c
min2.c:2:9: error: invalid operand for instruction
    2 |   __asm("strexd r0, r2, [r3]");
      |         ^
<inline asm>:1:13: note: instantiated into assembly here
    1 |         strexd r0, r2, [r3]
      |                    ^
min2.c:3:9: error: invalid operand for instruction
    3 |   __asm("ldrexd r0, [r2]");
      |         ^
<inline asm>:1:9: note: instantiated into assembly here
    1 |         ldrexd r0, [r2]
      |                ^
2 errors generated.

Is this expected?

@AlfieRichardsArm
Contributor Author

Hi @dyung,
This is likely a regression that there weren't tests for. I will look into this urgently tomorrow.
Alfie

@AlfieRichardsArm
Contributor Author

Hi @dyung,
Looking at the architecture reference manual (https://developer.arm.com/documentation/ddi0597/2023-09/?lang=en) I don't recognise these encodings.

Perhaps we previously parsed these but shouldn't have? (For me it causes a crash on old builds.)

In this case they would both need an extra operand, e.g.:

        strexd  r0, r2, r3, [r3]                @ encoding: [0x90,0x0f,0xa3,0xe1]
        ldrexd  r0, r1, [r2]                    @ encoding: [0x9f,0x0f,0xb2,0xe1]

(I'm guessing at the missing registers)

Is the source for the project which these came from public?

@dyung
Collaborator

dyung commented Mar 21, 2024

> Hi @dyung, Looking at the architecture reference manual (https://developer.arm.com/documentation/ddi0597/2023-09/?lang=en) I don't recognise these encodings.
>
> Perhaps we previously parsed these but shouldn't have? (For me it causes a crash on old builds.)
>
> In this case they would both need an extra operand, e.g.:
>
>         strexd  r0, r2, r3, [r3]                @ encoding: [0x90,0x0f,0xa3,0xe1]
>         ldrexd  r0, r1, [r2]                    @ encoding: [0x9f,0x0f,0xb2,0xe1]
>
> (I'm guessing at the missing registers)
>
> Is the source for the project which these came from public?

The code is unfortunately not from a public codebase.

I am told that it is essentially a short form of the longer form you posted. If you try earlier versions of clang, compile directly to an object file, and disassemble it, I believe it generates the assembly you mentioned. (I also noticed that earlier versions of clang hit an assertion failure when generating the assembly.)

@AlfieRichardsArm
Contributor Author

Ah okay, very strange. I can't find any references to this in specifications or docs. Do you have any documentation for it?

I would be happy to add some aliases for this alternate form if it's correct behavior, but I can't seem to find documentation for it or references to it before this code.

@smithp35
Collaborator

It does look like GCC will accept these alternative forms. I expect that this will have been an unofficial/undocumented bit of GCC behaviour that clang has copied.

@AlfieRichardsArm
Contributor Author

AlfieRichardsArm commented Mar 21, 2024

> I expect that this will have been an unofficial/undocumented bit of GCC behaviour that clang has copied.

@smithp35 Ah okay, in that case shall I re-add this behavior and add some tests?
I dislike the way strexd and ldrexd are handled, so I could rewrite this to handle this case and give better diagnostics.

As mentioned above, this seems to have been broken previously, but didn't have a test.

@smithp35
Collaborator

I think adding that behaviour back makes sense, as existing implementations make a de facto standard. Ideally we can trace this back, and when we add a test we can say it is for compatibility with GCC.

@AlfieRichardsArm
Contributor Author

Okay, I will create a ticket for this and pick it up.

By "trace this back" do you mean find a reference to this in GCC? Or find a source for the previous behavior in LLVM?

@smithp35
Collaborator

Yes, all I've done is plug the example into GCC. If we're lucky there will be a comment in the GCC source code or a commit message, or one in clang saying that it copied GCC.

There will be a limit to what we can do; sometimes there won't be anything written down that can be easily found.

chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
This was broken by llvm#83436: with optional operands, when the CC
operand is provided the `parsePKHImm` parser is applied to register
operands, which previously erroneously produced an error.
@AlfieRichardsArm
Contributor Author

@dyung Can you please try #86507 and see if it fixes your issue?

AlfieRichardsArm added a commit that referenced this pull request Mar 26, 2024
…ctions (#86507)

These aliases were supported previously; there was a regression at some point.

This adds back the alternate forms and tidies up this section of code a little.

See #83436 (comment) for the initial report regarding this change.
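
For reference, the short forms in question, next to the documented forms they correspond to (the implied second register follows the guess earlier in this thread, so treat that pairing as an assumption):

        strexd  r0, r2, [r3]            @ short form accepted for GCC compatibility
        strexd  r0, r2, r3, [r3]        @ architecturally documented form
        ldrexd  r0, [r2]                @ short form accepted for GCC compatibility
        ldrexd  r0, r1, [r2]            @ architecturally documented form
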
Labels: backend:ARM, mc (Machine (object) code)