Commit e2cb6a3

[AArch64] Tweak truncate costs for some scalable vector types

* We were previously returning an invalid cost when truncating anything to <vscale x 2 x i1>, which is incorrect since we can generate perfectly good code for this.
* The costs for truncating legal or unpacked types to predicates seemed overly optimistic. For example, when truncating <vscale x 8 x i16> to <vscale x 8 x i1> we typically do something like

  and     z0.h, z0.h, #0x1
  cmpne   p0.h, p0/z, z0.h, #0

  I guess it might depend upon whether the input value is generated in the same block or not and if we can avoid the inreg zero-extend. However, it feels safe to take the more conservative cost here.
* The costs for some truncates such as

  trunc <vscale x 2 x i32> %a to <vscale x 2 x i16>

  were 1, whereas in actual fact they are free and no instructions are required.

  Also, for this

  trunc <vscale x 8 x i32> %a to <vscale x 8 x i16>

  it's just a single uzp1 instruction so I reduced the cost to 1.

In general, I've added costs for all cases where the destination type is legal or unpacked. One unfortunate side effect of this is the costs for some fixed-width truncates when using SVE now look too optimistic.

1 parent 8408492 commit e2cb6a3
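
The instruction counts quoted in the commit message suggest a simple way to sanity-check the new table entries. The sketch below is an observation about the numbers, not something the commit states. It assumes that each source value is split into register-sized parts (one per 128 bits of minimum size), that merging N parts takes N - 1 uzp1 instructions, and that a truncate to a predicate pays an and plus a cmpne on every part before the merge. The helper names are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): number of SVE register parts that a
// <vscale x MinElts x iSrcBits> value occupies. Unpacked types use one part.
unsigned numSourceParts(unsigned MinElts, unsigned SrcBits) {
  unsigned Bits = MinElts * SrcBits;
  return Bits <= 128 ? 1 : Bits / 128;
}

// Assumed model for integer destinations: N parts are merged pairwise with
// uzp1, which takes N - 1 instructions; a single part needs none.
unsigned truncToIntCost(unsigned MinElts, unsigned SrcBits) {
  return numSourceParts(MinElts, SrcBits) - 1;
}

// Assumed model for predicate destinations: and + cmpne on every part,
// followed by N - 1 uzp1 instructions to merge the partial predicates.
unsigned truncToPredCost(unsigned MinElts, unsigned SrcBits) {
  unsigned N = numSourceParts(MinElts, SrcBits);
  return 2 * N + (N - 1);
}

// Compare the model against a few entries from the new cost table.
void checkAgainstTable() {
  assert(truncToIntCost(2, 32) == 0);   // nxv2i16 <- nxv2i32
  assert(truncToIntCost(8, 32) == 1);   // nxv8i16 <- nxv8i32
  assert(truncToIntCost(16, 64) == 7);  // nxv16i8 <- nxv16i64
  assert(truncToPredCost(8, 16) == 2);  // nxv8i1  <- nxv8i16
  assert(truncToPredCost(4, 64) == 5);  // nxv4i1  <- nxv4i64
  assert(truncToPredCost(8, 64) == 11); // nxv8i1  <- nxv8i64
}
```

Calling checkAgainstTable() passes silently if the model agrees with those six entries. The model is only a reading aid: the real costs are whatever the table in AArch64TargetTransformInfo.cpp says, and actual code generation may differ from these assumptions.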

4 files changed, with 84 additions and 67 deletions.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Lines changed: 33 additions & 16 deletions
@@ -2766,22 +2766,39 @@ InstructionCost AArch64TTIImpl::getCastInstrCost(unsigned Opcode, Type *Dst,
   { ISD::TRUNCATE, MVT::v16i32, MVT::v16i64, 4}, // 4 x uzp1
 
   // Truncations on nxvmiN
-  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 1 },
-  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 1 },
-  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 1 },
-  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 1 },
-  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 1 },
-  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 2 },
-  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 1 },
-  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 3 },
-  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 5 },
-  { ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 1 },
-  { ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 1 },
-  { ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 1 },
-  { ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 1 },
-  { ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 2 },
-  { ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 3 },
-  { ISD::TRUNCATE, MVT::nxv8i32, MVT::nxv8i64, 6 },
+  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i8, 2 },
+  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i16, 2 },
+  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i32, 2 },
+  { ISD::TRUNCATE, MVT::nxv2i1, MVT::nxv2i64, 2 },
+  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i8, 2 },
+  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i16, 2 },
+  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i32, 2 },
+  { ISD::TRUNCATE, MVT::nxv4i1, MVT::nxv4i64, 5 },
+  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i8, 2 },
+  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i16, 2 },
+  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i32, 5 },
+  { ISD::TRUNCATE, MVT::nxv8i1, MVT::nxv8i64, 11 },
+  { ISD::TRUNCATE, MVT::nxv16i1, MVT::nxv16i8, 2 },
+  { ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i16, 0 },
+  { ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i32, 0 },
+  { ISD::TRUNCATE, MVT::nxv2i8, MVT::nxv2i64, 0 },
+  { ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i32, 0 },
+  { ISD::TRUNCATE, MVT::nxv2i16, MVT::nxv2i64, 0 },
+  { ISD::TRUNCATE, MVT::nxv2i32, MVT::nxv2i64, 0 },
+  { ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i16, 0 },
+  { ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i32, 0 },
+  { ISD::TRUNCATE, MVT::nxv4i8, MVT::nxv4i64, 1 },
+  { ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i32, 0 },
+  { ISD::TRUNCATE, MVT::nxv4i16, MVT::nxv4i64, 1 },
+  { ISD::TRUNCATE, MVT::nxv4i32, MVT::nxv4i64, 1 },
+  { ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i16, 0 },
+  { ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i32, 1 },
+  { ISD::TRUNCATE, MVT::nxv8i8, MVT::nxv8i64, 3 },
+  { ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i32, 1 },
+  { ISD::TRUNCATE, MVT::nxv8i16, MVT::nxv8i64, 3 },
+  { ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i16, 1 },
+  { ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i32, 3 },
+  { ISD::TRUNCATE, MVT::nxv16i8, MVT::nxv16i64, 7 },
 
   // The number of shll instructions for the extension.
   { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 3 },
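
Each row of the table above pairs a destination type and a source type with an instruction-count cost. As a rough, self-contained illustration of how such a table can be consulted, the sketch below copies a handful of the new rows into a plain array and searches it linearly. The `VT` enum, `TruncEntry` struct, and `lookupTruncCost` function are hypothetical stand-ins introduced for this example; LLVM uses its own `MVT` types and cost-table helpers, which are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the handful of MVT values used below.
enum class VT { nxv2i1, nxv8i1, nxv2i16, nxv2i32, nxv8i16, nxv8i32, nxv8i64 };

// One table row: truncating Src to Dst is modelled as Cost instructions.
struct TruncEntry {
  VT Dst;
  VT Src;
  unsigned Cost;
};

// A few rows copied from the new table in this commit.
static const TruncEntry TruncTable[] = {
    {VT::nxv2i1, VT::nxv2i32, 2},  // and + cmpne
    {VT::nxv8i1, VT::nxv8i16, 2},  // and + cmpne
    {VT::nxv2i16, VT::nxv2i32, 0}, // free, no instructions required
    {VT::nxv8i16, VT::nxv8i32, 1}, // a single uzp1
    {VT::nxv8i16, VT::nxv8i64, 3},
};

// Linear search for a (Dst, Src) pair. Returns nullptr when the pair has no
// entry, so the caller can fall back to some default cost.
const TruncEntry *lookupTruncCost(VT Dst, VT Src) {
  for (std::size_t I = 0; I < sizeof(TruncTable) / sizeof(TruncTable[0]); ++I)
    if (TruncTable[I].Dst == Dst && TruncTable[I].Src == Src)
      return &TruncTable[I];
  return nullptr;
}
```

A caller would test the returned pointer before reading `Cost`, for example `if (const TruncEntry *E = lookupTruncCost(VT::nxv8i16, VT::nxv8i32)) use(E->Cost);`, where `use` is again a placeholder.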

llvm/test/Analysis/CostModel/AArch64/cast.ll

Lines changed: 18 additions & 18 deletions
@@ -629,27 +629,27 @@ define void @trunc() {
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i16 = trunc <2 x i16> undef to <2 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i32 = trunc <2 x i32> undef to <2 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i8i64 = trunc <2 x i64> undef to <2 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s2i16i32 = trunc <2 x i32> undef to <2 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i16i32 = trunc <2 x i32> undef to <2 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i16i64 = trunc <2 x i64> undef to <2 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s2i32i64 = trunc <2 x i64> undef to <2 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s2i32i64 = trunc <2 x i64> undef to <2 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i16 = trunc <4 x i16> undef to <4 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i32 = trunc <4 x i32> undef to <4 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; SVE128-NO-NEON-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; FIXED-MIN-256-LABEL: 'trunc'
@@ -674,19 +674,19 @@ define void @trunc() {
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; FIXED-MIN-2048-LABEL: 'trunc'
@@ -711,19 +711,19 @@ define void @trunc() {
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i8i64 = trunc <4 x i64> undef to <4 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i16i32 = trunc <4 x i32> undef to <4 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i16i64 = trunc <4 x i64> undef to <4 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s4i32i64 = trunc <4 x i64> undef to <4 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i8i16 = trunc <8 x i16> undef to <8 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i32 = trunc <8 x i32> undef to <8 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i8i64 = trunc <8 x i64> undef to <8 x i8>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i32 = trunc <8 x i32> undef to <8 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i16i64 = trunc <8 x i64> undef to <8 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s8i32i64 = trunc <8 x i64> undef to <8 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i16 = trunc <16 x i16> undef to <16 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i32 = trunc <16 x i32> undef to <16 x i8>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i8i64 = trunc <16 x i64> undef to <16 x i8>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i32 = trunc <16 x i32> undef to <16 x i16>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i16i64 = trunc <16 x i64> undef to <16 x i16>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %s16i32i64 = trunc <16 x i64> undef to <16 x i32>
 ; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; FIXED-MAX-LABEL: 'trunc'
