Commit 1b3bff7

middle-end: rework vectorizable_store to iterate over single index [PR117557]

The testcase

    #include <stdint.h>
    #include <string.h>

    #define N 8
    #define L 8

    void f(const uint8_t * restrict seq1,
           const uint8_t *idx, uint8_t *seq_out) {
      for (int i = 0; i < L; ++i) {
        uint8_t h = idx[i];
        memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
      }
    }

compiled at -O3 -mcpu=neoverse-n1+sve miscompiles to:

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    st1w    z31.s, p7, [x24, z12.s, sxtw]

rather than

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    addvl   x3, x24, #2
    st1w    z31.s, p3, [x3, z12.s, sxtw]

Two things go wrong here: the wrong mask is used, and the address pointers
for the stores are wrong.

The issue is that the codegen loop in vectorizable_store is a nested loop:
the outer loop iterates over ncopies and the inner loop iterates over
vec_num. For SLP, ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop
mask is determined only by the outer loop index, and the pointer address is
only updated in the outer loop. As a result, for SLP we always use the same
predicate and the same memory location.

This patch flattens the two loops into a single loop that iterates over
ncopies * vec_num, and simplifies the indexing.

This does not fully fix the gcc_r miscompile in SPEC CPU 2017, as the error
moves somewhere else; I will look at that next. It does fix some other
libraries that had also started failing.

gcc/ChangeLog:

	PR tree-optimization/117557
	* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
	vec_num loops.

gcc/testsuite/ChangeLog:

	PR tree-optimization/117557
	* gcc.target/aarch64/pr117557.c: New test.

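
The loop-structure problem described in the commit message can be illustrated with a small stand-alone model. This is only a sketch, not the code in tree-vect-stmts.cc: `struct store_info`, `emit_nested` and `emit_flat` are hypothetical names invented for illustration, and the "mask index" and "address offset" are abstract slot numbers rather than real predicates or pointers. It assumes, as the message states, that the old code derived both the mask and the address from the outer (ncopies) index only.

```c
#include <stdio.h>

/* Hypothetical record of what one generated vector store would use.
   Both fields are abstract slot numbers, not real GCC data.  */
struct store_info {
  int mask_index;   /* which loop mask the store is predicated on */
  int addr_offset;  /* how many vectors past the base pointer it writes */
};

/* Model of the old nested structure: the mask and the address are derived
   from the outer index j only, so every store emitted by the inner loop
   shares the same mask and the same address.  Returns the store count.  */
static int emit_nested(int ncopies, int vec_num, struct store_info *out)
{
  int n = 0;
  for (int j = 0; j < ncopies; j++) {
    for (int i = 0; i < vec_num; i++) {
      out[n].mask_index = j;   /* ignores i */
      out[n].addr_offset = j;  /* only bumped once per outer iteration */
      n++;
    }
  }
  return n;
}

/* Model of the flattened structure: a single index over ncopies * vec_num
   selects both the mask and the address, so each store is distinct.  */
static int emit_flat(int ncopies, int vec_num, struct store_info *out)
{
  int n = 0;
  for (int i = 0; i < ncopies * vec_num; i++) {
    out[n].mask_index = i;
    out[n].addr_offset = i;
    n++;
  }
  return n;
}

/* Print the stores produced by one of the models above.  */
static void dump(const char *name, const struct store_info *s, int n)
{
  for (int i = 0; i < n; i++)
    printf("%s store %d: mask %d, offset %d\n",
           name, i, s[i].mask_index, s[i].addr_offset);
}
```

Calling `emit_nested(1, 2, buf)` (the SLP-like case, ncopies == 1 with two vector statements) yields two stores with identical mask and offset, which corresponds to the duplicated `p7` and `x24` in the miscompiled output, whereas `emit_flat(1, 2, buf)` yields two distinct stores.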
1 parent aa9f12e commit 1b3bff7

2 files changed: +281 -252 lines changed
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mcpu=neoverse-n1+sve -fdump-tree-vect" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+#include <string.h>
+
+#define N 8
+#define L 8
+
+/*
+**f:
+**	...
+**	ld1w	z[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
+**	ld1w	z[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
+**	st1w	z[0-9]+.s, p\1, \[x[0-9]+, z[0-9]+.s, sxtw\]
+**	incb	x([0-9]+), all, mul #2
+**	st1w	z[0-9]+.s, p\2, \[x\3, z[0-9]+.s, sxtw\]
+**	ret
+**	...
+*/
+void f(const uint8_t * restrict seq1,
+       const uint8_t *idx, uint8_t *seq_out) {
+  for (int i = 0; i < L; ++i) {
+    uint8_t h = idx[i];
+    memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
+  }
+}
+