Closed
Description
test: https://gcc.godbolt.org/z/f86hxd8cT
#define N 480
unsigned int
f (unsigned int res, signed char *restrict a,
   unsigned char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      signed short mult = av * bv;
      res += mult;
    }
  return res;
}
According to the GCC 12 release notes, Armv8.6-A introduced a new dot-product instruction, usdot, for the case where the operands differ in signedness. The instruction is available behind the +i8mm compiler flag.
Starting with GCC 12, the auto-vectorizer can recognize this pattern and emit usdot automatically, while LLVM currently does not.
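For reference, here is a rough portable sketch of why the loop above is a usdot candidate. It assumes the usual description of the 128-bit form: each of the four 32-bit lanes accumulates the sum of four unsigned-byte by signed-byte products. The names `usdot_model`, `scalar_sum` and `usdot_sum` are made up for illustration and are not part of GCC, LLVM or ACLE. Note that a signed char times an unsigned char lies in [-32640, 32385], so the `signed short` truncation in the test never changes the value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one 128-bit usdot step (not a real API):
   each 32-bit lane accumulates four unsigned-by-signed byte products.
   Unsigned accumulators are used so that wraparound is well defined. */
static void usdot_model(uint32_t acc[4], const uint8_t u[16], const int8_t s[16])
{
    for (int lane = 0; lane < 4; ++lane) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j)
            sum += (int32_t)u[4 * lane + j] * (int32_t)s[4 * lane + j];
        acc[lane] += (uint32_t)sum;
    }
}

/* The loop from the report, parameterized on n instead of the macro N. */
static uint32_t scalar_sum(uint32_t res, const int8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* The product always fits in 16 bits, so this cast is lossless. */
        int16_t mult = (int16_t)((int)a[i] * (int)b[i]);
        res += (uint32_t)mult;
    }
    return res;
}

/* The same reduction done in usdot-sized steps.
   Assumption: n is a multiple of 16 (no tail handling in this sketch). */
static uint32_t usdot_sum(uint32_t res, const int8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < n; i += 16)
        usdot_model(acc, b + i, a + i); /* unsigned operand first, signed second */
    for (int lane = 0; lane < 4; ++lane)
        res += acc[lane];
    return res;
}
```

Both functions should return the same value for any input whose length is a multiple of 16, which is the equivalence the vectorizer relies on when it replaces the widening multiply-accumulate with usdot.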