[DAG] Failure to fold select(x, sub(x, c), m) -> sub(x, and(c,m)) #66101
Comments
@llvm/issue-subscribers-backend-x86
https://godbolt.org/z/a1PczEM8a
If we're selecting a subtraction with a non-constant, we fold the select into an AND:

#include <x86intrin.h>
auto masked_select(__m128i a, __m128i b, __m128i x, __m128i y) {
  return _mm_blendv_epi8(a, _mm_sub_epi32(a, b), _mm_cmpgt_epi32(x,y));
}

masked_select(long long __vector(2), long long __vector(2), long long __vector(2), long long __vector(2)): # @masked_select(long long __vector(2), long long __vector(2), long long __vector(2), long long __vector(2))
  pcmpgtd %xmm3, %xmm2
  pand %xmm1, %xmm2
  psubd %xmm2, %xmm0
  retq

But for constants this fails, which on x86 can result in a BLENDV instruction, which is never faster than an AND:

#include <x86intrin.h>
auto masked_select_const(__m128i a, __m128i x, __m128i y) {
  __m128i b = _mm_set1_epi32(24);
  return _mm_blendv_epi8(a, _mm_sub_epi32(a, b), _mm_cmpgt_epi32(x,y));
}

masked_select_const(long long __vector(2), long long __vector(2), long long __vector(2)): # @masked_select_const(long long __vector(2), long long __vector(2), long long __vector(2))
  movdqa %xmm0, %xmm3
  movdqa .LCPI3_0(%rip), %xmm4 # xmm4 = [4294967272,4294967272,4294967272,4294967272]
  paddd %xmm0, %xmm4
  pcmpgtd %xmm2, %xmm1
  movdqa %xmm1, %xmm0
  blendvps %xmm0, %xmm4, %xmm3
  movaps %xmm3, %xmm0
  retq |
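For readers following along, the fold in the issue title can be checked on a single 32-bit lane. The sketch below is a scalar model only, not LLVM code. It assumes each mask lane is either all-ones or all-zeros, as a pcmpgtd result is, and the helper names (lane_mask, select_sub, sub_and, fold_holds) are made up for illustration.

```cpp
#include <cstdint>

// Scalar model of one 32-bit lane. The mask is all-ones or all-zeros,
// which is what a vector compare such as pcmpgtd produces per lane.
constexpr uint32_t lane_mask(bool cond) { return cond ? 0xFFFFFFFFu : 0u; }

// Select form: take (x - c) where the mask is set, otherwise keep x.
constexpr uint32_t select_sub(uint32_t x, uint32_t c, uint32_t m) {
  return m ? x - c : x;
}

// Folded form: subtract c only where the mask is set, via an AND.
constexpr uint32_t sub_and(uint32_t x, uint32_t c, uint32_t m) {
  return x - (c & m);
}

// True when both forms agree for the given inputs.
constexpr bool fold_holds(uint32_t x, uint32_t c, bool cond) {
  return select_sub(x, c, lane_mask(cond)) == sub_and(x, c, lane_mask(cond));
}

// Compile-time spot checks with the constant 24 from the issue.
static_assert(fold_holds(100u, 24u, true), "mask set");
static_assert(fold_holds(100u, 24u, false), "mask clear");
```

With the mask set, `c & m` is `c`, so both forms compute `x - c`. With the mask clear, `c & m` is 0 and both return `x`. The identity depends on the mask being exactly 0 or all-ones in every lane.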
CC @elhewaty |
assign me, please. |
@RKSimon Is there any source I can use to understand DAG internals. |
I'd start by seeing what the difference is between the IR being fed to DAG from masked_select vs masked_select_const - you will probably need to remove a lot of unnecessary bitcasts. Then step through the DAGCombine stages of running llc in a debugger - add breakpoints to the start of visitADD/visitSUB/visitVSELECT and see what's happening. You can also use "llc --debug" (using a debug assertion build) to dump out everything llc has done: https://rust.godbolt.org/z/szYv5G8n9 |
Hello @RKSimon.
Here's what I've reached so far: I tried to match a pattern in |
Yes, that looks about right - you should use |
Also, you need to sort out the argument order (sorry, when I reported this I was thinking of _mm_blendv_epi8 order, not select IR order) |
@RKSimon, I used the following test case:
The following code can't match the
Any hint? |
@RKSimon ping |
Sorry I missed your ping. In many cases DAG will try to fold

// select (sext m), (add X, C), X --> (add X, (and C, (sext m))))
if (N1.getOpcode() == ISD::ADD && N1.getOperand(0) == N2 && N1->hasOneUse() &&
    DAG.isConstantIntBuildVectorOrConstantInt(N1.getOperand(1)) &&
    N0.getScalarValueSizeInBits() == N1.getScalarValueSizeInBits()) {
  return DAG.getNode(ISD::ADD, DL, N1.getValueType(), N2,
                     DAG.getNode(ISD::AND, DL, N0.getValueType(), N1.getOperand(1),
                                 N0));
}

Note you need to ensure the N0 condition is the same width as the True/False operands otherwise you might affect targets with predicate mask types (AVX512 etc). |
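The masked_select_const output earlier in the thread uses paddd with the constant 4294967272, which is -24 as an unsigned 32-bit value. That suggests the constant subtraction reaches the combine as an ADD, which is the shape the snippet above matches. The sketch below models this on one lane. It is a scalar illustration rather than DAG code, it assumes an all-ones or all-zeros mask, and the helper names (neg_const, select_add, add_and) are hypothetical.

```cpp
#include <cstdint>

// Two's-complement negation in 32 bits: x - c equals x + neg_const(c).
constexpr uint32_t neg_const(uint32_t c) { return 0u - c; }

// Select form in IR operand order: select m, (add x, c), x.
constexpr uint32_t select_add(uint32_t x, uint32_t c, uint32_t m) {
  return m ? x + c : x;
}

// Folded form: add x, (and c, m).
constexpr uint32_t add_and(uint32_t x, uint32_t c, uint32_t m) {
  return x + (c & m);
}

// The constant seen in the asm dump is the negated 24.
static_assert(neg_const(24u) == 4294967272u, "-24 as a 32-bit value");

// Both forms agree for a set mask and a clear mask.
static_assert(select_add(100u, neg_const(24u), 0xFFFFFFFFu) ==
              add_and(100u, neg_const(24u), 0xFFFFFFFFu), "mask set");
static_assert(select_add(100u, neg_const(24u), 0u) ==
              add_and(100u, neg_const(24u), 0u), "mask clear");
```

The AND only works because the mask lane is as wide as the data lane, so an all-ones lane keeps every bit of the constant. That is the scalar analogue of the getScalarValueSizeInBits check in the snippet. A 1-bit predicate mask would not preserve the constant.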
@elhewaty Do you have a PR (draft or active) anywhere with your work so far? |