Closed
Description
| | |
| --- | --- |
| Bugzilla Link | 46286 |
| Version | trunk |
| OS | Windows NT |
| CC | @rotateright |
Extended Description
For cases such as:
define <4 x float> @fneg_concat_v2f32(<2 x float> %a0, <2 x float> %a1) {
  %1 = fneg <2 x float> %a0
  %2 = fneg <2 x float> %a1
  %3 = shufflevector <2 x float> %1, <2 x float> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %3
}
define <4 x float> @fneg_concat_v4f32(<4 x float> %a0, <4 x float> %a1) {
  %1 = fneg <4 x float> %a0
  %2 = fneg <4 x float> %a1
  %3 = shufflevector <4 x float> %1, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x float> %3
}
we are almost certainly better off moving the fneg after the shuffle, so that a single fneg operates on the shuffled result instead of two fnegs on the inputs:
define <4 x float> @concat_fneg_v2f32(<2 x float> %a0, <2 x float> %a1) {
  %1 = shufflevector <2 x float> %a0, <2 x float> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %2 = fneg <4 x float> %1
  ret <4 x float> %2
}
define <4 x float> @concat_fneg_v4f32(<4 x float> %a0, <4 x float> %a1) {
  %1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  %2 = fneg <4 x float> %1
  ret <4 x float> %2
}
Binops would probably benefit as well in some cases (perhaps when one operand is a constant?).
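As a purely illustrative sketch of the binop case, assuming both fmul instructions take a constant second operand (the function names and constant values here are hypothetical and not taken from any existing test), the two constants could be concatenated in the same way as the variable operands:

```llvm
; Before: two narrow fmuls by constants, then a concatenating shuffle.
define <4 x float> @fmul_concat_v2f32(<2 x float> %a0, <2 x float> %a1) {
  %1 = fmul <2 x float> %a0, <float 2.0, float 3.0>
  %2 = fmul <2 x float> %a1, <float 4.0, float 5.0>
  %3 = shufflevector <2 x float> %1, <2 x float> %2, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %3
}

; After: concatenate first, then one wide fmul by the concatenated constant.
define <4 x float> @concat_fmul_v2f32(<2 x float> %a0, <2 x float> %a1) {
  %1 = shufflevector <2 x float> %a0, <2 x float> %a1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %2 = fmul <4 x float> %1, <float 2.0, float 3.0, float 4.0, float 5.0>
  ret <4 x float> %2
}
```

Whether this form is actually cheaper would still depend on the target's costs for the shuffle and the wider fmul.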
The issue that VectorCombine might encounter, though, is that we fail to get costs for most length-changing shuffles, so the 'concat_vectors' shuffle pattern returns an 'Unknown' cost.