[RISCV] Extract subregister if VLEN is known when lowering extract_subvector #65392
Conversation
This is partly a precommit for an upcoming patch, and partly to remove the fixed length LMUL restriction similarly to what was done in https://reviews.llvm.org/D158270, since it's no longer that relevant.
This patch refactors extract_subvector to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
…bvector

If we know VLEN at compile time, then we can work out which subregister an index into a fixed length vector falls in. We can use this information when lowering extract_subvector to perform the vslidedown on a smaller subregister. This allows us to use a smaller LMUL, or, if the extract is aligned to a vector register, to avoid the slide altogether.

The logic here is a bit tangled with the scalable path: if people find this too unwieldy, I can separate it out and duplicate it for the fixed case.

This technique could be applied to extract_vector_elt, insert_vector_elt and insert_subvector too.
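As a rough standalone sketch of the idea (the function and variable names below are illustrative, not the patch's actual code): once VLEN is exact, a fixed-length index decomposes into a register number within the LMUL group plus an offset inside that register.

```cpp
// Sketch only, assuming an exactly-known VLEN; not the patch's actual code.
#include <utility>

std::pair<unsigned, unsigned> locateSubvector(unsigned ExactVLenBits,
                                              unsigned EltSizeInBits,
                                              unsigned OrigIdx) {
  unsigned ElemsPerVReg = ExactVLenBits / EltSizeInBits;
  unsigned SubRegIdx = OrigIdx / ElemsPerVReg;  // which register of the group
  unsigned IdxInVReg = OrigIdx % ElemsPerVReg;  // remaining slide amount
  return {SubRegIdx, IdxInVReg};
}

// E.g. with VLEN=128, extracting a v2i64 at index 2 out of a v8i64:
// ElemsPerVReg = 2, so the subvector starts in register 1 of the group at
// offset 0, i.e. a plain subregister extract with no vslidedown at all.
```

If the offset within the register is non-zero, the slide only needs to operate on that single register, i.e. at LMUL=1.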
@@ -152,6 +152,11 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
    unsigned VLen = getMaxRVVVectorSizeInBits();
    return VLen == 0 ? 65536 : VLen;
  }
  std::optional<unsigned> getRealKnownVLen() const {
I'd suggest: getExactVLen().
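For reference, a minimal sketch of what the renamed helper could look like, assuming the subtarget's existing getRealMinVLen()/getRealMaxVLen() helpers; the body is illustrative, not necessarily what the patch ends up with:

```cpp
  // Illustrative sketch: return the exact VLEN in bits if the minimum and
  // maximum RVV vector lengths pin it down to a single value, otherwise
  // std::nullopt.
  std::optional<unsigned> getExactVLen() const {
    unsigned Min = getRealMinVLen();
    if (Min != getRealMaxVLen())
      return std::nullopt;
    return Min;
  }
```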
// which register of a LMUL group contains the specific subvector as we only
// know the minimum register size. Therefore we must slide the vector group
// down the full amount.
if (SubVecVT.isFixedLengthVector() && !KnownVLen) {
This isn't fully general.
Consider: 8 x i64 on V (an lmul4 type), extract the second <2 x i64> quarter. We can still know this is in the first LMUL2 sub-register. This doesn't allow us to slide less, but it does allow us to reduce the lmul for the slide.
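A standalone sketch of that observation (illustrative names, not the patch's code): even with only the minimum VLEN known, we can compute the smallest register-group size guaranteed to contain every element up to the end of the subvector, and run the slide at that LMUL.

```cpp
// Sketch: smallest power-of-two LMUL guaranteed to contain the subvector,
// using only the guaranteed-minimum VLEN. Illustrative, not the patch's code.
#include <cstdint>

unsigned smallestLMULForExtract(unsigned MinVLenBits, unsigned EltSizeInBits,
                                unsigned OrigIdx, unsigned NumSubElts,
                                unsigned SrcLMUL) {
  uint64_t LastBitNeeded = uint64_t(OrigIdx + NumSubElts) * EltSizeInBits;
  unsigned LMUL = 1;
  while (LMUL < SrcLMUL && uint64_t(LMUL) * MinVLenBits < LastBitNeeded)
    LMUL *= 2;
  return LMUL;
}

// The example above: MinVLen=128, i64 elements, extracting <2 x i64> at
// index 2 from 8 x i64 (LMUL4): the last bit needed is (2+2)*64 = 256, and
// an LMUL2 group guarantees 2*128 = 256 bits, so the slide can run at m2
// instead of m4.
```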
unsigned EffectiveIdx = OrigIdx;
unsigned Vscale = *KnownVLen / RISCV::RVVBitsPerBlock;
if (SubVecVT.isFixedLengthVector()) {
  assert(KnownVLen);
This assert isn't needed. KnownVLen was already accessed on line 8670.
IIUC * on a null optional is just UB, you need to use value() to have an exception thrown. This part is a bit hairy though, I'm avoiding value() so I don't have to define Vscale twice. Any ideas on a better way to structure this?
I don't have any specific ideas, but I definitely wouldn't rely on UB. A library might put an assert in * rather than throwing an exception.
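One possible shape (a standalone sketch with stand-ins for the patch's variables, so the exact scaling is illustrative): only compute Vscale on the path where the earlier bail-out guarantees the optional has a value, which avoids the questionable dereference while still defining Vscale only once.

```cpp
// Sketch with stand-in names; the real lowering works on SDValues/MVTs.
#include <optional>

unsigned effectiveIndex(unsigned OrigIdx, bool SubVecIsFixedLength,
                        std::optional<unsigned> KnownVLen,
                        unsigned RVVBitsPerBlock = 64) {
  if (!SubVecIsFixedLength || !KnownVLen)
    return OrigIdx; // scalable subvector, or VLEN not exactly known
  // KnownVLen is guaranteed non-empty here, so '*' is well-defined, and
  // Vscale is only defined on the one path that actually needs it.
  unsigned Vscale = *KnownVLen / RVVBitsPerBlock;
  return OrigIdx / Vscale; // illustrative scaling of the fixed index
}
```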
…#65598)

As noted in llvm#65392 (comment), when lowering an extract of a fixed length vector from another vector, we don't need to perform the vslidedown on the full vector type. Instead we can extract the smallest subregister that contains the subvector to be extracted and perform the vslidedown with a smaller LMUL. E.g., with +Zvl128b:

v2i64 = extract_subvector nxv4i64, 2

is currently lowered as

vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2

This patch shrinks the vslidedown to LMUL=2:

vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2

Because we know that there's at least 128*2=256 bits in v8 at LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2.

lowerEXTRACT_VECTOR_ELT already has this logic, so this extracts it out and reuses it.

I've split this out into a separate PR rather than include it in llvm#65392, with the hope that we'll be able to generalize it later.

This patch refactors extract_subvector lowering to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
…T_SUBVECTOR (#79949)

This is a revival of #65392. When we lower an extract_subvector, we extract the subregister that the subvector is contained in first and then do a vslidedown with LMUL=1. We can currently only do this for scalable vectors though, because the index is scaled by vscale and thus we will know what subregister the subvector lies in. For fixed length vectors, the index isn't scaled by vscale and so the subvector could lie in any arbitrary subregister, so we have to do a vslidedown with the full LMUL.

The exception to this is when we know the exact VLEN: in which case, we can still work out the exact subregister and do the LMUL=1 vslidedown on it. This patch handles this case by scaling the index by 1/vscale before computing the subregister, and extending the LMUL=1 path to handle fixed length vectors.
If we know VLEN at compile time, then we can work out which subregister an index into a fixed length vector falls in.

We can use this information when lowering extract_subvector to perform the vslidedown on a smaller subregister. This allows us to use a smaller LMUL, or if the extract is aligned to a vector register then we can avoid the slide altogether.
The logic here is a bit tangled with the scalable path: If people find this too unwieldy, I can separate it out and duplicate it for the fixed case.
This technique could be applied to extract_vector_elt, insert_vector_elt and insert_subvector too.
This is stacked upon #65391