-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[RISCV] Refactor extract_subvector lowering slightly. NFC #65391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// Else we must shift our vector register directly to extract the subvector. | ||
// Do this using VSLIDEDOWN. | ||
// Else SubVecVT is a fractional LMUL and needs to be slid down. | ||
assert(RISCVVType::decodeVLMUL(getLMUL(SubVecVT)).second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside decomposeSubvectorInsertExtractToSubRegs, there's a comment which reads:
// Note that this is not guaranteed to find a subregister index, such as
// when we are extracting from one VR type to another.
This seems to contradict your new assert here.
After thinking about it, I think that comment is stale because moving the index=0 case early adds a precondition to that routine that SubVecVT != VecVT, and thus sizeof(SubVecVT) < sizeof(VecVT).
If you agree, would you mind updating the comment to reflect that? We've only changed the invariant for one of two callers, so we can't actually add the assert in the callee, but maybe right before the call for this caller? And maybe add an assert that the result is not NoRegister?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index=0 case doesn't necessarily guarantee that sizeof(SubVecVT) < sizeof(VecVT), consider an extract of v2i8 from an nxv2i8 at index 2. It's a well formed extract_subvector, but at vscale=1, sizeof(v2i8) == sizeof(nxv2i8)
I also tried the NoRegister assert, but as it turns out if both SubVecVT and VecVT are LMUL=1 or less, e.g v2i8 and nxv2i8, they'll both have the LMUL=1 register class and decomposeSubvectorInsertExtractToSubRegs
will return NoRegister.
But if we get NoRegister then we don't end up performing the subregister extract anyway, because we only do it when VecVT.bitsGT(getLMUL1VT(VecVT))
. I think we can put the assert in that branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only case I can think of decomposing would fail and we try to extract a subregister is when SubVecVT and VecVT have the same LMUL and the LMUL>1.
In that case though, the only valid index is 0:
IDX must be a constant multiple of T's known minimum vector length. If T is a scalable vector, DX is first scaled by the runtime scaling factor of T. Elements IDX through (IDX + num_elements(T) - 1) must be valid VECTOR indices.
So we should have returned early by the precondition.
032a5a6
to
efffa83
Compare
MVT InterSubVT = VecVT; | ||
if (VecVT.bitsGT(getLMUL1VT(VecVT))) { | ||
// If VecVT has an LMUL > 1, then SubVecVT should have a smaller LMUL, and | ||
// we should have successfully decomposed the extract into a subregister. | ||
assert(SubRegIdx != RISCV::NoRegister); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new bit I added in the force push. Note to self, don't force push after rebasing since all the rebased changes end up in the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be NoSubRegister
not NoRegister
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This patch refactors extract_subvector to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
efffa83
to
3463e51
Compare
This patch refactors extract_subvector lowering to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
This patch refactors extract_subvector lowering to lower to extract_subreg directly, and to shortcut whenever the index is 0 when extracting a scalable vector. This doesn't change any of the existing behaviour, but makes an upcoming patch that extends the scalable path slightly easier to read.
This is stacked upon #65389