Skip to content

[RISCV] Add test for copy propagation issue with VMV0. NFC #75347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Dec 13, 2023

I am currently looking into selecting mask registers as virtual registers
instead of physical registers (i.e. copies into V0) in order to simplify #71764,
i.e. select masked pseudos by using the vmv0 reg class for the mask operand:

PseudoVADD_VV_M1_MASK %x, %y, %mask:vmv0

instead of how we currently copy it into v0:

$v0 = COPY %mask:vr
PseudoVADD_VV_M1_MASK %x, %y, $v0

One issue I've run into with this approach is that register allocation can fail
on vector compare instructions, due to an interaction with MachineCSE and how
we model register overlap constraints.

We currently model vector register overlap constraints with early clobber. For
instructions like PseudoVMSEQ_VV_M2_MASK, this is more restrictive than what is
needed since the mask operand can be the same as the destination register, but
there's currently no way in LLVM today to mark only a subset of operands as
being clobbered: [1]

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %pt:vr(tied-def 0), ..., %mask:vmv0, ...

The issue arises if passthru operand is a copy of mask operand, e.g:

%mask:vmv0 = ...
%pt:vr = COPY %mask

MachineCSE performs trivial copy propagation and will coalesce the copy of
%mask to the passthru operand:

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %mask:vmv0(tied-def 0), ..., %mask:vmv0, ...

The two address instruction pass then sees the tied operand and constrains the
def's register class:

%mask:vmv0 = ...
%res:vmv0 = COPY %mask
early-clobber %res:vmv0 = PseudoVMSNE_VV_M2_MASK %res:vmv0, ..., %mask:vmv0

Because of the early-clobber constraint, %mask and %res will need to be
separate registers: But vmv0 only has one register, and allocation errors out.

This doesn't occur today because we explicitly copy the mask into $v0 first, so
the coalescing never occurs in the first place.

I will post a separate PR with one possible approach to fixing (teaching
MachineCSE to avoid coalescing in this case)

[1] https://discourse.llvm.org/t/earlyclobber-but-for-a-subset-of-the-inputs/55240

I am currently looking into selecting mask registers as virtual registers
instead of physical registers (i.e. copies into V0) in order to simplify

PseudoVADD_VV_M1_MASK %x, %y, %mask:vmv0

instead of how we currently copy it into v0:

$v0 = COPY %mask:vr
PseudoVADD_VV_M1_MASK %x, %y, $v0

One issue I've run into with this approach is that register allocation can fail
on vector compare instructions, due to an interaction with MachineCSE and how
we model register overlap constraints.

We currently model vector register overlap constraints with early clobber. For
instructions like PseudoVMSEQ_VV_M2_MASK, this is more restrictive than what is
needed since the mask operand can be the same as the destination register, but
there's currently no way in LLVM today to mark only a subset of operands as
being clobbered: [1]

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %pt:vr(tied-def 0), ..., %mask:vmv0, ...

The issue arises if passthru operand is a copy of mask operand, e.g:

%mask:vmv0 = ...
%pt:vr = COPY %mask

MachineCSE performs trivial copy propagation and will coalesce the copy of
%mask to the passthru operand:

early-clobber %res:vr = PseudoVMSEQ_VV_M2_MASK %mask:vmv0(tied-def 0), ..., %mask:vmv0, ...

The two address instruction pass then sees the tied operand and constrains the
def's register class:

%mask:vmv0 = ...
%res:vmv0 = COPY %mask
early-clobber %res:vmv0 = PseudoVMSNE_VV_M2_MASK %res:vmv0, ..., %mask:vmv0

Because of the early-clobber constraint, %mask and %res will need to be
separate registers: But vmv0 only has one register, and allocation errors out.

This doesn't occur today because we explicitly copy the mask into $v0 first, so
the coalescing never occurs in the first place.

I will post a separate PR with one possible approach to fixing (teaching
MachineCSE to avoid coalescing in this case)

[1] https://discourse.llvm.org/t/earlyclobber-but-for-a-subset-of-the-inputs/55240
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 2, 2024

After some discussion with others, it looks using singleton register classes (i.e. VMV0) opens up a whole can of worms, and RISC-V would likely be the only target to extensively do so. It's probably not worth tackling these issues, so closing for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants