8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression #26429

mhaessig · 2025-07-22T15:05:29Z

A loop of the form

MemorySegment ms = {};
for (long i = 0; i < ms.byteSize() / 8L; i++) {
    // vectorizable work
}

does not vectorize, whereas

MemorySegment ms = {};
long size = ms.byteSize();
for (long i = 0; i < size / 8L; i++) {
    // vectorizable work
}

vectorizes. The reason is that the loop with the loop limit lifted manually out of the loop exit check is immediately detected as a counted loop, whereas the other (more intuitive) loop has to be cleaned up a bit, before it is recognized as counted. Tragically, the LShift used in the loop exit check gets split through the phi preventing range check elimination, which is why the loop does not get vectorized. Before splitting through the phi, there is a check to prevent splitting LShifts modifying the IV of a counted loop:

jdk/src/hotspot/share/opto/loopopts.cpp

Lines 1172 to 1176 in e3f85c9

    
           // Pushing a shift through the iv Phi can get in the way of addressing optimizations or range check elimination 
        
           if (n_blk->is_BaseCountedLoop() && n->Opcode() == Op_LShift(n_blk->as_BaseCountedLoop()->bt()) && 
        
               n->in(1) == n_blk->as_BaseCountedLoop()->phi()) { 
        
             return n; 
        
           }

Hence, not detecting the counted loop earlier is the main culprit for the missing vectorization.

So, why is the counted loop not detected? Because the call to byteSize() is inside the loop head, and CiTypeFlow::clone_loop_heads() duplicates it into the loop body. The loop limit in the cloned loop head is loop variant and thus cannot be detected as a counted loop. The first ITER_GVN in PHASEIDEALLOOP1 will already remove the cloned loop head, enabling counted loop detection in the following iteration, which in turn enables vectorization.

Possible solutions

Prevent splitting LShifts in uncounted loops that have the same shape as a counted loop would have. This solution is implemented with UseNewCode as guard.

Problems

This is just another one of those exceptions from splitting through phis, and one of the more imprecise at that. So, this will prevent some `LShift`s in uncounted loops being split even though it would be beneficial.

As @merykitty mentions below, overly eager splitting through phis leads to similar problems, e.g. in JDK-8348096. To alleviate we track where we get wins when splitting. If a split onis only profitable in te loop entry, but not on the backedge, we deem it as not profitable.

Alternatives

Insert a "`PHASEIDEALLOOP0`" with `LoopOptsNone` that only performs loop tree building and then a round of IGVN where `Loop` nodes have been created. This cleans up the duplicated loop limit field access inside of the loop, which enables the counted loop detection in `PHASEIDEALLOOP1`.

Problems

In spirit, this is only a cleanup before the "real" work on loops begins. However, two tests fail with this solution. TestMultiversionRemoveUselessSlowLoop.java fails, because the small loop does not get multiversioned, effectively obviating the test. More critically, TestCombineAddPWithConstantOffsets.java (added in #21898) detects a regression with folding address computations. The problem, as I understand it, is that PHASEIDEALLOOP0 enables cleaning of a long data path that seems to be essential for folding AddP nodes with constant offsets.

I am unsure what the best solution really is the best and would appreciate feedback on that. Personally, I prefer the second option, since the logic of a cleanup pass before the "real" loop opts begin makes sense to me.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression (Sub-task - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26429/head:pull/26429
$ git checkout pull/26429

Update a local copy of the PR:
$ git checkout pull/26429
$ git pull https://git.openjdk.org/jdk.git pull/26429/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26429

View PR using the GUI difftool:
$ git pr show -t 26429

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26429.diff

bridgekeeper · 2025-07-22T15:06:31Z

👋 Welcome back mhaessig! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-07-22T15:07:51Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-07-22T15:08:40Z

@mhaessig The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

merykitty · 2025-07-23T08:04:36Z

I think one possible solution is to avoid splitting through Phi if there is no benefit. In this case, the only benefit is in the loop entry, while there is none in the loop backedge. If we take frequency into consideration, we may be able to determine that the splitting is not profitable. What do you think?

mhaessig · 2025-07-23T09:11:46Z

I would not rely solely on profile information to solve this, but it might be a good additional piece of information for the first proposed solution.

merykitty · 2025-07-23T09:19:33Z

@mhaessig You don't really need profile information, only that the profit is on the loop entry path and there is no profit on the loop backedge.

mhaessig · 2025-07-23T12:19:15Z

I see, but, if I understand correctly, any logic that relates to profit will have to go into split_through_phi(). Since we already have special logic for not splitting LShifts in split_if_with_blocks_pre(), I would prefer to only have it there.

merykitty · 2025-07-23T12:34:37Z

From the principle point of view, splitting a node through the loop Phi is only profitable if the profit is in the loop backedge. From the practical point of view, there are some issues when split_through_phi is applied recklessly such as JDK-8348096. I believe taking loop head into consideration when splitting through Phis can solve these issues. As a result, I think while you are at this issue, it is worth investigating this approach.

mhaessig · 2025-07-23T14:44:28Z

@merykitty, I took me a while to understand, but now I implemented your suggestion and it works at least the case of this issue (testing is ongoing). Thank you for pushing back.

EDIT: It also fixes JDK-8348096.

merykitty

Thanks a lot for your work, I have some suggestions.

src/hotspot/share/opto/loopopts.cpp

src/hotspot/share/opto/loopnode.hpp

src/hotspot/share/opto/loopopts.cpp

openjdk bot added the hotspot-compiler [email protected] label Jul 22, 2025

mhaessig force-pushed the jdk-8356176-byte-size branch from 37cc12d to 45f5d7b Compare July 22, 2025 15:09

mhaessig added 2 commits July 23, 2025 11:18

Also don't split LShift through Phis for uncounted loops

eaf191f

Make {find,has}_out_with const

b8d3479

mhaessig added 2 commits July 23, 2025 13:50

Also match to range check if not in couted loop

c3b0bcd

Add regression test

728c64a

mhaessig added 3 commits July 23, 2025 15:09

Testing again :(

c50cb1c

Make split_through_phi policy uint

6be1bce

If splittig through phi is only profitable in loop entry, don't do it

5400c90

mhaessig force-pushed the jdk-8356176-byte-size branch from 45f5d7b to 5400c90 Compare July 23, 2025 14:41

merykitty reviewed Jul 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression #26429

8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression #26429

mhaessig commented Jul 22, 2025 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Jul 22, 2025

Uh oh!

openjdk bot commented Jul 22, 2025

Uh oh!

openjdk bot commented Jul 22, 2025

Uh oh!

merykitty commented Jul 23, 2025

Uh oh!

mhaessig commented Jul 23, 2025

Uh oh!

merykitty commented Jul 23, 2025

Uh oh!

mhaessig commented Jul 23, 2025

Uh oh!

merykitty commented Jul 23, 2025 •

edited

Loading

Uh oh!

mhaessig commented Jul 23, 2025 •

edited

Loading

Uh oh!

merykitty left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	// Pushing a shift through the iv Phi can get in the way of addressing optimizations or range check elimination
	if (n_blk->is_BaseCountedLoop() && n->Opcode() == Op_LShift(n_blk->as_BaseCountedLoop()->bt()) &&
	n->in(1) == n_blk->as_BaseCountedLoop()->phi()) {
	return n;
	}

8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression #26429

Are you sure you want to change the base?

8356176: C2 MemorySegment: missing RCE with byteSize() in Loop Exit Check inside the for Expression #26429

Conversation

mhaessig commented Jul 22, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Possible solutions

Problems

Progress

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Jul 22, 2025

Uh oh!

openjdk bot commented Jul 22, 2025

Uh oh!

openjdk bot commented Jul 22, 2025

Uh oh!

merykitty commented Jul 23, 2025

Uh oh!

mhaessig commented Jul 23, 2025

Uh oh!

merykitty commented Jul 23, 2025

Uh oh!

mhaessig commented Jul 23, 2025

Uh oh!

merykitty commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhaessig commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

merykitty left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mhaessig commented Jul 22, 2025 •

edited by openjdk bot

Loading

merykitty commented Jul 23, 2025 •

edited

Loading

mhaessig commented Jul 23, 2025 •

edited

Loading