
Conversation

@1RyanK (Collaborator) commented Nov 28, 2025

Previously, the way to divide a SegString's bytes up appropriately across locales seemed to be something like this:

```chapel
ref offsets = strOffsetInLocale[here.id].Arr;
offsets[0..#size] = strings.offsets.a.localSlice[offsetsDom];
const end = if offsetsDom.high >= strings.offsets.a.domain.high
            then strings.values.a.size
            else globalOffsets[offsetsDom.high + 1];
const start = offsets[0];
strBytesInLocale[here.id] = new innerArray({0..#(end - start)}, uint(8));
strBytesInLocale[here.id].Arr[0..#(end - start)] = strings.values.a[start..<end];
```

As you can see, each locale just grabs the strings corresponding to the offsets that are local to it. This is easy to do, but it may not be the best approach.

This PR makes it easy to do things a little differently: try to grab mostly the bytes that are already on the current locale. It also turns everything into an innerArray of strings; you can see how simple this makes the concatenateUniqueStrMsg2 function. It also adds segStringFromInnerArray to convert back into a SegString easily.

The way it decides which strings end up on which locale is as follows: if a string lies entirely on one locale, it belongs to that locale. If it overlaps two or more locales, the locale that ends up owning it is the one holding the most of its bytes. If there is a tie, then among the locales with the most bytes of that string, the one with the lowest ID gets it.
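
A minimal sketch of that ownership rule (a hypothetical helper, not the PR's actual code), assuming we already know each string's global byte range and each locale's chunk of the byte buffer:

```chapel
// Hypothetical sketch of the ownership rule described above: pick the locale
// holding the most of this string's bytes, breaking ties by lowest locale ID.
proc owningLocale(strSpan: range, localeChunks: [] range): int {
  var bestLoc = -1, bestCount = -1;
  for (loc, chunk) in zip(localeChunks.domain, localeChunks) {
    const overlap = strSpan[chunk].size;   // bytes of this string on locale 'loc'
    if overlap > bestCount {               // strict '>' keeps the lowest ID on ties
      bestCount = overlap;
      bestLoc = loc;
    }
  }
  return bestLoc;
}
```

With locale chunks `0..9`, `10..19`, and `20..29`, for example, a string spanning bytes `8..14` has 2 bytes on locale 0 and 5 on locale 1, so locale 1 owns it.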

Closes #5089: Add a balancedSegStringRetrieval function

@1RyanK force-pushed the 5089-Add_a_balancedSegStringRetrieval_function branch 2 times, most recently from 72202bb to 9f29012 on November 28, 2025 at 15:43
@1RyanK (Collaborator, Author) commented Nov 28, 2025

While this is not passing CI, it does seem to be a little RAM-hungry, and I did hit an instantiation limit at one point (on my laptop, doing multi-dim). I don't think it actually fails.

@1RyanK force-pushed the 5089-Add_a_balancedSegStringRetrieval_function branch 2 times, most recently from a74e4b0 to 68569d8 on December 3, 2025 at 13:04
@e-kayrakli (Contributor) left a comment
Looks good, with the caveat that the logic was a bit hard to follow. That's not to say you should change anything in the logic, just that I don't think I can catch logical issues; as long as testing is clean, it should be good enough.

There are some inline comments that are mostly stylistic and performance-related. Otherwise, looks good!

```python
        return Strings.from_return_msg(cast(str, rep_msg))

    @staticmethod
    def concatenate_uniquely2(strings: List[Strings]) -> Strings:
```
Contributor:

Could you find a more self-documenting name than the `2` suffix?

```chapel
}
registerFunction("concatenateUniquely", concatenateUniqueStrMsg, getModuleName());

proc concatenateUniqueStrMsg2(cmd: string, msgArgs: borrowed MessageArgs, st: borrowed SymTab): MsgTuple throws {
```
Contributor:

ditto the name

```chapel
// Reduce to unique strings within this locale
forall str in myStrings with (+ reduce locSet) {
  locSet.add(str);
}
```
Contributor:

This is an interesting pattern!

It looks like in some tests, at least, using `set(string, parSafe=true)` instead and not doing things through the `+ reduce` intent results in faster performance.
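
A minimal sketch of that alternative (assuming `myStrings` is the same local array of strings as in the snippet above):

```chapel
use Set;

// Sketch of the suggested alternative: one concurrency-safe set shared by all
// tasks, instead of per-task sets combined through the '+ reduce' intent.
var locSet = new set(string, parSafe=true);
forall str in myStrings with (ref locSet) do
  locSet.add(str);   // parSafe=true makes concurrent adds safe
```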

Contributor:

Credit goes to Shreyas for his quick test after I asked the general question to the Chapel team.

```chapel
var firstOffsetPerLoc: [0..#numLocales] int = -1;

// First pass: for each locale, record its local ranges in the global bytes and offsets.
forall loc in 0..#numLocales {
```
Contributor:

Right now, this loop runs entirely on locale 0 (though in parallel). It looks like this could actually be done in parallel by the locales simultaneously, if I am not mistaken. If that's the case, `coforall loc in Locales.domain do on Locales[loc] {` could be a better loop+on combo here.
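
Roughly what that combo would look like for this pass (a sketch, assuming the bookkeeping writes stay as they are):

```chapel
// Sketch of the suggested loop+on combo: each locale runs its own iteration,
// so the localSubdomain query is answered by the locale that owns the data.
coforall loc in Locales.domain do on Locales[loc] {
  const currSubdomain = strBytes.localSubdomain();
  // ... record this locale's ranges into the bookkeeping arrays ...
}
```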

Collaborator (Author):

I don't know how much this matters, since the loop should execute pretty quickly either way, but which way would these accesses be local?

```chapel
const currSubdomain = strBytes.localSubdomain(Locales[loc]);
```

I may be splitting hairs, but if the domain data all lives on locale 0, then doing a coforall here adds in some remote accesses, right?


```chapel
// Second pass: for each locale, determine where the strings that *start* on this
// locale end in the global byte buffer.
forall loc in 0..#numLocales {
```
Contributor:

ditto

Collaborator (Author):

All the data here lives on locale 0, and I could split up the arrays to have one value per locale, but it feels like that might incur some extra remote accesses. I don't know which way would be faster, though.


```chapel
// Record the global string index for the first string on locale i
firstStringIndices[i] = idx;
firstStringStartOffsets[i] = myOffsets[idx];
```
Contributor:

Are these not race conditions?

Collaborator (Author):

The `if` that these are wrapped inside of makes sure they are not race conditions. Basically, each locale checks whether it knows what these values should be set to, and only one locale will see that it does.
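
A hypothetical illustration of that guard (the predicate below is made up; the PR's actual condition differs):

```chapel
// Hypothetical guard: only the one iteration that can determine entry i's
// value takes the branch, so each entry is written by exactly one task.
if thisLocaleKnowsFirstStringOf(i) {   // hypothetical predicate
  // Record the global string index for the first string on locale i
  firstStringIndices[i] = idx;
  firstStringStartOffsets[i] = myOffsets[idx];
}
```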


```chapel
// Full size of the string in bytes
const size = endStr - myOffsets[idx];
firstStringSizes[i] = size;
```
Contributor:

Another potential race condition.

```chapel
// Portion of this string that resides on locale i's byte range
// (truncate end to not go past that locale's last byte + 1)
const endStrThisLoc = min(endStr, myLastIdxPerLoc[i] + 1);
firstStringNumBytesEachLocale[i] = endStrThisLoc - myFirstIdxPerLoc[i];
```
Contributor:

And another one.

These could be benign write-after-write hazards, though I still wanted to check.

```chapel
// Given a SegString (global offsets + global byte buffer), it figures out for each
// locale which *parts* of which strings intersect that locale's local chunk of bytes,
// and collects bookkeeping needed to rebuild those pieces into local strings.
proc balancedSegStringRetrieval(const ref strings: SegString):
```
Contributor:

Even if there is no reuse opportunity, would it be easy to chop this into multiple smaller functions for easier code maintenance?

Collaborator (Author):

I will take another look, but I'm not sure because a lot of the variables get used throughout...

```chapel
var strSizes: [0..#strArray.size] int;

// Compute size of each string in bytes, including the null terminator
forall (i, str) in zip(0..#strArray.size, strArray) {
```
Contributor:

Consider `forall (size, str) in zip(strSizes, strArray)`

Collaborator (Author):

I think I need a value for `i`, so I changed it to a 3-element zip tuple.
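
A sketch of that change (assumed form; the actual loop body in the PR may differ):

```chapel
// Sketch of the 3-element zip: keeps the index i while iterating strSizes
// directly, so each size is written through the zipped reference.
forall (i, size, str) in zip(0..#strArray.size, strSizes, strArray) {
  size = str.numBytes + 1;   // bytes of the string plus the null terminator
}
```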

@1RyanK force-pushed the 5089-Add_a_balancedSegStringRetrieval_function branch 2 times, most recently from 8a1244d to b1d0ae2 on December 14, 2025 at 21:54
github-merge-queue bot pushed a commit that referenced this pull request Dec 17, 2025
…5160)

In order to test performance in PR #5090, I thought it would be good to have a benchmark. This sets up the framework; then, in that PR, I will modify the benchmark to include the new function introduced there.

Closes #5159: Create a benchmark for uniquely concatenating strings
@1RyanK force-pushed the 5089-Add_a_balancedSegStringRetrieval_function branch from b1d0ae2 to 1bd0972 on December 18, 2025 at 16:52
@1RyanK force-pushed the 5089-Add_a_balancedSegStringRetrieval_function branch from 1df4839 to ea51fe2 on December 18, 2025 at 16:57