[STJ] Optimize EnumFieldInfo sorting #117839
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
@eiriktsarpalis might want to review this before the backport of #117730 to 9.x
Packs the calculated (once per entry) PopCount into the high 32 bits of the long and the original index into the low 32 bits. Then that can just be sorted using the heavily optimized Array.Sort() method. After sorting just extract the low 32 bits as the original array index. As before, we negate the actual _PopCount_ to ensure that `Key`s with more on-bits (e.g. more flags represented) will sort **first**. This trades 2 x O(N log N) [average case] to 2 x O(N^2) [worst case] calls to the `popcount` instruction (or the emulation if NET is not defined) for N **shift-left-32** and **or** + N x **truncate to 32 bits**. It _also_ eliminates the overhead of the `CompareTo` method as it's now a direct `long` low-level compare.
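The packing scheme described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the class name `PackedPopCountSort`, the method name `SortIndicesByPopCount`, and the use of a plain `ulong[]` in place of the `EnumFieldInfo` keys are all assumptions made for the example.

```csharp
using System;
using System.Numerics;

public static class PackedPopCountSort
{
    // Hypothetical helper (not from the PR): returns the indices of `keys`
    // ordered by descending popcount, ties broken by ascending original index.
    public static int[] SortIndicesByPopCount(ulong[] keys)
    {
        long[] packed = new long[keys.Length];
        for (int i = 0; i < keys.Length; i++)
        {
            // The popcount (0..64) is computed exactly once per entry.
            int popCount = BitOperations.PopCount(keys[i]);

            // Negated popcount in the high 32 bits, original index in the low 32 bits.
            packed[i] = ((long)(-popCount) << 32) | (uint)i;
        }

        // Plain signed 64-bit comparison, no comparison delegate involved.
        Array.Sort(packed);

        int[] order = new int[packed.Length];
        for (int i = 0; i < packed.Length; i++)
        {
            // Truncating to 32 bits recovers the original index.
            order[i] = (int)packed[i];
        }

        return order;
    }
}
```

Because the negated popcount occupies the high bits, entries with more on-bits compare lower as signed `long` values, and the index in the low bits acts as the tie-breaker.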
Pull Request Overview
This PR optimizes the sorting performance in EnumFieldInfo by eliminating redundant PopCount calculations during comparison operations. The change reduces the number of PopCount calculations from between O(N log N) (average case) and O(N²) (worst case) down to O(N) by pre-computing and packing the values.
Key changes:
- Pre-computes PopCount values once per element instead of on every comparison
- Packs PopCount and index into a single long for efficient sorting
- Uses built-in Array.Sort instead of custom comparison delegate
...raries/System.Text.Json/src/System/Text/Json/Serialization/Converters/Value/EnumConverter.cs
I could submit a PR that doesn't mush things into a long (and uses a readonly struct instead) if that would be preferred; this just seemed the best approach.
How is it
You could just precompute the tuple that the previous approach was already using. It's a struct and has the appropriate comparison semantics. I don't think it would make a ton of difference though, even if emulated popcount is still |
It executes the PopCount on both the left and right comparands for each comparison. The sort (worst case) will do O(n^2) of those so 2 pop counts for each comparison...
I initially coded it as storing in the struct, this just seems faster since the comparison will be one
I guess that's where I'm on a different view... it just seems odd to compute the popcount every time we need to compare two elements.
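The per-comparison cost discussed in this thread can be made concrete with a small counting sketch. It is an illustration only: `PopCountCallCounter` and `Count` are hypothetical names, a `ulong[]` stands in for the enum field keys, and the comparison delegate is a simplified stand-in for the previous approach rather than the actual code in `EnumConverter.cs`.

```csharp
using System;
using System.Numerics;

public static class PopCountCallCounter
{
    // Hypothetical helper: counts popcount evaluations for a comparison-delegate
    // sort versus a single precomputation pass over the same keys.
    public static (int PerComparison, int Precomputed) Count(ulong[] keys)
    {
        int perComparison = 0;
        int[] indices = new int[keys.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            indices[i] = i;
        }

        // Popcount is evaluated for both comparands on every comparison.
        Array.Sort(indices, (left, right) =>
        {
            perComparison += 2;
            int l = BitOperations.PopCount(keys[left]);
            int r = BitOperations.PopCount(keys[right]);
            int byPopCount = r.CompareTo(l); // more on-bits sorts first
            return byPopCount != 0 ? byPopCount : left.CompareTo(right);
        });

        // Precomputing evaluates popcount once per element.
        int precomputed = 0;
        int[] popCounts = new int[keys.Length];
        for (int i = 0; i < keys.Length; i++)
        {
            popCounts[i] = BitOperations.PopCount(keys[i]);
            precomputed++;
        }

        return (perComparison, precomputed);
    }
}
```

The exact per-comparison count depends on the sort implementation and the input, so the sketch only reports it rather than assuming a particular value.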
Indeed, it just seemed an obvious win. In any case, no worries if you don't merge the PR (or the alternate one). Just something that stuck out in my regular code-reviews of the latest changes in runtime.
The runtime/src/libraries/System.Text.Json/src/System/Text/Json/Reader/JsonReaderHelper.cs Line 25 in 4ea1713
Your proposed change also avoids a closure, so even if the perf benefit isn't going to be huge, it does seem slightly cleaner.
Happy to accept the change if you change it to use a tuple instead.
Switched to Tuple (or did you want a simple struct?)
Switched to native tuple syntax. Co-authored-by: Pranav Senthilnathan
/ba-g test failures unrelated.
* [STJ] Only compute PopCount once when topologically sorting Enums

  Packs the calculated (once per entry) PopCount into the high 32 bits of the long and the original index into the low 32 bits. Then that can just be sorted using the heavily optimized Array.Sort() method. After sorting just extract the low 32 bits as the original array index. As before, we negate the actual _PopCount_ to ensure that `Key`s with more on-bits (e.g. more flags represented) will sort **first**.

  This trades 2 x O(N log N) [average case] to 2 x O(N^2) [worst case] calls to the `popcount` instruction (or the emulation if NET is not defined) for N **shift-left-32** and **or** + N x **truncate to 32 bits**. It _also_ eliminates the overhead of the `CompareTo` method as it's now a direct `long` low-level compare.

* PR Feedback

  Switched to Tuple

* PR Feedback

  Switched to native tuple syntax

  Co-authored-by: Pranav Senthilnathan

* Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Converters/Value/EnumConverter.cs

---------

Co-authored-by: Eirik Tsarpalis
Since PR #117730, the sort computes the `PopCount` of each `EnumFieldInfo.Key` on every comparison. This means we execute the POPCOUNT instruction (or the emulation if NET is not defined) `2 x O(N log N)` (average case) to `2 x O(N^2)` (worst case) times.

We know that the popcount for a long integer can range from 0-64 and the array `index` is an `int`, so we can ~~pack them both into a~~ place them both in a ~~`long`~~ `Tuple<int, int>` with `Item1` being the negated popcount, then use `Array.Sort(indices)` to sort them super quick, and then ~~extract~~ use the original `index` in `Item2`.

The original spec was to sort the "more on bits" elements to the front of the resulting array, with a tie-breaker of the original array index to ensure this is a stable sort with regard to the input array.

To get the more-on-bits to the front, ~~when we pack into the long,~~ we negate the popcount, so those with more on-bits will sort lower naturally.

Once sorted, we can just grab ~~the low 32 bits~~ `Item2` to get the original index value.
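A minimal sketch of the tuple-based variant described above, using native value-tuple syntax as the later review feedback requested. The names `TuplePopCountSort`, `SortIndicesByPopCount`, `NegatedPopCount`, and `Index` are assumptions for illustration, and a `ulong[]` stands in for the `EnumFieldInfo` keys; the merged code may differ.

```csharp
using System;
using System.Numerics;

public static class TuplePopCountSort
{
    // Hypothetical helper (not from the PR): returns the indices of `keys`
    // ordered by descending popcount, ties broken by ascending original index.
    public static int[] SortIndicesByPopCount(ulong[] keys)
    {
        var entries = new (int NegatedPopCount, int Index)[keys.Length];
        for (int i = 0; i < keys.Length; i++)
        {
            // Popcount is computed once per entry and negated so that
            // keys with more on-bits sort first.
            entries[i] = (-BitOperations.PopCount(keys[i]), i);
        }

        // Value tuples compare element by element: negated popcount first,
        // then the original index as the tie-breaker.
        Array.Sort(entries);

        int[] order = new int[entries.Length];
        for (int i = 0; i < entries.Length; i++)
        {
            order[i] = entries[i].Index;
        }

        return order;
    }
}
```

Since every entry carries a distinct index, no two tuples compare equal, so the result is deterministic even though `Array.Sort` itself does not guarantee a stable sort.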