Skip to content

[STJ] Optimize EnumFieldInfo sorting #117839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 23, 2025

Conversation

IDisposable
Copy link
Contributor

@IDisposable IDisposable commented Jul 18, 2025

In PR #117730, it now computes the PopCount of each EnumFieldInfo .Key on every comparison during the sort. This means we execute the POPCOUNT instruction (or the emulation if NET is not defined) 2 x O(N log N) (average case) to 2 x O(N^2) (worst case).

We know that the popcount for an long integer can range from 0-64 and the array index is an int, so we can pack them both into a long place them both in a Tuple<int, int> with Item1 being the negated popcount and then use the Array.Sort(indices) to sort them super quick, and then extract use the original index in Item2.

The original spec was to sort the "more on bits" elements to the front of the resulting array, with a tie-breaker of the original array-index to ensure this is a stable sort with regard to the input array.

To get the more-on-bits to the front, when we pack into the long, we negate the popcount, so those with more on-bits will sort lower naturally.

Once sorted, we can just grab the low 32 bits Item2 to get the original index value.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 18, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@IDisposable IDisposable force-pushed the optimize/enum-topo-sort branch from 65c3a5b to b131460 Compare July 18, 2025 22:33
@IDisposable
Copy link
Contributor Author

@eiriktsarpalis might want to review this before the backport of #117730 to 9.x

Packs the calculated (once per entry) PopCount into the high 32 bits of the long and the original index into the low 32 bits. Then that can just be sorted using the heavily optimized Array.Sort() method.
After sorting just extract the low 32 bits as the original array index. As before, we negate the actual _PopCount_ to ensure that `Key`s with more on-bits (e.g. more flags represented) will sort **first**.

This trades 2 x O(N log N) [average case] to 2 x O(N^2) [worst case] calls to the `popcount` instruction (or the emulation if NET is not defined) for N **shift-left-32** and **or** + N x **truncate to 32 bits**. It _also_ eliminates the overhead of the `CompareTo` method as it's now a direct `long` low-level compare.
@IDisposable IDisposable force-pushed the optimize/enum-topo-sort branch 2 times, most recently from 6941ba7 to 0a3b247 Compare July 18, 2025 22:49
@IDisposable IDisposable marked this pull request as ready for review July 18, 2025 23:08
@Copilot Copilot AI review requested due to automatic review settings July 18, 2025 23:08
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the sorting performance in EnumFieldInfo by eliminating redundant PopCount calculations during comparison operations. The change improves the time complexity from O(N log N) to O(N²) PopCount calculations down to just O(N) by pre-computing and packing the values.

Key changes:

  • Pre-computes PopCount values once per element instead of on every comparison
  • Packs PopCount and index into a single long for efficient sorting
  • Uses built-in Array.Sort instead of custom comparison delegate

@IDisposable
Copy link
Contributor Author

I could submit a PR that doesn't mush things into a long (uses a readonly struct instead) if that would be preferred, this just seemed the best approach.

@eiriktsarpalis
Copy link
Member

This means we execute the POPCOUNT instruction (or the emulation if NET is not defined) 2 x O(N log N) (average case) to 2 x O(N^2) (worst case).

How is it $O(n^2)$ exactly?

I could submit a PR that doesn't mush things into a long (uses a readonly struct instead) if that would be preferred

You could just precompute the tuple that the previous approach was already using. It's a struct and has the appropriate comparison semantics. I don't think it would make a ton of difference though, even if emulated popcount is still $O(1)$ making the overall complexity still $O(n \log n)$. Also, the $n$ is bounded by the number of labels in the enum and the sorting operation is only executing once at converter initialization.

@IDisposable
Copy link
Contributor Author

How is it O ( n 2 ) exactly?

It executes the PopCount on both the left and right comparands for each comparison. The sort (worst case) will do O(n^2) of those so 2 pop counts for each comparison...

You could just precompute the tuple that the previous approach was already using. It's a struct and has the appropriate comparison semantics.

I initially coded it as storing in the struct, this just seems faster since the comparison will be one long compare vs. two int field compares... but if the packing is too fancy, happy to submit the other way.

I don't think it would make a ton of difference though, even if emulated popcount is still O ( 1 ) making the overall complexity still O ( n log ⁡ n ) .

I guess that's where I'm on a different view... it just seems odd to compute the popcount every time we need to compare two elements.

Also, the n is bounded by the number of labels in the enum and the sorting operation is only executing once at converter initialization.

Indeed, it just seemed an obvious win.

In any case, no worries if you don't merge the PR (or the alternate one). Just something that stuck out in my regular code-reviews of the latest changes in runtime.

@PranavSenthilnathan
Copy link
Member

How is it O ( n 2 ) exactly?

It executes the PopCount on both the left and right comparands for each comparison. The sort (worst case) will do O(n^2) of those so 2 pop counts for each comparison...

Array.Sort is worst case O(n log n) (it uses a modified quicksort that falls back to heapsort when the recursion depth passes a threshold).

You could just precompute the tuple that the previous approach was already using. It's a struct and has the appropriate comparison semantics.

I initially coded it as storing in the struct, this just seems faster since the comparison will be one long compare vs. two int field compares... but if the packing is too fancy, happy to submit the other way.

The (int, int) version is more readable, and we already use the type, so we're not adding to AOT size by adding a reference to it:

public static (int, int) CountNewLines(ReadOnlySpan<byte> data)

Your proposed change also avoids a closure, so even if the perf benefit isn't going to be huge, it does seem slightly cleaner.

@eiriktsarpalis
Copy link
Member

Happy to accept the change if you change it to use a tuple instead.

Switched to Tuple
@IDisposable
Copy link
Contributor Author

Switched to Tuple (or did you want a simple struct?)

Switched to native tuple syntax
Co-authored-by: Pranav Senthilnathan <[email protected]>
@IDisposable IDisposable force-pushed the optimize/enum-topo-sort branch from bc5f21d to a02ebb9 Compare July 22, 2025 19:18
@eiriktsarpalis
Copy link
Member

/ba-g test failures unrelated.

@eiriktsarpalis eiriktsarpalis merged commit cd4c7cf into dotnet:main Jul 23, 2025
81 of 87 checks passed
eiriktsarpalis added a commit that referenced this pull request Jul 23, 2025
* [STJ] Only compute PopCount once when topologically sorting Enums

Packs the calculated (once per entry) PopCount into the high 32 bits of the long and the original index into the low 32 bits. Then that can just be sorted using the heavily optimized Array.Sort() method.
After sorting just extract the low 32 bits as the original array index. As before, we negate the actual _PopCount_ to ensure that `Key`s with more on-bits (e.g. more flags represented) will sort **first**.

This trades 2 x O(N log N) [average case] to 2 x O(N^2) [worst case] calls to the `popcount` instruction (or the emulation if NET is not defined) for N **shift-left-32** and **or** + N x **truncate to 32 bits**. It _also_ eliminates the overhead of the `CompareTo` method as it's now a direct `long` low-level compare.

* PR Feedback

Switched to Tuple

* PR Feedback
Switched to native tuple syntax
Co-authored-by: Pranav Senthilnathan <[email protected]>

* Update src/libraries/System.Text.Json/src/System/Text/Json/Serialization/Converters/Value/EnumConverter.cs

---------

Co-authored-by: Eirik Tsarpalis <[email protected]>
@IDisposable IDisposable deleted the optimize/enum-topo-sort branch July 23, 2025 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Text.Json community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants