Skip to content

Check Avx512BW.IsSupported in BitArray.CopyTo() #114818

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 28, 2025

Conversation

tfenise
Copy link
Contributor

@tfenise tfenise commented Apr 18, 2025

Vector512<byte> shuffled = Avx512BW.Shuffle(scalar.AsByte(), shuffleMask);

Vector512<byte> normalized = Avx512BW.Min(extracted, ones);

It's using Avx512BW, so it should check Avx512BW.IsSupported first. Checking Avx512F.IsSupported is not enough, as there are (old) CPUs supporting Avx512F but not Avx512BW according to https://en.wikipedia.org/w/index.php?title=AVX-512&oldid=1281313848#CPUs_with_AVX-512.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 18, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@@ -837,7 +837,7 @@ public unsafe void CopyTo(Array array, int index)
Vector128<byte> lowerShuffleMask_CopyToBoolArray = Vector128.Create(0, 0x01010101_01010101).AsByte();
Vector128<byte> upperShuffleMask_CopyToBoolArray = Vector128.Create(0x02020202_02020202, 0x03030303_03030303).AsByte();

if (Avx512F.IsSupported && (uint)m_length >= Vector512<byte>.Count)
if (Avx512BW.IsSupported && (uint)m_length >= Vector512<byte>.Count)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching this anyways, would you want to update this to use the xplat APIs instead?

That is, swap these out:

  • Avx512BW.IsSupported -> Vector512.IsHardwareAccelerated
  • Avx512BW.Shuffle(x, y) -> Vector512.Shuffle(x, y)
  • Avx512F.And(x, y) -> x & y
  • Avx512BW.Min(x, y) -> Vector512.Min(x, y)
  • Avx512F.Store(x, y) -> y.Store(x)

Similar changes could also be made to the Avx2 path (using Vector256) and so on.


This change to Avx512BW is notably "more correct" than Avx512F, but notably it's not technically broken today since we require AVX512F+BW+CD+DQ+VL to be provided for any AVX512 support to exist. It's still goodness to fix, but moving away from the platform specific APIs altogether would be even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About Vector512.Shuffle(x, y)

// Same logic as SSSE3 path, except we do not have Shuffle instruction.
// (TableVectorLookup could be an alternative - dotnet/runtime#1277)
// Instead we use chained ZIP1/2 instructions:

It seems that Arm64 might not have the proper instruction to support VectorXXX.Shuffle<byte>(x, y). Could it happen that VectorXXX.IsHardwareAccelerated is true but VectorXXX.Shuffle<byte>(x, y) has poor performance on some platforms?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that Arm64 might not have the proper instruction to support VectorXXX.Shuffle(x, y). C

That comment is very outdated, VectorXXX.Shuffle covers the usage and does the correct thing.

Could it happen that VectorXXX.IsHardwareAccelerated is true but VectorXXX.Shuffle(x, y) has poor performance on some platforms?

Not for the way we're using it here. Theoretically it could happen if you were on a machine from 2005 and didn't have SSSE3 but hit the Vector128.IsHardwareAccelerated code path. This isn't really a concern for anyone using the current versions of .NET and could be adjusted later if it was a concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today, Vector256.IsHardwareAccelerated strictly implies AVX2 support, and Vector512.IsHardwareAccelerated strictly implies AVX-512F+CD+DQ+BW+VL. Arm64 support for larger vectors comes in the form of Scalable Vector Extensions (SVE), which does not allow acceleration of the fixed-width vector types.

The Vector128 code in BitArray is outdated and could also use the xplat intrinsics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after the runtime has proper AVX512 masking support #87097, the AVX512 path should probably use kmovq k, m64 and use that mask. This gives a reason to promote #87097. Will xplat Vector512 offer the same capacity?

Copy link
Member

@saucecontrol saucecontrol Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuffle attempts to normalize behavior across platforms, including zeroing out of range indices and allowing full vector (cross-lane) shuffle/permute. If you don't need that behavior, you can use the new Vector512.ShuffleNative, which will emit the expected vpshufb.

vpermb is actually AVX512_VBMI, not just AVX-512F+CD+DQ+BW+VL

This is not a problem. The baseline ISA set is required for any acceleration of Vector512, but JIT can opportunistically use any ISA supported by the hardware. One of the benefits of the xplat vector APIs is that JIT can optimize or polyfill using whatever instructions are available.

This comment was marked as resolved.

Copy link
Member

@stephentoub stephentoub May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching this anyways, would you want to update this to use the xplat APIs instead?

@tannergooding , should this PR be merged or closed in the interim?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets take it, as it is technically correct and will ensure valid behavior in the case other runtimes use the same libraries and have their own distinct support. (There's no bug on RyuJIT today due to how we've defined things to work for ourselves)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logged #116079 to track the move to xplat APIs

@tannergooding
Copy link
Member

/ba-g unrelated build failures, simply updated an Isa.IsSupported check to be more correct

@tannergooding tannergooding merged commit cb502bd into dotnet:main May 28, 2025
81 of 85 checks passed
@tfenise tfenise deleted the patch-1 branch May 29, 2025 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Collections community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants