
[API Proposal]: Vector128.ShuffleNative #81609

Closed
@MihaZupan

Description


Background and motivation

When implementing vectorized algorithms with the new cross-platform Vector128 APIs and their siblings, Vector128.Shuffle is one case where you sometimes have to drop back down to architecture-specific instructions.

Vector128.Shuffle guarantees that out-of-range indices (for byte elements, values above 15) produce 0 in the corresponding result element.
If the JIT can't prove that indices are already in the [0, 15] range, it is forced to emit extra logic to do the normalization.
For some algorithms, you may also want faster logic that relies on the behavior of the platform's native instruction (e.g. x86's PSHUFB uses only the low 4 bits of each index, i.e. the index mod 16, and sets the result to 0 if the index's high bit is set).
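To make the difference concrete, here is a scalar model of how each option treats one lane of a 16-byte lookup. It is an illustrative sketch only: the functions `ShuffleLane`, `PshufbLane`, and `TblLane` are hypothetical, and `TblLane` assumes Arm64 TBL with a single 16-byte table register (the form exposed as `AdvSimd.Arm64.VectorTableLookup`).

```csharp
using System;

// Hypothetical scalar models of one result lane of a 16-byte shuffle.
// table[i] = 0xA0 + i, so a zeroed lane is easy to tell apart from a real lookup.
byte[] table = new byte[16];
for (int i = 0; i < 16; i++)
    table[i] = (byte)(0xA0 + i);

foreach (byte index in new byte[] { 3, 17, 0x80, 0x8F })
{
    Console.WriteLine(
        $"index 0x{index:X2}: Shuffle=0x{ShuffleLane(table, index):X2}, " +
        $"PSHUFB=0x{PshufbLane(table, index):X2}, TBL=0x{TblLane(table, index):X2}");
}

// Vector128.Shuffle contract for byte elements: any index outside [0, 15] yields 0.
static byte ShuffleLane(byte[] lut, byte index) =>
    index < 16 ? lut[index] : (byte)0;

// x86 PSHUFB (Ssse3.Shuffle): a set high bit yields 0; otherwise only the low 4 bits are used.
static byte PshufbLane(byte[] lut, byte index) =>
    (index & 0x80) != 0 ? (byte)0 : lut[index & 0x0F];

// Arm64 TBL with one 16-byte table register: any index >= 16 yields 0.
static byte TblLane(byte[] lut, byte index) =>
    index < 16 ? lut[index] : (byte)0;
```

For index 17 (0x11) the Shuffle and TBL models return 0, while the PSHUFB model returns `table[1]` (0xA1) because it only looks at the low 4 bits. For 0x80 and 0x8F all three return 0: Shuffle and TBL because the index is at least 16, PSHUFB because the high bit is set.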

The solution in both cases is to use the architecture-specific instruction directly, often by writing a helper such as:

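private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
{
    // Only valid when Ssse3 or AdvSimd.Arm64 is supported; otherwise the
    // Arm64 call throws PlatformNotSupportedException.
    return Ssse3.IsSupported
        ? Ssse3.Shuffle(vector, indices)                     // x86: PSHUFB
        : AdvSimd.Arm64.VectorTableLookup(vector, indices);  // Arm64: TBL
}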

(We already have five such helpers in dotnet/runtime.)
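The extra normalization described in the background can be sketched on top of PSHUFB as follows. This is an assumption about the general shape of the work, not the JIT's actual code generation: lanes whose index is above 15 are forced to `0xFF`, so PSHUFB's high-bit rule zeroes them. `ShuffleWithNormalization` is a hypothetical name.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// table[i] = i + 1, so a zeroed lane is distinguishable from a real lookup.
Vector128<byte> table = Vector128.Create(0x0706050403020100UL, 0x0F0E0D0C0B0A0908UL).AsByte()
    + Vector128.Create((byte)1);

// Compare against Vector128.Shuffle for every possible (broadcast) index value.
bool allMatch = true;
for (int b = 0; b < 256; b++)
{
    Vector128<byte> indices = Vector128.Create((byte)b);
    allMatch &= ShuffleWithNormalization(table, indices) == Vector128.Shuffle(table, indices);
}
Console.WriteLine(allMatch); // True

// Hypothetical helper: Vector128.Shuffle semantics (out of range -> 0) built on PSHUFB.
static Vector128<byte> ShuffleWithNormalization(Vector128<byte> vector, Vector128<byte> indices)
{
    if (!Ssse3.IsSupported)
        return Vector128.Shuffle(vector, indices); // portable path when PSHUFB is unavailable

    // Lanes with an index above 15 become all-bits-set (0xFF), and PSHUFB zeroes them
    // because their high bit is set. In-range lanes are unchanged (OR with 0).
    Vector128<byte> outOfRange = Vector128.GreaterThan(indices, Vector128.Create((byte)15));
    return Ssse3.Shuffle(vector, indices | outOfRange);
}
```

The comparison and OR are pure overhead when the caller already knows its indices are in range, which is the situation the helper above avoids.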

I propose adding a Vector128.ShuffleNative helper that does not guarantee identical behavior for out-of-range indices across platforms. Instead, it would be defined as directly using the platform's native shuffle instruction when one is available, producing faster code. Code that strictly needs a particular out-of-range behavior on a given platform would still need to use the platform-specific intrinsics directly.
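The intended contract can be sketched in terms of existing APIs. `ShuffleNativeSketch` below is a hypothetical stand-in, not the proposed implementation: it forwards to PSHUFB or TBL when available and otherwise falls back to `Vector128.Shuffle`. For indices in [0, 15] every path returns the same result; for other indices the result depends on the path taken.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// table[i] = i + 1
Vector128<byte> table = Vector128.Create(0x0706050403020100UL, 0x0F0E0D0C0B0A0908UL).AsByte()
    + Vector128.Create((byte)1);

// In-range indices (15, 14, ..., 0): must match Vector128.Shuffle on every platform.
Vector128<byte> reverse = Vector128.Create(0x08090A0B0C0D0E0FUL, 0x0001020304050607UL).AsByte();
bool inRangeMatches = ShuffleNativeSketch(table, reverse) == Vector128.Shuffle(table, reverse);
Console.WriteLine(inRangeMatches); // True

// Out-of-range index 17: platform-dependent
// (PSHUFB uses 17 mod 16 = 1 and returns table[1]; TBL and the fallback return 0).
Vector128<byte> outOfRange = Vector128.Create((byte)17);
Console.WriteLine(ShuffleNativeSketch(table, outOfRange).GetElement(0));

// Hypothetical stand-in for the proposed Vector128.ShuffleNative(Vector128<byte>, Vector128<byte>).
static Vector128<byte> ShuffleNativeSketch(Vector128<byte> vector, Vector128<byte> indices)
{
    if (Ssse3.IsSupported)
        return Ssse3.Shuffle(vector, indices);                     // x86: PSHUFB
    if (AdvSimd.Arm64.IsSupported)
        return AdvSimd.Arm64.VectorTableLookup(vector, indices);   // Arm64: TBL
    return Vector128.Shuffle(vector, indices);                     // portable fallback
}
```

Unlike the helper in the background section, this sketch has a portable fallback, so it runs on any hardware.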

API Proposal

namespace System.Runtime.Intrinsics;

public static class Vector128
{
    public static Vector128<byte> ShuffleNative(Vector128<byte> vector, Vector128<byte> indices);
    public static Vector128<sbyte> ShuffleNative(Vector128<sbyte> vector, Vector128<sbyte> indices);
    public static Vector128<short> ShuffleNative(Vector128<short> vector, Vector128<short> indices);
    public static Vector128<ushort> ShuffleNative(Vector128<ushort> vector, Vector128<ushort> indices);
    public static Vector128<int> ShuffleNative(Vector128<int> vector, Vector128<int> indices);
    public static Vector128<uint> ShuffleNative(Vector128<uint> vector, Vector128<uint> indices);
    public static Vector128<long> ShuffleNative(Vector128<long> vector, Vector128<long> indices);
    public static Vector128<ulong> ShuffleNative(Vector128<ulong> vector, Vector128<ulong> indices);
    public static Vector128<float> ShuffleNative(Vector128<float> vector, Vector128<int> indices);
    public static Vector128<double> ShuffleNative(Vector128<double> vector, Vector128<long> indices);
}

public static class Vector256
{
    public static Vector256<byte> ShuffleNative(Vector256<byte> vector, Vector256<byte> indices);
    public static Vector256<sbyte> ShuffleNative(Vector256<sbyte> vector, Vector256<sbyte> indices);
    public static Vector256<short> ShuffleNative(Vector256<short> vector, Vector256<short> indices);
    public static Vector256<ushort> ShuffleNative(Vector256<ushort> vector, Vector256<ushort> indices);
    public static Vector256<int> ShuffleNative(Vector256<int> vector, Vector256<int> indices);
    public static Vector256<uint> ShuffleNative(Vector256<uint> vector, Vector256<uint> indices);
    public static Vector256<long> ShuffleNative(Vector256<long> vector, Vector256<long> indices);
    public static Vector256<ulong> ShuffleNative(Vector256<ulong> vector, Vector256<ulong> indices);
    public static Vector256<float> ShuffleNative(Vector256<float> vector, Vector256<int> indices);
    public static Vector256<double> ShuffleNative(Vector256<double> vector, Vector256<long> indices);
}

public static class Vector512
{
    public static Vector512<byte> ShuffleNative(Vector512<byte> vector, Vector512<byte> indices);
    public static Vector512<sbyte> ShuffleNative(Vector512<sbyte> vector, Vector512<sbyte> indices);
    public static Vector512<short> ShuffleNative(Vector512<short> vector, Vector512<short> indices);
    public static Vector512<ushort> ShuffleNative(Vector512<ushort> vector, Vector512<ushort> indices);
    public static Vector512<int> ShuffleNative(Vector512<int> vector, Vector512<int> indices);
    public static Vector512<uint> ShuffleNative(Vector512<uint> vector, Vector512<uint> indices);
    public static Vector512<long> ShuffleNative(Vector512<long> vector, Vector512<long> indices);
    public static Vector512<ulong> ShuffleNative(Vector512<ulong> vector, Vector512<ulong> indices);
    public static Vector512<float> ShuffleNative(Vector512<float> vector, Vector512<int> indices);
    public static Vector512<double> ShuffleNative(Vector512<double> vector, Vector512<long> indices);
}

public static class Vector64
{
    public static Vector64<byte> ShuffleNative(Vector64<byte> vector, Vector64<byte> indices);
    public static Vector64<sbyte> ShuffleNative(Vector64<sbyte> vector, Vector64<sbyte> indices);
    public static Vector64<short> ShuffleNative(Vector64<short> vector, Vector64<short> indices);
    public static Vector64<ushort> ShuffleNative(Vector64<ushort> vector, Vector64<ushort> indices);
    public static Vector64<int> ShuffleNative(Vector64<int> vector, Vector64<int> indices);
    public static Vector64<uint> ShuffleNative(Vector64<uint> vector, Vector64<uint> indices);
    public static Vector64<long> ShuffleNative(Vector64<long> vector, Vector64<long> indices);
    public static Vector64<ulong> ShuffleNative(Vector64<ulong> vector, Vector64<ulong> indices);
    public static Vector64<float> ShuffleNative(Vector64<float> vector, Vector64<int> indices);
    public static Vector64<double> ShuffleNative(Vector64<double> vector, Vector64<long> indices);
}

API Usage

-Vector128<byte> bitMask = Shuffle(bitmapLookup, lowNibbles);
-Vector128<byte> bitPositions = Shuffle(Vector128.Create(0x8040201008040201).AsByte(), 
+Vector128<byte> bitMask = Vector128.ShuffleNative(bitmapLookup, lowNibbles);
+Vector128<byte> bitPositions = Vector128.ShuffleNative(Vector128.Create(0x8040201008040201).AsByte(), highNibbles);

-private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
-{
-    return Ssse3.IsSupported
-        ? Ssse3.Shuffle(vector, indices)
-        : AdvSimd.Arm64.VectorTableLookup(vector, indices);
-}
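For context, the two shuffles in the usage diff come from a nibble-bitmap membership test over ASCII bytes. The sketch below reconstructs that idea under stated assumptions: the bitmap layout (low nibble selects the byte, high nibble selects the bit), the explicit ASCII check (bytes of 0x80 and above would otherwise alias onto bits 0 to 7), and the names `BuildBitmap`, `IndexOfAnyAscii`, and `ShuffleNativeSketch` are illustrative, not the runtime's code. Both index vectors hold nibbles in [0, 15], so each shuffle gives the same answer on every path.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;
using System.Text;

Vector128<byte> bitmapLookup = BuildBitmap(",!");
int first = IndexOfAnyAscii(Encoding.ASCII.GetBytes("hello, world!!!!"), bitmapLookup);
Console.WriteLine(first); // 5 (the ',')

// Hypothetical: bit (c >> 4) of bitmap[c & 0xF] is set when ASCII char c is in the set.
static Vector128<byte> BuildBitmap(string asciiSet)
{
    byte[] bitmap = new byte[16];
    foreach (char c in asciiSet)
    {
        if (c > 0x7F)
            throw new ArgumentException("Only ASCII characters are supported.", nameof(asciiSet));
        bitmap[c & 0xF] |= (byte)(1 << (c >> 4));
    }
    return Vector128.Create(bitmap);
}

// Hypothetical: index of the first of 16 bytes that is in the set, or -1.
static int IndexOfAnyAscii(byte[] source16, Vector128<byte> bitmapLookup)
{
    Vector128<byte> source = Vector128.Create(source16);
    Vector128<byte> lowNibbles = source & Vector128.Create((byte)0x0F);
    Vector128<byte> highNibbles = Vector128.ShiftRightLogical(source, 4);

    // Bits allowed for this low nibble, and the single bit this high nibble selects.
    Vector128<byte> bitMask = ShuffleNativeSketch(bitmapLookup, lowNibbles);
    Vector128<byte> bitPositions = ShuffleNativeSketch(Vector128.Create(0x8040201008040201).AsByte(), highNibbles);

    // A lane matches when the selected bit is set and the byte is ASCII (< 0x80).
    Vector128<byte> matches = Vector128.GreaterThan(bitMask & bitPositions, Vector128<byte>.Zero)
        & Vector128.LessThan(source, Vector128.Create((byte)0x80));

    uint mask = matches.ExtractMostSignificantBits();
    return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
}

// Hypothetical stand-in for the proposed ShuffleNative, with a portable fallback.
static Vector128<byte> ShuffleNativeSketch(Vector128<byte> vector, Vector128<byte> indices)
{
    if (Ssse3.IsSupported)
        return Ssse3.Shuffle(vector, indices);
    if (AdvSimd.Arm64.IsSupported)
        return AdvSimd.Arm64.VectorTableLookup(vector, indices);
    return Vector128.Shuffle(vector, indices);
}
```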

Alternative Designs

No response

Risks

No response


Labels: api-approved, area-System.Runtime.Intrinsics, in-pr
