Description
I ran into this while working on PR #99982; the relevant comment thread is here: #99982 (comment)
The following two code patterns are logically equivalent:
private static bool HasMatch2(Vector256<byte> vector)
{
    return ((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);
}

private static bool HasMatch3(Vector256<byte> vector)
{
    return !((vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero));
}
They appear to produce the same assembly: https://godbolt.org/z/1rzEcj8ar
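As a quick sanity check of the equivalence claim, here is a minimal sketch that compares the two patterns on random inputs. The class name EquivalenceCheck and the helper PatternsAgree are hypothetical and not part of the PR; the sketch assumes .NET 7 or later for Vector256.Create(byte[]) and the Vector256<T> operators. It only checks that the results agree; it says nothing about performance.

```csharp
using System;
using System.Runtime.Intrinsics;

static class EquivalenceCheck
{
    // Pattern from HasMatch2: operator != is true when any element differs.
    public static bool HasMatch2(Vector256<byte> vector)
        => (vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero;

    // Pattern from HasMatch3: negated Equals, which is true only when all elements match.
    public static bool HasMatch3(Vector256<byte> vector)
        => !(vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero);

    // Hypothetical helper: returns true if both patterns agree on every generated input.
    public static bool PatternsAgree(int iterations, int seed)
    {
        var rng = new Random(seed);
        var buffer = new byte[Vector256<byte>.Count];

        for (int i = 0; i < iterations; i++)
        {
            rng.NextBytes(buffer);

            // On even iterations, clear every high bit so the "no match" path is exercised too.
            if ((i & 1) == 0)
            {
                for (int j = 0; j < buffer.Length; j++)
                {
                    buffer[j] &= 0x7F;
                }
            }

            Vector256<byte> v = Vector256.Create(buffer);
            if (HasMatch2(v) != HasMatch3(v))
            {
                return false;
            }
        }

        return true;
    }

    static void Main()
    {
        // Reports whether the two patterns agreed on all sampled inputs.
        Console.WriteLine(PatternsAgree(100_000, 42));
    }
}
```

If the two patterns ever disagreed, PatternsAgree would return false, which would point to a correctness difference rather than the codegen or performance difference described in this issue.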
The PR referenced above uses the pattern in HasMatch3. When I switch to the pattern in HasMatch2, performance degrades.
How to reproduce
- Check out PR #99982 if it has not been merged yet
- Create and compile the following benchmark on an ICX machine (I tested on ICX)
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System;
using System.Text;
using System.Diagnostics;

namespace ProfilingDocs
{
    class Program
    {
        // 5120 ASCII bytes, so no byte has the high bit (0x80) set.
        private static byte[] _sourceBytes = Enumerable.Repeat((byte)'a', 5120).ToArray();

        static void Main()
        {
            var timer = new Stopwatch();
            timer.Start();
            for (int i = 0; i < 12_000_000; i++)
            {
                GetString();
            }
            timer.Stop();
            TimeSpan timeTaken = timer.Elapsed;
            string foo = "Time taken: " + timeTaken.ToString(@"m\:ss\.fff");
            Console.WriteLine(foo);
        }

        // NoInlining keeps the call from being inlined into the timing loop.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static string GetString() => Encoding.UTF8.GetString(_sourceBytes);
    }
}
- Run this benchmark with a local build of the PR
- Change the following line in the PR (https://github.com/dotnet/runtime/pull/99982/files#diff-6b4906abc01dc4699f348f7c1df72e2f640f240aa31ea67cd47642221b2021f5R2204) to:
((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);
- Recompile the repo and rerun the benchmark