Performance difference between 2 code patterns doing same thing #100493

Open
@DeepakRajendrakumaran

Description

I ran into this while working on PR #99982. The relevant comment thread is here: #99982 (comment)

The following two code patterns are logically equivalent:

private static bool HasMatch2(Vector256<byte> vector)
{
    return ((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);
}

private static bool HasMatch3(Vector256<byte> vector)
{
    return !((vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero));
}

They appear to produce the same assembly: https://godbolt.org/z/1rzEcj8ar
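
As a quick sanity check on the equivalence claim, here is a minimal sketch that compares both patterns against a scalar reference on random inputs. The class name `EquivalenceCheck`, the helper `HasHighBitScalar`, and the harness `CountMismatches` are hypothetical names introduced for illustration and are not part of the PR. It assumes .NET 7 or later, where the `Vector256<T>` operators are available. It only checks logical behavior and says nothing about the performance difference.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class EquivalenceCheck
{
    // HasMatch2 pattern: operator != is true when any element differs from zero.
    public static bool HasMatch2(Vector256<byte> vector) =>
        (vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero;

    // HasMatch3 pattern: negated Equals, where Equals is true only if all elements match.
    public static bool HasMatch3(Vector256<byte> vector) =>
        !(vector & Vector256.Create((byte)0x80)).Equals(Vector256<byte>.Zero);

    // Scalar reference (hypothetical helper): does any byte have its high bit set?
    public static bool HasHighBitScalar(ReadOnlySpan<byte> bytes)
    {
        foreach (byte b in bytes)
        {
            if ((b & 0x80) != 0)
            {
                return true;
            }
        }
        return false;
    }

    // Counts inputs on which either vector pattern disagrees with the scalar reference.
    public static int CountMismatches(int iterations, int seed)
    {
        var rng = new Random(seed);
        var buffer = new byte[Vector256<byte>.Count];
        int mismatches = 0;

        for (int i = 0; i < iterations; i++)
        {
            // Start from pure ASCII bytes (0x00..0x7F).
            for (int j = 0; j < buffer.Length; j++)
            {
                buffer[j] = (byte)rng.Next(0, 0x80);
            }

            // On odd iterations, plant one byte with the high bit set.
            if ((i & 1) == 1)
            {
                buffer[rng.Next(buffer.Length)] = (byte)rng.Next(0x80, 0x100);
            }

            Vector256<byte> vector = Vector256.Create(buffer);
            bool expected = HasHighBitScalar(buffer);

            if (HasMatch2(vector) != expected || HasMatch3(vector) != expected)
            {
                mismatches++;
            }
        }

        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine($"Mismatches: {CountMismatches(100_000, seed: 12345)}");
    }
}
```

If the two patterns really are equivalent, the mismatch count should be zero for any seed.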

The PR referenced above uses the HasMatch3 pattern. When I switch it to the HasMatch2 pattern, performance degrades.

How to reproduce

  1. Check out PR #99982 if it has not been merged yet.
  2. Create and compile the following benchmark (I tested on an ICX machine):
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Text;

namespace ProfilingDocs
{
    class Program
    {
        // 5120 ASCII bytes ('a'), decoded repeatedly in the loop below.
        private static byte[] _sourceBytes = Enumerable.Repeat((byte)'a', 5120).ToArray();

        static void Main()
        {
            var timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < 12_000_000; i++)
            {
                GetString();
            }

            timer.Stop();

            TimeSpan timeTaken = timer.Elapsed;
            string foo = "Time taken: " + timeTaken.ToString(@"m\:ss\.fff");
            Console.WriteLine(foo);
        }

        // NoInlining keeps the decode call as a separate method call in the timed loop.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static string GetString() => Encoding.UTF8.GetString(_sourceBytes);
    }
}
  3. Run this benchmark with a local build of the PR.
  4. Change the following line in the PR (https://github.com/dotnet/runtime/pull/99982/files#diff-6b4906abc01dc4699f348f7c1df72e2f640f240aa31ea67cd47642221b2021f5R2204) to:

((vector & Vector256.Create((byte)0x80)) != Vector256<byte>.Zero);

  5. Recompile the repo and rerun the benchmark.

Data

image

Metadata

Labels

    area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
    tenet-performance: Performance related issue
