GetNonRandomizedHashCode produces too many collisions for small strings #92556

Open
@EgorBo

While investigating perf traces for some of our 1P customers, I noticed samples inside Dictionary.FindValue on collision++. I then realized it hits quite a few of them (but not enough to rehash to Marvin, since the dictionary overall is not too big).

I made a small repro: here I create the 5-char string AA000 and then vary its last 3 chars (only over [0..z] symbols) to count the total number of collisions for this string:

using System.Diagnostics;
using System.Numerics;

var sw = Stopwatch.StartNew();
const int From = '0';
const int To = 'z';
var hashes = new HashSet<int>();
char[] str = "AA000\0".ToCharArray(); // trailing '\0' stands in for a string's null terminator
int length = str.Length - 1;          // hash only the 5 visible chars
int totalCollisions = 0;

unsafe
{
    fixed (char* pStr = str)
    {
        for (int ch2 = From; ch2 < To; ch2++)
        for (int ch3 = From; ch3 < To; ch3++)
        for (int ch4 = From; ch4 < To; ch4++)
        {
            pStr[2] = (char)ch2;
            pStr[3] = (char)ch3;
            pStr[4] = (char)ch4;

            int hash = GetNonRandomizedHashCode(pStr, length);
            if (!hashes.Add(hash)) 
                totalCollisions++;
        }
    }
}
sw.Stop();
Console.WriteLine($"Collisions found: {totalCollisions} in {sw.Elapsed.TotalSeconds}");


static unsafe int GetNonRandomizedHashCode(char* src, int length)
{
    uint hash1 = (5381 << 16) + 5381;
    uint hash2 = hash1;
    uint* ptr = (uint*)src;
    while (length > 2)
    {
        length -= 4;
        hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ ptr[0];
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ ptr[1];
        ptr += 2;
    }
    if (length > 0)
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ ptr[0];
    return (int)(hash1 + (hash2 * 1566083941));
}

It found 213120 (!!) unique collisions! And I was only changing the last 3 chars to ['0'..'z'] symbols. Examples:

Collision found: AAD4T and AAC4w
Collision found: AAB4P and AAO45
Collision found: AAH4T and AAK43
Collision found: AAd4P and AAc43
Collision found: AAr1p and AAp16
Collision found: AAV4I and AAW4h
Collision found: AAD4T and AAE45
Collision found: AAr1p and AAq1W
Collision found: AAd4P and AAe4q
Collision found: AAF4U and AAC48
...
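
Pairs like the ones above can be listed by remembering the first string seen for each hash. The following is only a sketch, not the code used for the numbers in this issue: NonRandomizedHashSafe is a hypothetical safe port of the hash from the repro that packs two chars into one uint and treats any index at or past the end as 0, standing in for the null terminator that follows a real string's data. The pairs it prints may come out in a different order than the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical safe port of the repro's hash (an assumption of this sketch).
// Pair(i) packs chars i and i+1 little-endian into a uint; out-of-range reads give 0.
static int NonRandomizedHashSafe(string s)
{
    uint Pair(int i) =>
        (i < s.Length ? s[i] : 0u) | ((i + 1 < s.Length ? s[i + 1] : 0u) << 16);

    uint hash1 = (5381 << 16) + 5381;
    uint hash2 = hash1;
    int length = s.Length;
    int pos = 0;
    while (length > 2)
    {
        length -= 4;
        hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ Pair(pos);
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ Pair(pos + 2);
        pos += 4;
    }
    if (length > 0)
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ Pair(pos);
    return (int)(hash1 + (hash2 * 1566083941));
}

const int From = '0';
const int To = 'z';
var firstSeen = new Dictionary<int, string>();   // hash -> first string that produced it
var pairs = new List<(string First, string Second)>();
char[] buf = "AA000".ToCharArray();

for (int ch2 = From; ch2 < To; ch2++)
for (int ch3 = From; ch3 < To; ch3++)
for (int ch4 = From; ch4 < To; ch4++)
{
    buf[2] = (char)ch2;
    buf[3] = (char)ch3;
    buf[4] = (char)ch4;

    string s = new string(buf);
    int hash = NonRandomizedHashSafe(s);
    // TryAdd fails when the hash is already present, i.e. on a collision.
    if (!firstSeen.TryAdd(hash, s))
        pairs.Add((firstSeen[hash], s));
}

Console.WriteLine($"Collisions found: {pairs.Count}");
for (int i = 0; i < Math.Min(10, pairs.Count); i++)
    Console.WriteLine($"Collision found: {pairs[i].Second} and {pairs[i].First}");
```

Keeping the strings around costs memory, so this is only meant for small input spaces like the one in the repro.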

(for comparison, Marvin found 15)
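
A comparable Marvin number can be reproduced without touching runtime internals, since string.GetHashCode() on .NET Core and later uses randomized Marvin hashing by default. This is a rough sketch under that assumption: CountCollisions is a hypothetical helper, and the exact count will differ from run to run because the seed is chosen per process.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: runs the same "AA" + 3 varying chars loop as the repro
// and counts how many inputs produce a hash that was already seen.
static int CountCollisions(Func<string, int> hash)
{
    const int From = '0';
    const int To = 'z';
    var seen = new HashSet<int>();
    char[] buf = "AA000".ToCharArray();
    int collisions = 0;

    for (int ch2 = From; ch2 < To; ch2++)
    for (int ch3 = From; ch3 < To; ch3++)
    for (int ch4 = From; ch4 < To; ch4++)
    {
        buf[2] = (char)ch2;
        buf[3] = (char)ch3;
        buf[4] = (char)ch4;
        if (!seen.Add(hash(new string(buf))))
            collisions++;
    }
    return collisions;
}

// string.GetHashCode() is the randomized (Marvin-based) hash in this sketch.
int marvinCollisions = CountCollisions(s => s.GetHashCode());
Console.WriteLine($"Marvin collisions: {marvinCollisions}");
```

Passing the hash as a delegate makes it easy to plug in other candidates and compare them on the same inputs.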
