While I was investigating perf traces for some of our 1P customers, I noticed samples inside `Dictionary.FindValue` on `collision++`.
I then realized that lookups hit quite a few collisions, though not enough to trigger a rehash to Marvin, since the dictionary as a whole is not very large.
I made a small repro: it starts from the 5-char string `AA000`, changes the last 3 chars (only to symbols in the range `'0'..'z'`), and counts the total number of hash collisions across all of these strings:
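```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Numerics;

// Needs <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
var sw = Stopwatch.StartNew();
const int From = '0';
const int To = 'z'; // exclusive upper bound
var hashes = new HashSet<int>();
// The extra '\0' stands in for the null terminator every System.String has:
// for odd lengths the hash reads one char past the end, which would be an
// out-of-bounds read on a plain 5-element char[].
char[] str = "AA000\0".ToCharArray();
const int HashedLength = 5;
int totalCollisions = 0;
unsafe
{
    fixed (char* pStr = str)
    {
        // Vary chars 2..4 and count every hash that was already seen.
        for (int ch2 = From; ch2 < To; ch2++)
        for (int ch3 = From; ch3 < To; ch3++)
        for (int ch4 = From; ch4 < To; ch4++)
        {
            pStr[2] = (char)ch2;
            pStr[3] = (char)ch3;
            pStr[4] = (char)ch4;
            int hash = GetNonRandomizedHashCode(pStr, HashedLength);
            if (!hashes.Add(hash))
                totalCollisions++;
        }
    }
}
sw.Stop();
Console.WriteLine($"Collisions found: {totalCollisions} in {sw.Elapsed.TotalSeconds}s");

// Non-randomized string hash: two djb2-style lanes (seed 5381, rotate-by-5 plus add),
// each consuming one uint (two chars) per step.
static unsafe int GetNonRandomizedHashCode(char* src, int length)
{
    uint hash1 = (5381 << 16) + 5381;
    uint hash2 = hash1;
    uint* ptr = (uint*)src;
    while (length > 2)
    {
        length -= 4;
        hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ ptr[0];
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ ptr[1];
        ptr += 2;
    }
    // 1 or 2 chars left: one more uint read (includes the terminator if only 1 is left).
    if (length > 0)
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ ptr[0];
    return (int)(hash1 + (hash2 * 1566083941));
}
```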
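A rough sketch of why this input pattern collides so often, derived only from the hash function above and assuming a little-endian machine (each `uint` read packs two chars as $c_i + 2^{16} c_{i+1}$, and the read past the 5th char sees the `'\0'` terminator). Write $f(v) = \mathrm{rotl}(v, 5) + v \bmod 2^{32}$ and $h_0 = 5381 \cdot 2^{16} + 5381$; for a 5-char string $c_0 \dots c_4$:

$$
\begin{aligned}
\mathit{hash1} &= f(h_0) \oplus (c_0 + 2^{16} c_1) \\
v &= f(h_0) \oplus (c_2 + 2^{16} c_3) \\
\mathit{hash2} &= f(v) \oplus c_4 \\
H &= \mathit{hash1} + 1566083941 \cdot \mathit{hash2} \bmod 2^{32}
\end{aligned}
$$

With $c_0 = c_1 = \texttt{'A'}$ fixed, $\mathit{hash1}$ is a constant, and since 1566083941 is odd, multiplying by it is a bijection mod $2^{32}$; so two inputs collide exactly when their $\mathit{hash2}$ values are equal. Because the two halves of the rotate do not overlap, $f(v) = 32v + \lfloor v / 2^{27} \rfloor + v \bmod 2^{32}$, hence $f(v + k) = f(v) + 33k$ whenever $v$ and $v + k$ share their top 5 bits. A small change in $c_2$ therefore tends to shift $f(v)$ by a small multiple of 33, and since $c_4$ is only XORed into the low 7 bits after that single mixing step, a different $c_4$ can often cancel the shift.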
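It found 213120 (!!) collisions, i.e. strings whose hash had already been produced by an earlier string, out of 74^3 = 405,224 strings in total. And I was only changing the last 3 chars to symbols in `['0'..'z']`. Examples: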
```
Collision found: AAD4T and AAC4w
Collision found: AAB4P and AAO45
Collision found: AAH4T and AAK43
Collision found: AAd4P and AAc43
Collision found: AAr1p and AAp16
Collision found: AAV4I and AAW4h
Collision found: AAD4T and AAE45
Collision found: AAr1p and AAq1W
Collision found: AAd4P and AAe4q
Collision found: AAF4U and AAC48
...
```
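To double-check some of the reported pairs without unsafe code, here is a sketch that ports the repro's `GetNonRandomizedHashCode` to operate on a `string`. `NonRandomizedHash` and `Pair` are hypothetical helper names; the port assumes a little-endian host (so a `uint` read of two chars equals `c0 | c1 << 16`) and treats the position past the last char as the `'\0'` terminator.

```csharp
using System;
using System.Numerics;

// Three of the pairs reported above.
(string A, string B)[] pairs =
{
    ("AAD4T", "AAC4w"),
    ("AAB4P", "AAO45"),
    ("AAr1p", "AAp16"),
};

foreach (var (a, b) in pairs)
{
    int ha = NonRandomizedHash(a);
    int hb = NonRandomizedHash(b);
    Console.WriteLine($"{a}: {ha:X8}  {b}: {hb:X8}  equal: {ha == hb}");
}

// Hypothetical safe port of the repro's GetNonRandomizedHashCode (no pointers).
static int NonRandomizedHash(string s)
{
    uint hash1 = (5381 << 16) + 5381;
    uint hash2 = hash1;
    int length = s.Length;
    int i = 0; // index of the next char to consume

    while (length > 2)
    {
        length -= 4;
        hash1 = (BitOperations.RotateLeft(hash1, 5) + hash1) ^ Pair(s, i);
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ Pair(s, i + 2);
        i += 4;
    }

    if (length > 0)
        hash2 = (BitOperations.RotateLeft(hash2, 5) + hash2) ^ Pair(s, i);

    return (int)(hash1 + (hash2 * 1566083941));
}

// Packs chars i and i+1 the way a little-endian uint read would (assumption:
// little-endian host). A position past the end reads as the '\0' terminator.
static uint Pair(string s, int i) =>
    (uint)s[i] | ((i + 1 < s.Length ? (uint)s[i + 1] : 0u) << 16);
```

Each line should report `equal: True`. Taking the first pair by hand: after absorbing chars 2 and 3, the second lane holds v = 0xB593B5E3 for `AAD4T` and 0xB593B5E4 for `AAC4w`. They differ by 1 and share their top 5 bits, so `RotateLeft(v, 5) + v` differs by exactly 33: the low bytes are 0x59 and 0x7A and the upper 24 bits are identical. XOR with `'T'` (0x54) and `'w'` (0x77) then gives 0x0D in both cases.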
For comparison, Marvin found 15 collisions on the same inputs.
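As a sanity check on that baseline: for n = 405,224 inputs, a 32-bit hash that behaves randomly is expected to produce about n(n-1)/2^33, roughly 19, colliding pairs (birthday bound), so 15 is in that range while 213120 is not. Below is a sketch of how such a comparison could be run, assuming `string.GetHashCode()` is used as the Marvin baseline; on .NET Core it is seeded randomly per process, so the exact count changes from run to run.

```csharp
using System;
using System.Collections.Generic;

const char From = '0';
const char To = 'z'; // exclusive, as in the repro

char[] buffer = { 'A', 'A', '0', '0', '0' };
var seen = new HashSet<int>();
int collisions = 0;

for (char c2 = From; c2 < To; c2++)
for (char c3 = From; c3 < To; c3++)
for (char c4 = From; c4 < To; c4++)
{
    buffer[2] = c2;
    buffer[3] = c3;
    buffer[4] = c4;
    // string.GetHashCode() is Marvin-based and randomly seeded per process on .NET Core.
    int hash = new string(buffer).GetHashCode();
    if (!seen.Add(hash))
        collisions++;
}

Console.WriteLine($"Randomized (Marvin) collisions: {collisions} out of {seen.Count + collisions}");
```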