Replace murmur hash with djb2 hash#203
Conversation
|
Also, I didn't look very hard for a faster hashing algorithm, so if you know of one that's even faster, let's try that! |
64df84d to
d37a81a
Compare
| } | ||
| /* eslint-enable no-fallthrough */ | ||
| function djb2Hash(str) { | ||
| let hash = 5381; |
There was a problem hiding this comment.
It was apparently chosen through manual testing as the number that produced fewer collisions and better avalanching.
http://stackoverflow.com/a/10697529/18986
https://groups.google.com/forum/#!msg/comp.lang.c/lSKWXiuNOAk/zstZ3SRhCjgJ
http://stackoverflow.com/a/4825477/18986
|
JSPerf: https://jsperf.com/murmurhash2-vs-djb2 This implementation of djb2 seems to be about 2x faster than the implementation of MurmurHash2 in Chrome. In Firefox, they seem to be pretty equivalent. |
|
I should comment on this I guess, since I'm the one who pushed for murmurhash. It's a very fast (and good) hash when implemented in C, but maybe not in javascript. I don't have a problem moving to djb2, though it's not a great hash function. (The claim about "same number of collisions" depends on the dataset. I doubt it's true in general, though may be true for the types of strings we're hashing in aphrodite -- that could be tested experimentally, though may not be worth the trouble. After all, if a hash is twice as fast that can pay for a lot of collision-handling code!) The best thing to do would be to ask the js engine to use their internal hash function, something like |
This would be great, but I don't think it is an option. And, if we could do this, it would have to be the same hash function across all browsers to ensure that the hash is the same from server -> client. |
|
JS definitely doesn't expose anything like that. |
xymostech
left a comment
There was a problem hiding this comment.
One small change, and this LGTM! Echoing what craig said, I'm a little sad that we're switching to a worse hash, but for now since we're also prepending the name that you provide, the chance of collisions is very very low, so I'm happy with this.
In the future, we were thinking about a "prod" mode that would exclude the names and would also hash all of the concatenated class names together into one hash (e.g. "1a81283-o_O-19a238" -> "282ac923") which we might want to do better testing and seeing which hash we want.
| // ordering of objects. Ben Alpert says that Facebook depends on this, so we | ||
| // can probably depend on this too. | ||
| export const hashObject = (object /* : ObjectMap */) /* : string */ => murmurhash2_32_gc(JSON.stringify(object)); | ||
| export const hashObject = (object /* : ObjectMap */) /* : number */ => stringHash(JSON.stringify(object)); |
There was a problem hiding this comment.
Could you stringify the output here in the same way that we did in murmurhash? I think a .toString(36) at the end here is all that's needed.
There was a problem hiding this comment.
Done! I also profiled this and it seems to be equivalent to not converting to a string.
In profiling StyleSheet.create, I noticed that much of the time was spent hashing. So, I found a faster hashing algorithm. The implementation was taken from: https://github.com/darkskyapp/string-hash According to this StackExchange post, this algorithm doesn't have as good of randomness, but it has about the same percentage of collisions. I don't think randomness matters for this application, so I think this is okay. http://softwareengineering.stackexchange.com/a/145633 Using similar methodology to Khan#202, this appears to make StylSheet.create ~15% faster (~220ms to ~185ms).
This is where I copied the code for this algorithm from, seems like we might as well just bring in the dependency for it.
This was replaced with a different hashing algorithm by #203, so I don't think we really need this note anymore.
In profiling StyleSheet.create, I noticed that much of the time was
spent hashing. So, I found a faster hashing algorithm.
The implementation was taken from:
https://github.com/darkskyapp/string-hash
According to this StackExchange post, this algorithm doesn't have as
good of randomness, but it has about the same percentage of collisions.
I don't think randomness matters for this application, so I think this
is okay.
http://softwareengineering.stackexchange.com/a/145633
Using similar methodology to #202, this appears to make StylSheet.create
~15% faster (~220ms to ~185ms).
cc @ljharb