Replace murmur hash with djb2 hash by lencioni · Pull Request #203 · Khan/aphrodite

lencioni · 2017-03-04T18:33:53Z

In profiling StyleSheet.create, I noticed that much of the time was
spent hashing. So, I found a faster hashing algorithm.

The implementation was taken from:

https://github.com/darkskyapp/string-hash

According to this StackExchange post, this algorithm's output is less
random, but it has about the same percentage of collisions. I don't
think randomness matters for this application, so I think this is okay.

http://softwareengineering.stackexchange.com/a/145633

Using similar methodology to #202, this appears to make StyleSheet.create
~15% faster (~220ms to ~185ms).

cc @ljharb

lencioni · 2017-03-04T18:35:23Z

Also, I didn't look very hard for a faster hashing algorithm, so if you know of one that's even faster, let's try that!

ljharb · 2017-03-04T22:01:52Z

-    }
-    /* eslint-enable no-fallthrough */
+function djb2Hash(str) {
+  let hash = 5381;


what's this magic number?

It was apparently chosen through manual testing as the number that produced fewer collisions and better avalanching.

http://stackoverflow.com/a/10697529/18986
https://groups.google.com/forum/#!msg/comp.lang.c/lSKWXiuNOAk/zstZ3SRhCjgJ
http://stackoverflow.com/a/4825477/18986
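
For readers unfamiliar with djb2, here is a minimal sketch showing where 5381 enters as the initial value. The name `djb2Sketch` is illustrative, and this is the additive form (`hash * 33 + c`); the code in string-hash may differ in details such as iteration order or combining with XOR instead of addition.

```javascript
// Illustrative djb2-style hash (additive variant); not the exact code from this PR.
function djb2Sketch(str) {
  let hash = 5381; // the initial value ("magic number") discussed above
  for (let i = 0; i < str.length; i++) {
    // hash * 33 + charCode; `| 0` keeps the running value a 32-bit integer
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return hash >>> 0; // coerce to an unsigned 32-bit result
}

console.log(djb2Sketch("")); // an empty string leaves the initial value untouched
```

The multiply-by-33 step is written as a shift plus an add, which is the traditional formulation of djb2.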

lencioni · 2017-03-05T15:34:44Z

JSPerf: https://jsperf.com/murmurhash2-vs-djb2

This implementation of djb2 seems to be about 2x faster than the implementation of MurmurHash2 in Chrome. In Firefox, they seem to be pretty equivalent.

csilvers · 2017-03-06T06:44:15Z

I should comment on this I guess, since I'm the one who pushed for murmurhash. It's a very fast (and good) hash when implemented in C, but maybe not in JavaScript. I don't have a problem moving to djb2, though it's not a great hash function. (The claim about "same number of collisions" depends on the dataset. I doubt it's true in general, though it may be true for the types of strings we're hashing in aphrodite. That could be tested experimentally, though it may not be worth the trouble. After all, if a hash is twice as fast, that can pay for a lot of collision-handling code!)

The best thing to do would be to ask the js engine to use their internal hash function, something like hash() in Python. (It looks like Chrome uses jenkinshash -- which is a predecessor to murmurhash and is pretty good -- http://stackoverflow.com/questions/30695564/what-kind-of-hashing-function-algorithm-does-javascript-associative-array-use.) I don't know if JavaScript exposes that, though.
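
The experimental collision test mentioned above could be sketched roughly as follows. All names here (`djb2Sketch`, `countCollisions`, `samples`) are hypothetical, and the sample data is made up; a real test would use strings representative of what aphrodite actually hashes.

```javascript
// Illustrative djb2-style hash; substitute whichever hash function is under test.
function djb2Sketch(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return hash >>> 0;
}

// Count how many distinct inputs hash to a value that an earlier,
// different input already produced.
function countCollisions(hashFn, inputs) {
  const seen = new Set(); // hash values produced so far
  let collisions = 0;
  for (const input of new Set(inputs)) { // de-duplicate so repeated inputs don't count
    const h = hashFn(input);
    if (seen.has(h)) {
      collisions++;
    } else {
      seen.add(h);
    }
  }
  return collisions;
}

// Hypothetical sample data: JSON-serialized style objects.
const samples = [];
for (let i = 0; i < 1000; i++) {
  samples.push(JSON.stringify({ color: "red", marginTop: i }));
}
console.log(countCollisions(djb2Sketch, samples));
```

Running the same inputs through two hash functions and comparing the counts would give a dataset-specific answer rather than a general one.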

lencioni · 2017-03-06T06:48:43Z

The best thing to do would be to ask the js engine to use their internal hash function

This would be great, but I don't think it is an option. And, if we could do this, it would have to be the same hash function across all browsers to ensure that the hash is the same from server -> client.

ljharb · 2017-03-06T06:56:36Z

JS definitely doesn't expose anything like that.

xymostech

One small change, and this LGTM! Echoing what Craig said, I'm a little sad that we're switching to a worse hash, but for now, since we're also prepending the name that you provide, the chance of collisions is very, very low, so I'm happy with this.

In the future, we were thinking about a "prod" mode that would exclude the names and would also hash all of the concatenated class names together into one hash (e.g. "1a81283-o_O-19a238" -> "282ac923"). For that, we might want to do better testing to see which hash we want.
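
As a rough illustration of the "prod" mode idea only: the sketch below joins per-style hashes with the "-o_O-" separator from the example above and hashes the result once more. The helper names are hypothetical, nothing like this exists in this PR, and the output will not match the "282ac923" placeholder in the comment.

```javascript
// Illustrative djb2-style hash; any string hash could be swapped in.
function djb2Sketch(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return hash >>> 0;
}

// Hypothetical helper: collapse several per-style hashes into one short class name.
function combineClassHashesSketch(hashes) {
  const concatenated = hashes.join("-o_O-"); // e.g. "1a81283-o_O-19a238"
  return djb2Sketch(concatenated).toString(36);
}

console.log(combineClassHashesSketch(["1a81283", "19a238"]));
```

Without the prepended names, the combined hash alone has to keep class names distinct, which is why the choice of hash function would matter more in this mode.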

xymostech · 2017-03-08T18:46:20Z

 // ordering of objects. Ben Alpert says that Facebook depends on this, so we
 // can probably depend on this too.
-export const hashObject = (object /* : ObjectMap */) /* : string */ => murmurhash2_32_gc(JSON.stringify(object));
+export const hashObject = (object /* : ObjectMap */) /* : number */ => stringHash(JSON.stringify(object));


Could you stringify the output here in the same way that we did in murmurhash? I think a .toString(36) at the end here is all that's needed.

Done! I also profiled this, and performance seems to be equivalent to not converting to a string.
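
The requested change amounts to something like the sketch below. `djb2Sketch` stands in for the imported `stringHash`, and `hashObjectSketch` is an illustrative name; the actual code in the merged commit may differ.

```javascript
// Stand-in for the string-hash dependency (illustrative djb2-style hash).
function djb2Sketch(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return hash >>> 0;
}

// Serialize the object, hash the string, then render the number in base 36
// so the result is a short string of digits and lowercase letters.
const hashObjectSketch = (object) =>
  djb2Sketch(JSON.stringify(object)).toString(36);

console.log(hashObjectSketch({ color: "red", marginTop: 10 }));
```

As the code comment in the diff notes, this relies on JSON.stringify serializing keys in a consistent order, so that equal objects produce the same hash.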


lencioni force-pushed the hash branch 2 times, most recently from 64df84d to d37a81a Compare March 4, 2017 18:45

ljharb reviewed Mar 4, 2017


lencioni force-pushed the hash branch from d37a81a to 8cbfc3a Compare March 4, 2017 22:53

lencioni force-pushed the hash branch from 8cbfc3a to 4866e61 Compare March 7, 2017 17:34

xymostech approved these changes Mar 8, 2017


lencioni and others added 2 commits March 8, 2017 12:39

Depend on string-hash for djb2 hashing algorithm

c3e9fb2

This is where I copied the code for this algorithm from, seems like we might as well just bring in the dependency for it.

lencioni force-pushed the hash branch from 4866e61 to c3e9fb2 Compare March 8, 2017 20:39

xymostech merged commit 21ef03b into Khan:master Mar 8, 2017

lencioni deleted the hash branch July 20, 2017 22:25

lencioni added a commit that referenced this pull request Sep 12, 2017

Remove note about murmurhash license from readme

86135d6

This was replaced with a different hashing algorithm by #203, so I don't think we really need this note anymore.
