C++: Use little-endian load for std::hash #561

Merged

chfast merged 2 commits into master from optimize_cpp_hash on Feb 16, 2021
Conversation

@chfast (Member) commented Nov 2, 2020

This replaces the big-endian loads with little-endian loads in hash functions for evmc::address and evmc::bytes32.
Performance improvements are significant.

(columns: relative time change, relative CPU change, old time, new time, old CPU, new CPU)

hash_<evmc::bytes32, hash<evmc::bytes32>>_mean                           -0.2973         -0.2973          2335          1641          2335          1641
hash_<evmc::bytes32, noinline_hash<evmc::bytes32>>_mean                  -0.1559         -0.1559          3045          2571          3045          2571
hash_<evmc::address, hash<evmc::address>>_mean                           -0.4009         -0.4009          1323           793          1323           793
hash_<evmc::address, noinline_hash<evmc::address>>_mean                  -0.2762         -0.2762          1955          1415          1955          1415
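The approach can be sketched as follows. This is a minimal, illustrative reconstruction, not the actual evmc code: `load_le`, `fnv1a_fold`, and `hash32` are assumed names, and a 64-bit `size_t` is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy-based loads compile to plain unaligned loads; on a little-endian
// host no byte swap is needed, unlike a big-endian load (which costs a bswap).
inline std::uint64_t load_le(const unsigned char* bytes) noexcept
{
    std::uint64_t word;
    std::memcpy(&word, bytes, sizeof(word));  // host byte order
    return word;
}

// One FNV1a step folding a 64-bit word into the accumulator.
inline std::uint64_t fnv1a_fold(std::uint64_t acc, std::uint64_t word) noexcept
{
    return (acc ^ word) * 0x100000001b3u;  // 64-bit FNV prime
}

// Hash the 32 bytes of an evmc::bytes32-like value, word by word.
inline std::size_t hash32(const unsigned char bytes[32]) noexcept
{
    std::uint64_t h = 0xcbf29ce484222325u;  // FNV offset basis
    for (std::size_t i = 0; i < 32; i += 8)
        h = fnv1a_fold(h, load_le(bytes + i));
    return static_cast<std::size_t>(h);
}
```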

Originally, I also tried a much simpler word folding: fold(a, b) = 3*a + b. The resulting hashes do not look very random any more, and the hash of zero is zero. Furthermore, it only improves performance (over the little-endian version) for hash functions inlined in a loop, which is probably not the case for hash maps.

(columns: relative time change, relative CPU change, old time, new time, old CPU, new CPU)

hash_<evmc::bytes32, hash<evmc::bytes32>>_mean                           -0.1432         -0.1432          1641          1406          1641          1406
hash_<evmc::bytes32, noinline_hash<evmc::bytes32>>_mean                  -0.0006         -0.0006          2571          2569          2571          2569
hash_<evmc::address, hash<evmc::address>>_mean                           -0.1896         -0.1896           793           643           793           643
hash_<evmc::address, noinline_hash<evmc::address>>_mean                  -0.0087         -0.0087          1415          1403          1415          1403
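The discarded fold(a, b) = 3*a + b variant could look like this (an illustrative sketch with assumed names, not the merged code). Note the weakness mentioned above: the hash of an all-zero value is zero.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative sketch of the discarded fold(a, b) = 3*a + b scheme.
inline std::uint64_t load_word(const unsigned char* bytes) noexcept
{
    std::uint64_t w;
    std::memcpy(&w, bytes, sizeof(w));
    return w;
}

inline std::uint64_t fold(std::uint64_t a, std::uint64_t b) noexcept
{
    return 3 * a + b;  // multiply by a small odd constant, then add
}

inline std::size_t hash32_fold(const unsigned char bytes[32]) noexcept
{
    std::uint64_t h = load_word(bytes);
    for (std::size_t i = 8; i < 32; i += 8)
        h = fold(h, load_word(bytes + i));
    return static_cast<std::size_t>(h);
}
```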

We can revisit further optimizations here, but we should first build some hashmap performance tests (e.g. see https://stackoverflow.com/a/62345875/725174).

@chfast chfast force-pushed the optimize_cpp_hash branch from 6bad478 to 8020d35 on November 2, 2020, 12:57
@chfast (Member, Author) commented Nov 2, 2020

TODO: the std::hash unit tests are pretty weak: changing the BE load to LE produces the same value for the given test cases.
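A test vector whose bytes are asymmetric within each 8-byte word would catch such a regression, since big- and little-endian loads agree on palindromic words. The reference loads below are illustrative helpers, not the evmc test code.

```cpp
#include <cstdint>

// Illustrative reference loads. A test vector whose words read the same
// forwards and backwards cannot distinguish these two functions.
inline std::uint64_t load_le(const unsigned char* p) noexcept
{
    std::uint64_t w = 0;
    for (int i = 7; i >= 0; --i)
        w = (w << 8) | p[i];  // p[0] ends up least significant
    return w;
}

inline std::uint64_t load_be(const unsigned char* p) noexcept
{
    std::uint64_t w = 0;
    for (int i = 0; i < 8; ++i)
        w = (w << 8) | p[i];  // p[0] ends up most significant
    return w;
}
```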

@chfast chfast requested review from axic, gumb0 and yperbasis November 2, 2020 14:00
@yperbasis (Contributor) left a comment:

FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.

struct hash<evmc::address>
{
-    /// Hash operator using FNV1a-based folding.
+    /// Hash operator using (3a + b) folding of the address "words".
Member:

What is 3a + b? Some homebrew hashing?

Member (Author):

Kind of. We have this progression of options:

  1. Fold all words with XOR.
  2. Fold all words with ADD. A bit better than XOR because it discards less information, but it is also symmetric.
  3. "Classic" multiply by a prime/odd number and add: fold(a, b) { return 3*a + b; }.

The 3 is used because it has the same performance as 1 and 2. The multiply is done by a lea instruction and the throughput is the same because multiple instructions execute at the same time, i.e. the latency of the "multiply" is hidden.
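The three options above as one-liners (illustrative names): the first two are symmetric in their arguments, the third is not.

```cpp
#include <cstdint>

// Illustrative one-liners for the three folding options listed above.
inline std::uint64_t fold_xor(std::uint64_t a, std::uint64_t b) { return a ^ b; }      // symmetric
inline std::uint64_t fold_add(std::uint64_t a, std::uint64_t b) { return a + b; }      // symmetric
inline std::uint64_t fold_mul(std::uint64_t a, std::uint64_t b) { return 3 * a + b; }  // order-sensitive
```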

Member:

How would 1) and 2) work? The same bytes in a different order would result in the same hash. Or do you mean not only xor/add, but also some shifting, etc.?

Member (Author):

Just word0 ^ word1 ^ word2 ^ word3. Similarly add.

Member:

If I remember correctly, we discussed that this is only used as a quick lookup; the actual data is then compared on a match, so collisions do not matter.
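That behavior can be shown directly: std::unordered_map stays correct even with a hash that always collides, because keys are compared with operator== on a bucket match. The names below are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Illustrative: a hash that collides for every key. Lookups degrade to
// linear scans within one bucket, but results remain correct because the
// map falls back to key equality.
struct AlwaysCollide
{
    std::size_t operator()(const std::string&) const noexcept { return 0; }
};

using BadMap = std::unordered_map<std::string, int, AlwaysCollide>;
```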

Base automatically changed from optimize_cpp_compare to master November 2, 2020 19:45
@chfast chfast force-pushed the optimize_cpp_hash branch from 8020d35 to d0a8f5b on November 2, 2020, 20:42
@codecov-io commented:

Codecov Report

Merging #561 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
- Coverage   91.31%   91.30%   -0.01%     
==========================================
  Files          22       22              
  Lines        3119     3118       -1     
==========================================
- Hits         2848     2847       -1     
  Misses        271      271              

@chfast (Member, Author) commented Nov 2, 2020

> FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.

Using FNV is pretty solid. It would be nice to confirm that your hashmap uses std::hash and to benchmark this change with Silkworm.

@yperbasis (Contributor) commented:

> FNV calls take a negligible fraction of Silkworm execution, so this change probably won't make a difference to the total block execution time.

> Using FNV is pretty solid. Would be nice to confirm if your hashmap is using std::hash and benchmark this change with silkworm.

I've checked and the hashmap does use std::hash. There's a tiny performance gain: 0.1 h win out of 16.5 h of executing the first 11M blocks.

@axic (Member) left a comment:

I'm indifferent on this. At least the first commit, which adds more tests, should be merged.

@chfast chfast force-pushed the optimize_cpp_hash branch 2 times, most recently from a0fdd25 to a138c51, on February 16, 2021, 10:10
@chfast (Member, Author) commented Feb 16, 2021

In the final version there is only the switch to little-endian loading. See the updated description.

@chfast chfast merged commit b606331 into master Feb 16, 2021
@chfast chfast deleted the optimize_cpp_hash branch February 16, 2021 10:35
@chfast chfast changed the title C++: Use simpler 3a + b folding in std::hash C++: Use little-endian load for std::hash Feb 16, 2021

4 participants