MerkleProof: Intermediate nodes can be reinterpreted as leaves

It is possible to prove the presence of certain values that aren't actually leaves in the tree.

This is a very slight variant of a well-known problem sometimes called a "second preimage attack" on merkle trees, although in my opinion that is a misleading name. The issue is that you can concatenate the hashes of two internal nodes (resulting in 64 bytes) and provide this as a value to be proved. The smart contract will hash this and attempt to verify it as though it was a normal leaf. If the proof has been shortened appropriately, it will incorrectly be validated.

The very slight variation from the typical description is that in the OZ implementation internal nodes are sorted, although this doesn't change the impact of this issue (which is low in most cases).

I included a test case in https://github.com/OpenZeppelin/openzeppelin-contracts/pull/3090 that demonstrates this issue and adds some documentation warning developers about how to avoid it.

## Impact

The real world impact of this bug is likely small. The first condition for a successful attack is that a contract accepts legitimate leaf values with a length of exactly 64 bytes. For instance, Uniswap's merkle distributor uses the following to hash a leaf:

      bytes32 node = keccak256(abi.encodePacked(index, account, amount));

This is an example of code that is *not* vulnerable. `index` and `amount` are `uint256`s and `account` is an address so the value passed to `keccak256` is `32 + 20 + 32 = 84` bytes long.

On the other hand, suppose the code was written like so:

      bytes32 node = keccak256(abi.encodePacked(index, account, uint96(amount)));

In this case, the leaf value is `32 + 20 + 12 = 64` bytes long so it would potentially be vulnerable to a griefing/spamming attack where a large amount of tokens are sent to some random address, effectively burning them. Alternatively, if `index` was packed into some small type (maybe `uint32`) then it might be possible to burn innocent users' tokens by sending them to random addresses. The length condition could also be satisfied if variable-length values are used, for example: `abi.encodePacked(address, bytes(...))`.

The second condition of a successful attack is to overcome the fact that the attacker has very little control over the invalid values. This is because they are the output of hash functions over data that presumably the attacker cannot influence. If attackers *do* have some influence over values in the tree (maybe they can sign up for an airdrop using arbitrary addresses) then they may be able to "grind" advantageous values to some degree.

However, in the typical case an attacker has to make do with a set of effectively random values. The number of usable inputs scales approximately with the number of leaf values in the tree. An attacker can loop over all possibilities and pick the best one, and there will be more chances of a "good" pair of hashes if the tree is larger. However, this is obviously dominated by the large output range of the 256-bit hash function so is not likely a problem in practice.

## Recommendation

I originally submitted this to Immunefi but they weren't interested so I'm just posting here. It may be worthwhile determining if any contracts have been deployed that could be impacted by this (probably few, if any).

At the very least, users should be warned in the documentation about this issue. Contracts should ensure that there is a "domain separation" between hashing leaf values and intermediate nodes. This can be done either by making sure leaf values are never 64 bytes in length, or by using a hash function other than keccak256 when hashing leaves. Also, `abi.encode()` should be encouraged over `abi.encodePacked()` for several reasons.

My [pull request](https://github.com/OpenZeppelin/openzeppelin-contracts/pull/3090) contains some suggested wording for this. Also feel free to include any part of this description in your documentation, if it helps.

There are some API-level changes that could be made to definitively prevent this. For instance, `verify()` could accept a user-defined value type (https://blog.soliditylang.org/2021/09/27/user-defined-value-types/) that wraps `bytes32` and can only be created by a keccak256-wrapper that also checks the input length is not 64-bytes. That said, I think leaving the responsibility of preventing this to the developer is fine, as long as there is appropriate documentation describing how to do so.

## References

* https://www.rfc-editor.org/rfc/rfc6962.html#section-2.1
* https://hoytech.github.io/blockchain-storage/lesson2/index.html#/40


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MerkleProof: Intermediate nodes can be reinterpreted as leaves #3091

Impact

Recommendation

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

MerkleProof: Intermediate nodes can be reinterpreted as leaves #3091

Description

Impact

Recommendation

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions