transmissions11/solmate

Optimize merkle proof by having the arguments in calldata

hrkrshnn opened this issue · 6 comments

https://github.com/Rari-Capital/solmate/blob/f878f20ecd6a23dd4105ee3eed7291b6311ce1fb/src/utils/MerkleProof.sol#L9

Using calldata is ideal here and would save a decent chunk of gas.

https://github.com/Rari-Capital/solmate/blob/f878f20ecd6a23dd4105ee3eed7291b6311ce1fb/src/utils/MerkleProof.sol#L20-L36

But this means that the assembly routine would be more complex. Consider the following instead:

  1. Use a regular for loop to load the value from calldata.
  2. Have a subroutine called cheapKeccak, which uses scratch space for keccak that would do the same computation.

Saves around 200 gas if there are 5 elements in proof.

We can make a copy of the function just for calldata. Should we?
If @transmissions11 is ok with some code duplication, then let's go.

function merkleVerifyAssemblyCalldata(
    bytes32[] calldata proof,
    bytes32 root,
    bytes32 leaf
) internal pure returns (bool isValid) {
    assembly {
        let computedHash := leaf

        // Get the memory start location of the first element in the proof array.
        let data := proof.offset

        // Iterate over proof elements to compute root hash.
        for {
            // Left shift by 5 is equivalent to multiplying by 0x20.
            let end := add(data, shl(5, proof.length))
        } lt(data, end) {
            data := add(data, 0x20)
        } {
            let loadedData := calldataload(data)
            // Slot of `computedHash` in scratch space.
            // If the condition is true: 0x20, otherwise: 0x00.
            let scratch := shl(5, gt(computedHash, loadedData))
            
            // Store elements to hash contiguously in scratch space.
            // Scratch space is 64 bytes (0x00 - 0x3f) and both elements are 32 bytes.
            mstore(scratch, computedHash)
            mstore(xor(scratch, 0x20), loadedData)
            computedHash := keccak256(0x00, 0x40)
        }
        isValid := eq(computedHash, root)
    }
}

tryna think of a case where u need the proof to be memory? im totally cool with just supporting calldata

but this seems like a really good change, 200 gas is sick

@Vectorized Do you really have to use calldataload(..) in assembly? Do you have a benchmark for just proof[idx]?

@hrkrshnn I tried proof[idx], but the assembly cannot compile.

Here are some numbers:

[PASS] testValidProofSupplied() (gas: 2464)
[PASS] testValidProofSuppliedCalldata() (gas: 2262)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootDifferent() (gas: 1733)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootDifferentCalldata() (gas: 1565)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootSame() (gas: 1727)
[PASS] testVerifyEmptyMerkleProofSuppliedLeafAndRootSameCalldata() (gas: 1537)
[PASS] testVerifyInvalidProofSupplied() (gas: 2460)
[PASS] testVerifyInvalidProofSuppliedCalldata() (gas: 2304)

For a fair comparison between the two functions, all tests use calls,
which explains why the numbers for the old tests have increased.

Looks like there may be a very niche use case for memory https://github.com/fx-portal/contracts/blob/baed24d22178201bca33140c303e0925661ec0ac/contracts/tunnel/FxBaseRootTunnel.sol

But for their case, I think it's best if they have a custom implementation, which they did.