ethereum/solidity

Library methods called only during contract construction are included in the runtime bytecode

sifislag opened this issue · 6 comments

Description

I'm a developer of the gigahorse binary lifter and I was looking at recently deployed contracts for which our tool reports a high number of unreachable blocks. I came across contracts that had lots of dead code in their runtime bytecode.

Mapping these blocks back to their source locations I found that the dead code originated from library methods only called during the contract's construction. A past issue suggests that this is not the intended behavior. In addition private/internal methods declared on the contract or one of its super-contracts and called only during construction are not part of the runtime bytecode.

Environment

  • Compiler version: 0.8.17
  • Target EVM version (as per compiler settings): london
  • Operating system: ubuntu 20.04

Steps to Reproduce

Minmal example:

// Taken from OZ's Address
library MinimalAddress {

    function isContract(address account) internal view returns (bool) {
        uint256 size;
        // solhint-disable-next-line no-inline-assembly
        assembly { size := extcodesize(account) }
        return size > 0;
    }
}

contract HasDeadCode {
    address public impl;

    constructor (address _implementation) {
        impl = _implementation;
        require(MinimalAddress.isContract(_implementation));
    }
}

contract NoDeadCode {
    address public impl;

    constructor (address _implementation) {
        impl = _implementation;
        require(isContract(_implementation));
    }

    function isContract(address account) internal view returns (bool) {
        uint256 size;
        // solhint-disable-next-line no-inline-assembly
        assembly { size := extcodesize(account) }
        return size > 0;
    }
}

The runtime bytecodes of the above without optimizations will be the following (issue persists with --optimize and different numbers of rounds):

HasDeadCode:

6080604052348015600f57600080fd5b506004361060285760003560e01c80638abf607714602d575b600080fd5b60336047565b604051603e919060bb565b60405180910390f35b60008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1681565b600080823b905060008111915050919050565b600073ffffffffffffffffffffffffffffffffffffffff82169050919050565b600060a782607e565b9050919050565b60b581609e565b82525050565b600060208201905060ce600083018460ae565b9291505056fea264697066735822122065d7633b66b5947e7be84f394abbe77d90e56c76bf88b5a9c328bed67df0a0a864736f6c63430008110033

NoDeadCode:

6080604052348015600f57600080fd5b506004361060285760003560e01c80638abf607714602d575b600080fd5b60336047565b604051603e919060a8565b60405180910390f35b60008054906101000a900473ffffffffffffffffffffffffffffffffffffffff1681565b600073ffffffffffffffffffffffffffffffffffffffff82169050919050565b6000609482606b565b9050919050565b60a281608b565b82525050565b600060208201905060bb6000830184609b565b9291505056fea26469706673582212205c8d9794ca9e0e86b2325b1de6b253f76811c7c0367b5286960400eee8116eca64736f6c63430008110033

For contract HasDeadCode block 0x6b implements the isContract library method and is dead code in the bytecode.

Partial dissasembly output:

   0x6b: JUMPDEST  
   0x6c: PUSH1     0x0
   0x6e: DUP1      
   0x6f: DUP3      
   0x70: EXTCODESIZE
   0x71: SWAP1     
   0x72: POP       
   0x73: PUSH1     0x0
   0x75: DUP2      
   0x76: GT        
   0x77: SWAP2     
   0x78: POP       
   0x79: POP       
   0x7a: SWAP2     
   0x7b: SWAP1     
   0x7c: POP       
   0x7d: JUMP      

Same issue is not present in contract NoDeadCode.

Just finally getting around to look at reports that have fallen through the grid so far: I can confirm this and this indeed looks strange.
The runtime assembly of HasDeadCode with --optimize --asm indeed contains a tag for the implementation of isContract, which is not referenced - so it should be eliminated by the libevmasm optimizer, but indeed isn't.

via-IR is not affected, but this is still something that'd be worthwhile to fix on the libevmasm level, depending on what causes it there.

Here's a short reproduction:

library L {
    function destruct() internal {
        selfdestruct(payable(msg.sender));
    }
}
contract C {
    constructor() {
        if (msg.sender == address(0xdead)) {
            L.destruct();
        }
    }
}

solc --bin-runtime --no-cbor-metadata --optimize produces

Binary of the runtime part:
6080604052600080fd5b33ff

$ echo "6080604052600080fd5b33ff" > /tmp/out.bin && evm disasm /tmp/out.bin
6080604052600080fd5b33ff
00000: PUSH1 0x80
00002: PUSH1 0x40
00004: MSTORE
00005: PUSH1 0x00
00007: DUP1
00008: REVERT
00009: JUMPDEST
0000a: CALLER
0000b: SELFDESTRUCT

Here's a brief description on why this is happening:

bool JumpdestRemover::optimise(set<size_t> const& _tagsReferencedFromOutside)

The _tagsReferencedFromOutside will contain a reference to this tag (tag_2) in the following example:

======= /tmp/first.sol:C =======
EVM assembly:
  mstore(0x40, 0x80)
  callvalue
  dup1
  iszero
  tag_1
  jumpi
  0x00
  dup1
  revert
tag_1:
  pop
  not(0xdeac)
  caller
  add
  tag_5
  jumpi
  tag_5
  or(tag_0_2, shl(0x20, tag_6))
  0x20
  shr
  jump	// in
tag_5:
  jump(tag_7)
tag_6:
  caller
  selfdestruct
tag_7:
  dataSize(sub_0)
  dup1
  dataOffset(sub_0)
  0x00
  codecopy
  0x00
  return
stop

sub_0: assembly {
      mstore(0x40, 0x80)
      0x00
      dup1
      revert
    tag_2:
      caller
      selfdestruct
}

There are a couple of ways to deal with this.

  • An elegant fix would be if we can optimize the sequence or(tag_0_2, shl(0x20, tag_6)); 0x20; shr into tag_6. For this, we need to encode that tags have at most 2 / 3 bytes.
  • Fix the codegen so that this or(...) is not generated in this case.

Maybe adding the rule SHL(N, OR(A, B)) => OR(SHL(N, A), SHL(N, B)) might do the trick.

Is this going to be fixed in the next release of Solidity?

Yes, very likely.

Yes, very likely.

Great, thanks.