ethereum/solidity

Inconsistent bytecode created for identical contract input using standard-json compiler input

epheph opened this issue · 26 comments

Description

During deployment of Augur, one of the contracts would not verify on Etherscan properly: the deployment process produced one bytecode for "ZeroXTrade", while giving the same Standard JSON to Etherscan and having them compile it produced very slightly different bytecode (of a different length!).

Due to past issues with contract verification, Augur flattens all contracts prior to deployment. The Standard Json that is provided to solidity lists many contracts to be compiled, but all of them are completely flattened and self-contained (with significant code-duplication between them).

The issue is that the bytecode created for a self-contained contract changes depending on what OTHER self-contained contracts are compiled at the same time. In the example below, I have linked two inputs that are identical except for one additional self-contained contract being compiled at the same time, which demonstrates this issue.

Environment

  • Compiler version: 0.5.17
  • Target EVM version (as per compiler settings): 0.5.17
  • Framework/IDE (e.g. Truffle or Remix): n/a
  • EVM execution environment / backend / blockchain client:
  • Operating system: Linux

Steps to Reproduce

standard-json examples that show the issue:

https://gist.github.com/epheph/20428a1ff7163bce1d47be7d7b623344

The only difference between these two files is the "Augur.sol" section (which jq is discarding the output of).

If you run each of these through solc 0.5.17:

$ cat 1-contract.json  | solc-0.5.17 --standard-json \
      | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | egrep -o '^.{64}'
60806040526000805461ffff1916905534801561001b57600080fd5b506147c2

$ cat 2-contracts.json  | solc-0.5.17 --standard-json \
     | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | egrep -o '^.{64}'
60806040526000805461ffff1916905534801561001b57600080fd5b506147d1

Look at the last characters, c2 vs d1. There are many other slight differences, but this is the first.
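
To see all of the slight differences rather than just the first 64 characters, the two outputs can also be diffed directly. This is a sketch assuming the same 1-contract.json / 2-contracts.json inputs and solc-0.5.17 binary as above, run in bash:

$ diff \
    <(cat 1-contract.json  | solc-0.5.17 --standard-json \
        | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | fold -w 64) \
    <(cat 2-contracts.json | solc-0.5.17 --standard-json \
        | jq -r '.contracts["trading/ZeroXTrade.sol"].ZeroXTrade.evm.bytecode.object' | fold -w 64)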

ZeroXTrade.sol makes NO reference to Augur.sol, so the fact that Augur.sol is compiled in the same solidity run should not change ZeroXTrade's bytecode.

/cc @nuevoalex

Thanks for reporting this!

Looking at the assembly output, two things are interesting:

  • the source references to internal routines are different, which means that the internal routines themselves are probably different
  • the differences in generated code look like different decisions being made in the optimizer

To fix this properly, we have to reset the Yul string repository between contracts, which would invalidate the inline assembly part of the AST, since it is stored in parsed form and thus uses Yul strings.

Actually the difference is only in the internal routines, so resetting the yul string repository might be all that is needed.

It turns out that the problem is that AST IDs are encoded into function names, which leads to the optimizer handling them in a different order, resulting in functions being inlined in a different way.

Current plan of attack:

  • rename all identifiers to look nicer (we wanted to do that anyway) and remove AST IDs in the process
  • modify the function inliner such that it inlines functions in an order that does not depend (very much) on the names of the functions: start with the leaves of the call graph (#9314) and use the block hasher to make it deterministic

This bug has really big implications for Buidler's new compilation pipeline. We are now forced to disable some optimizations we've been working on.

Can someone point to the first version of solc with this problem? That would be incredibly valuable.

The problem should be present whenever you have some Yul code and the Yul optimizer is activated. It got activated together with the regular optimizer starting from 0.6.0. You have Yul code whenever you use ABIEncoderV2. I think there are also some other situations where Yul code is generated, but those might not have this problem.

Actually, I think this only happens if you have multiple files in one compilation unit that contain structs or contracts with the same name. So if you don't just flatten everything ( ;) ), you should be rather safe. The reason is that the names of the Yul functions always include the data type name and its AST ID. If the data type name is already unique, a change in the AST ID should not matter much.
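
As a minimal illustration of that condition (the file, contract and struct names below are invented for the example, not taken from the Augur sources), two files that each declare a struct with the same name and pass it through ABIEncoderV2 can be compiled once separately and once together. With the optimizer enabled on an affected compiler version, the bytecode of the second file may differ between the two runs:

$ cat > A.sol <<'EOF'
pragma solidity 0.6.10;
pragma experimental ABIEncoderV2;

contract Token {
    // Same struct name as in B.sol, so the generated Yul helper names
    // for the two types differ only by AST ID.
    struct Order { address maker; uint256 amount; }
    function check(Order memory o) public pure returns (uint256) { return o.amount; }
}
EOF
$ cat > B.sol <<'EOF'
pragma solidity 0.6.10;
pragma experimental ABIEncoderV2;

contract Market {
    struct Order { address maker; uint256 amount; }
    function fill(Order memory o) public pure returns (address) { return o.maker; }
}
EOF
$ solc --optimize --bin B.sol > alone.txt            # B.sol on its own
$ solc --optimize --bin A.sol B.sol > together.txt   # B.sol together with A.sol
$ diff <(grep -A2 ':Market' alone.txt) <(grep -A2 ':Market' together.txt)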

So if you don't just flatten everything ( ;) ), you should be rather safe.

I think that's not always true. I believe our new compilation pipeline can easily trigger this bug.

Here's an example of what we were planning to do, and why it would fail:

  • Suppose you have files A.sol, B.sol and C.sol. B.sol imports A.sol.
  • Starting with a clean project, you'd compile A.sol, B.sol and C.sol together.
  • Then, you modify B.sol and it gets compiled alongside A.sol.
  • The user then deploys a contract from B.sol, forgets to verify it, and is not willing to commit the compilation output to git. Note that this is extremely common. I can't stress enough how common this is.
  • For some reason, like migrating to another pc, the project's cache/artifacts get cleared.
  • The user now wants to verify the contract from B.sol, so they compile the project, i.e. A.sol, B.sol and C.sol together.
  • B.sol's output may be different, and if so, the verification would fail.

@alcuadrado maybe I didn't understand, but are you saying that a user deploys a contract from a source that is not stored permanently and then cannot verify the contract? If this is so extremely common, then deployment tools should make sure they only work from a clean git repo that is synced to origin or even better, store the sources and the metadata somewhere.

Or did I misunderstand you - I don't see how this is related to importing or flattening, for example.

@chriseth the sources (meaning, the Solidity files) are stored permanently. What is not stored permanently is the JSON input used by solc. In the first scenario, this JSON had the contracts [A, B, C], but in the second scenario it only had [B, C].

The point is that, because of this bug, this optimization (ignoring A in a re-compilation because it doesn't affect B and C) cannot be done. Does that make sense?

@fvictorio as I said, my assumption is that this bug does not manifest itself as long as the files do not contain structs or contracts of the same name, but I will try to verify this assumption.

In general, the optimizer settings, file names, remappings and so on are crucial for verification and thus have to be stored. It would be great if deployment tools could help store the metadata file. If the metadata file is not requested as one of the output artefacts, then it is much more difficult to source-verify a contract.
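
For reference, a minimal sketch of a standard-JSON input that requests the metadata (alongside the bytecode) as an output artefact, so that tooling can store it for later verification. The file name, contract name and optimizer settings here are just placeholders:

$ cat > input.json <<'EOF'
{
  "language": "Solidity",
  "sources": {
    "MyContract.sol": { "content": "pragma solidity ^0.6.0; contract MyContract {}" }
  },
  "settings": {
    "optimizer": { "enabled": true, "runs": 200 },
    "outputSelection": {
      "*": { "*": ["evm.bytecode", "metadata"] }
    }
  }
}
EOF
$ solc --standard-json < input.json \
    | jq -r '.contracts["MyContract.sol"].MyContract.metadata' > MyContract.metadata.json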

Oh, yes, I forgot to add that detail: in that example, files A and B have contracts with the same name (and optimization is enabled, etc.). I was able to reproduce this bug that way.

So the optimization we want to make is to compile only B and C if they were modified but A wasn't. With this bug, we won't be able to do that: we'll always need to include A even if it's not necessary. I think the point @alcuadrado was making is that this is an example of the bug manifesting itself with no flattening involved, and of it limiting a potentially useful feature.

I agree that, in theory, this bug could be worked around by the deployment tool, but that doesn't change our situation with respect to the compilation pipeline.

@fvictorio ok, you are right, I confused the flattening in the initial example with the main problem of two files sharing identifiers of the same name. I assume that if you don't flatten then this will be a very rare case, but I guess we'll just fix this issue and then we are all happy - as long as everyone uses the latest version of Solidity :)

Disabling the FullInliner results in the same bytecode, so it might be that we only have to fix that component.

@ekpyron I think using the blockHasher does not really help here because it will consider two functions to be different if they call two different but equivalent functions.

It turns out the actual fix is much easier: The function inliner processes functions according to their YulString sorting order. If they are processed based on their name (or position in the source, which is equivalent) instead, the bug is gone, and even the metadata is fully equal.

@chriseth can we assume that this bug will never happen if the optimizer is disabled?

@fvictorio this specific bug needs the yul optimizer to be enabled.

It turns out that if you update the example code to Solidity 0.7.0, the bug is not present. I compared the generated Yul code and it seems that there are some functions that are duplicated in 0.5.17: one for "memory" and one for "memory pointer", while the distinction between "pointer" and "not pointer" is only relevant for storage, and thus it was removed for memory at some point.

I implemented and verified a fix, but we cannot test it due to lack of example input.

If someone encounters an example that fails for 0.7.0, please report it!

Hi. This bug caused me a lot of trouble recently when I tried to verify some contracts that a colleague had compiled using solc-js 0.6.10 and deployed. I understand that this problem should be fixed with Solidity 0.7.x. @chriseth - concerning my previously deployed contracts, should I have any other (security?) concerns with regards to this bug, other than that verification might be difficult (if we haven't stored the metadata)?

The cause of the bug was just that optimizer steps were applied in a different order to the individual functions, so no reason for concern.

Leaving a note here that this affects the Argent wallet contracts, which are on solc 0.6.12 and use the ABIEncoderV2 pragma. When compiling these as part of the overall set of contracts, they generate different bytecode than when compiled on their own. Example bytecode differences for the ApprovedTransfer contract:

Compiled as part of larger contracts set
https://gist.github.com/elenadimitrova/9073bc073ee72a7264cafe2982823f6b#file-approvedtransfer_bytecode_compiledwithall

Compiled alone
https://gist.github.com/elenadimitrova/4f58d3795d009a05be2c2f56ca77d97c#file-approvedtransfer_bytecode_compiledalone

Affected contracts are: ApprovedTransfer, TokenExchanger, TransferManager, LimitStorage and RelayerManager.
By compiling ApprovedTransfer and RelayerManager independently, we've managed to produce the same bytecode as Etherscan for verification; however, this wasn't successful for the TransferManager and TokenExchanger contracts, which makes me think there is a larger issue here. I'd like this issue to be reopened and addressed as part of an update to 0.6 (as it is not an issue with 0.7).

Hi @fvictorio @alcuadrado, is this issue solved?
I've always had this issue, and it means that after developing with Hardhat I must compile contracts with Truffle before deploying, otherwise I won't be able to verify them on Etherscan.

For instance if I compile SampleContract in this repository I obtain the following (different) bytecode:

Is there any hardhat configuration I'm missing?

scnale commented

I don't think you can reasonably expect Hardhat and Truffle to produce the same bytecode, since they both implement their own build pipelines, i.e. they define the standard JSON input that is passed to the compiler, which is not necessarily the same.

@vittominacori from what I can tell, the difference is just in the metadata (the last bytes) generated by Hardhat and Truffle. I don't quite understand why they are different though. (Btw, I obtained this by actually compiling your project, because the gist files you shared are very different in length; that can't be right.)

This issue is about a difference in the executable part of the bytecode, so I don't think you are running into a regression.

Yes, sorry. Maybe I had old node modules installed.
I've just executed it again with a fresh install and updated the gist files.
They are still different, but now I'm able to verify the code on Etherscan. Thanks!

Here is the environment:

  • Hardhat 2.12.4
  • Truffle v5.7.0 (core: 5.7.0)
  • Ganache v7.5.0
  • Solidity 0.8.17 (solc-js)
  • Node v16.18.1
  • npm 8.19.2