second-state/smart-contract-search-engine

Create a more robust abiSha3 hash

tpmccallum opened this issue · 9 comments

It is possible that ABIs, which are uploaded to the search engine, could have their functions and events in different orders.
Whilst the check for ABI compatibility would still pass, the abiSha3 hash would be different.

We already strip out tabs, extra spaces, etc. so that hashes are more consistent and reliable. However, it would also be a good idea to sort the keys in the ABI data structure so that we can produce a more robust ABI hash.

Let Atlas know when the new abiSha3 hashes are created as this will change the frontend display lookup objects.

This is now done by sorting the keys in the data structure and then hashing the clean (no tabs, extra spaces or return characters) data using sha3.

    def shaAnAbiWithOrderedKeys(self, _theAbi):
        theAbiHash = str(self.web3.toHex(self.web3.sha3(text=json.dumps(_theAbi, sort_keys=True))))
        return theAbiHash

An example of a hash produced this way:

0x2b5710e2cf7eb7c9bd50bfac8e89070bdfed6eb58f0c26915f034595e5443286
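The idea can be tried outside the search engine class with a minimal sketch like the one below. The function name `abi_hash_sorted_keys` and the two sample ABI strings are hypothetical. It also swaps in `hashlib.sha3_256` from the standard library so it runs without web3. That is NIST SHA3-256, not the Keccak-256 that web3's `sha3` computes, so its digests will not match the hashes shown in this thread.

```python
import hashlib
import json


def abi_hash_sorted_keys(abi_text):
    """Hash an ABI string after normalising whitespace and key order."""
    abi = json.loads(abi_text)  # parsing discards tabs, spaces and newlines
    canonical = json.dumps(abi, sort_keys=True)  # dict keys in sorted order
    # Stand-in digest (assumption): NIST SHA3-256, not web3's Keccak-256.
    return "0x" + hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


# Same entry, different key order and whitespace (hypothetical sample data).
abi_a = '[{"name": "transfer", "type": "function", "constant": false}]'
abi_b = '[{\n\t"constant": false,\n\t"type": "function",\n\t"name": "transfer"\n}]'

hash_a = abi_hash_sorted_keys(abi_a)
hash_b = abi_hash_sorted_keys(abi_b)
print(hash_a == hash_b)
```

Both strings parse to the same dictionary, so the sorted serialisation and therefore the digest are identical.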

After further testing, it turns out that ordering the keys is not entirely robust. Consider the following example, where two valid ABIs (both also valid JSON) produce two different sha3 hashes because the entries in the "inputs" list are in a different order.

import json

string1 = '''{
	"constant": false,
	"inputs": [{
			"name": "value",
			"type": "uint256"
		},
		{
			"name": "spender",
			"type": "address"
		}
	]
}'''

string2 = '''{
	"constant": false,
	"inputs": [{
			"name": "spender",
			"type": "address"
		},
		{
			"name": "value",
			"type": "uint256"
		}
	]
}'''

json1 = json.loads(string1)
json2 = json.loads(string2)

output1 = json.dumps(json1, sort_keys=True)
output2 = json.dumps(json2, sort_keys=True)

print(output1)
# {"constant": false, "inputs": [{"name": "value", "type": "uint256"}, {"name": "spender", "type": "address"}]}
print(output2)
# {"constant": false, "inputs": [{"name": "spender", "type": "address"}, {"name": "value", "type": "uint256"}]}

# Note that the entries in "inputs" remain in a different order when comparing 1 and 2. Both are valid JSON and valid Ethereum ABI. Sorting keys alone is not robust enough for deterministic ABI hashes, given that a user can upload an out-of-order ABI as shown above.

This occurs when an internal list contains several objects that each use the same key, such as "name". json.dumps with sort_keys=True sorts the keys of each object but does not reorder the elements of a list.
A JSON object can't have duplicate keys, so one would assume that sorting by keys is robust. However, the objects inside a list all share the same keys, and those objects stay in whatever order the JSON was created with.

Given that the inputs of a given smart contract function cannot share the same name, we can go ahead and sort the list using the following code.

from operator import itemgetter
# Sort the list of input objects in place by each object's "name" value.
json1["inputs"].sort(key=itemgetter("name"))

The following method applies that sort to every list found inside each entry of the ABI.

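    def sortInternalListsInJsonObject(self, _json):
        # _json is the ABI: a list of dicts, one per function or event.
        for listItem in _json:
            for k, v in listItem.items():
                # Only sort real lists with more than one entry (e.g. "inputs").
                if isinstance(v, list) and len(v) > 1:
                    if isinstance(v[0], dict):
                        # Lists of dicts are ordered by each entry's "name".
                        v.sort(key=itemgetter("name"))
                    else:
                        v.sort()
        return _json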

The overall result is what we want.

_json
# Returns 
# {'constant': False, 'inputs': [{'name': 'spender', 'type': 'address'}, {'name': 'value', 'type': 'uint256'}]}
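As a rough, self-contained check of this approach, the sketch below runs a standalone version of the sorter over the two example ABIs from earlier in this thread and compares their serialised forms. The function name `sort_internal_lists` and the variable names are hypothetical. The examples are wrapped in a list because the method iterates over a list of entries. The sketch assumes every dict in an inner list has a "name" key, and, like this thread, it treats the order of "inputs" as irrelevant for hashing purposes.

```python
import json
from operator import itemgetter


def sort_internal_lists(abi):
    """Sort every list-valued field of each ABI entry in place."""
    for entry in abi:
        for value in entry.values():
            if isinstance(value, list) and len(value) > 1:
                if all(isinstance(item, dict) for item in value):
                    # Assumes each dict carries a "name" key.
                    value.sort(key=itemgetter("name"))
                else:
                    value.sort()
    return abi


# The two out-of-order examples from this thread, each wrapped in a list.
abi_1 = [{"constant": False, "inputs": [
    {"name": "value", "type": "uint256"},
    {"name": "spender", "type": "address"}]}]
abi_2 = [{"constant": False, "inputs": [
    {"name": "spender", "type": "address"},
    {"name": "value", "type": "uint256"}]}]

canonical_1 = json.dumps(sort_internal_lists(abi_1), sort_keys=True)
canonical_2 = json.dumps(sort_internal_lists(abi_2), sort_keys=True)
print(canonical_1 == canonical_2)
```

Once the inner lists are sorted, both ABIs serialise to the same string, so any hash taken over that string is the same for both.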

I can now confirm that the new code creates an ordered ABI with a deterministic hash. In the test output below, both ABIs produce the same hash.

Reading configuration file
Master index: allercchecker
Common index: ercchecker
Abi index: abiercchecker
Bytecode index: bytecodeercchecker
Blockchain RPC: https://mainnet.infura.io/v3/fdaf79947fba404ab08cc096f20e12ea
ElasticSearch Endpoint: search-cmtsearch-l72er2gp2gxdwazqb5wcs6tskq.ap-southeast-2.es.amazonaws.com
0xf184e89595256b4eff3b1f0a66570fb6944e04eab99156d3f7bfe5b7c082c628
0xf184e89595256b4eff3b1f0a66570fb6944e04eab99156d3f7bfe5b7c082c628

Code for the above test is here