Consensys/python-solidity-parser

About 1/6 of mainnet contracts cannot be parsed using this parser.

Closed this issue · 4 comments

I used smart contract sanctuary to dump all mainnet Solidity contracts (32382 contracts at the time of the dump, without removing duplicates).

Then I iterated over them and ran your parser on each one. The attached spreadsheet lists all contracts that could not be parsed (detected with an exception handler). About 5K contracts could not be parsed.

Hope it will be helpful.

solidity_parser_mainnet_32382_errors.zip

Thanks @xoredtwice. Is it possible to include the exception error in the file too? I see that it only has the file names.

Thanks for the reply, @shayanb. I exported just the filenames (which include the contract address). I will run the script again and share another spreadsheet soon with the exception details.

P.S.
The last line of python-solidity-parser Readme says: "Update the grammar in ./solidity-antlr4/Solidity.g4 and run the antlr generator script to create the parser classes in solidity_parser/solidity_antlr4."

I am using the default grammar for parsing. I just downloaded the package and used:
parser.parse_file(contract)

Maybe that's the problem?
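
For anyone who wants to reproduce the run with the exception details included, here is a minimal sketch of the loop described above. The helper name `collect_parse_errors` and the report columns are hypothetical, not part of python-solidity-parser; the parse function is passed in as an argument so that `parser.parse_file` (the call quoted above) can be plugged in.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def collect_parse_errors(
    paths: Iterable[Path],
    parse_file: Callable[[str], object],
) -> List[Tuple[str, str, str]]:
    """Try to parse every file; return (filename, exception type, message) per failure."""
    failures = []
    for path in paths:
        try:
            parse_file(str(path))
        except Exception as exc:  # broad on purpose: every failure should be logged
            failures.append((path.name, type(exc).__name__, str(exc)))
    return failures


def write_report(failures: List[Tuple[str, str, str]], report_path: str) -> None:
    """Write the failures to a CSV file with a header row (hypothetical layout)."""
    with open(report_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "exception", "message"])
        writer.writerows(failures)
```

With the package installed, something like `collect_parse_errors(Path("contracts").rglob("*.sol"), parser.parse_file)` after `from solidity_parser import parser` should give the list of failures; the `contracts` directory name is only a placeholder for wherever the dump lives.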

GNSPS commented

Yeah, that's definitely a very dated version of the grammar. Let's change the grammar submodule to the one we're maintaining now @ https://github.com/ConsenSys/solidity-antlr4 😄 🎉

Thanks for your reply.

The problem was with the smart contract sanctuary dump. Single quote characters had been replaced with their HTML-escaped form (which renders as ' here). I will add an issue to that project.
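
A rough sketch of one way to undo this kind of replacement before parsing, assuming the dump contains HTML entities such as `&#39;` in place of single quotes (the exact replacement string is an assumption; it did not survive in the text above). The function name `restore_quotes` is hypothetical; `html.unescape` is from the Python standard library.

```python
import html


def restore_quotes(source: str) -> str:
    """Convert HTML entities in a dumped contract back to literal characters.

    Assumption: the whole file was HTML-escaped by the dump, so every entity
    (not just the one for the single quote) should be converted back.
    """
    return html.unescape(source)
```

Note that `html.unescape` converts every entity it recognises, so it would also alter a contract that legitimately contains an entity inside a string literal. If only the single quote is affected, a plain `str.replace` on that one entity may be the safer choice.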

After fixing this issue, only 65 mainnet Solidity contracts cannot be parsed with your AST parser. I have not changed the grammar. From a glance over the errors, it seems that some of them still have character replacement issues for other characters.

I will update the grammar, but 65 errors in 32K is pretty awesome. My application can tolerate it.
Best Solidity Parser ever. Congrats guys. :)

error_logs_32382_65.txt