c3rb3ru5d3d53c/binlex

Problems with function recognition

nofiv opened this issue · 15 comments

nofiv commented

Hello! I wanted to process an OpenSSL library and noticed that the latest version of binlex recognized only a negligible number of functions: 7, while IDA recognized 1636. I used the command:

binlex -m pe:x86_64 -i <lib_name> | jq -r 'select(.type == ("function"))'

Am I doing something wrong, or is there a bug?
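To compare the result with IDA's count, the records selected by the jq filter can be counted. A minimal Python sketch, assuming binlex writes one JSON object per line with a "type" field (the field name comes from the jq filter; the one-object-per-line layout is an assumption), with `count_functions` as a hypothetical helper name:

```python
import json
from typing import Iterable


def count_functions(lines: Iterable[str]) -> int:
    """Count JSON records whose "type" field equals "function".

    Assumes one JSON object per line (an assumption about binlex's output
    layout); blank lines are skipped.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "function":
            count += 1
    return count
```

Calling `count_functions(sys.stdin)` from a script placed at the end of the binlex pipeline gives a single number that can be compared directly with IDA's function count.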

What version are you using atm?

Can you show the output of:

binlex --version

Cool, yeah, we are likely missing some of the CALL instructions whose target addresses we collect from Capstone :)
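For context on where those addresses come from: a direct near CALL on x86-64 (opcode E8 followed by a signed 32-bit displacement) encodes its target relative to the next instruction, so the target is known statically. Indirect calls such as `FF 15 disp32` (`call qword ptr [rip+disp32]`) or `call rax` only name a memory slot or a register, so no target immediate is available for them. That is one plausible way CALLs get missed, though the thread does not confirm it. Capstone reports the direct target as an immediate operand; the sketch below decodes the bytes by hand only to show the arithmetic, and `direct_call_target` is a hypothetical name:

```python
import struct
from typing import Optional


def direct_call_target(address: int, code: bytes) -> Optional[int]:
    """Return the target of a direct near CALL (E8 rel32) starting at `address`.

    Hypothetical helper for illustration. Returns None for any other
    instruction, including indirect calls such as FF 15 disp32
    (call qword ptr [rip+disp32]) whose target is read from memory at
    run time, often from the import address table.
    """
    if len(code) >= 5 and code[0] == 0xE8:
        # Signed little-endian 32-bit displacement following the opcode byte.
        (rel32,) = struct.unpack_from("<i", code, 1)
        # The displacement is relative to the next instruction (CALL rel32 is 5 bytes).
        return (address + 5 + rel32) & 0xFFFFFFFFFFFFFFFF
    return None
```

For example, `E8 10 00 00 00` at 0x1000 calls 0x1015, while `FF 15 ...` at the same address yields None because its target is only known at run time.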

I'll have a look :)

nofiv commented

Great! Thanks

v1.1.1 is not officially released yet :), so I added it to the milestone.

Are you also able to provide the hash of the file you were testing?

nofiv commented

It's the one in the archive (https://curl.se/windows/dl-7.77.0_1/openssl-1.1.1k_1-win64-mingw.zip), SHA1: ef406228f7694359c5f87e2ee7b4f760dcf160f6

It will be hard to reach 100% parity with IDA with two developers working on this, but we can try to do a little better :)

nofiv commented

I understand that matching IDA is out of the question. I just wanted to pinpoint the problem, and unfortunately I don't have the time to help out with it right now.

Well when you have some time to help we would welcome it! 😄

Having had a quick look: we need to parse function exports and add them to the queue, for shared libraries on Linux and DLLs on Windows.
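The reason exports matter can be shown with a toy work queue: discovery that starts only from the entry point and follows CALL targets never reaches exported functions that nothing inside the library calls. A minimal sketch, where `discover_functions` is a hypothetical name and `call_targets` stands in for the CALL targets a real disassembler such as Capstone would report:

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def discover_functions(seeds: Iterable[int],
                       call_targets: Dict[int, List[int]]) -> Set[int]:
    """Breadth-first function discovery from a set of seed addresses.

    Hypothetical helper. `call_targets` maps a function's start address to
    the direct CALL targets found while disassembling it; it is a stand-in
    for real disassembly.
    """
    queue = deque(seeds)
    seen: Set[int] = set()
    while queue:
        addr = queue.popleft()
        if addr in seen:
            continue
        seen.add(addr)
        # Every direct CALL target is treated as the start of another function.
        for target in call_targets.get(addr, []):
            if target not in seen:
                queue.append(target)
    return seen
```

For example, with `call_targets = {0x1000: [0x1100], 0x2000: [0x2100]}`, `discover_functions([0x1000], call_targets)` returns `{0x1000, 0x1100}`, while also seeding the exports 0x2000 and 0x3000 additionally picks up those two exports and the helper 0x2100.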

We are now at ~800 recognized functions. We still have to validate further that we are populating the export queue correctly, but parity is much better than before. We switched to the LIEF library for this and will do the same for ELF shared libraries.

We investigated radare2's rlib and Ghidra's Sleigh as alternatives to Capstone for disassembly.

We found that making these multi-threaded is not easily possible, and that we already have decent parity with the other tools except on DLLs and Linux shared libraries.

From this testing, we concluded that for the binlex project it makes more sense to keep using the Capstone disassembler,

and to leverage a library like LIEF for much easier cross-platform binary format parsing, collecting exports and other properties to populate our analysis queue.
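Before PE exports go into the queue, two format details matter: export RVAs are relative to the image base, and an export whose RVA falls inside the export directory is a forwarder (a "DLLNAME.SymbolName" string pointing at another DLL), not code. LIEF can provide the image base, the export directory range, and the export entries; the sketch below takes them as plain values so it does not depend on LIEF's API, and `export_seed_addresses` is a hypothetical name:

```python
from typing import Iterable, List, Optional, Tuple


def export_seed_addresses(image_base: int,
                          export_dir_rva: int,
                          export_dir_size: int,
                          exports: Iterable[Tuple[Optional[str], int]]) -> List[int]:
    """Turn PE export (name, RVA) pairs into virtual addresses to queue.

    Hypothetical helper; the inputs would come from a PE parser such as LIEF.
    Exports whose RVA lies inside the export directory are forwarders that
    point at a string, not code, so they are skipped. Aliases that share
    an RVA are queued once.
    """
    start, end = export_dir_rva, export_dir_rva + export_dir_size
    seeds = set()
    for _name, rva in exports:
        if start <= rva < end:
            continue  # forwarder: the RVA points at a "DLLNAME.SymbolName" string
        seeds.add(image_base + rva)
    return sorted(seeds)
```

Ordinal-only exports, which have no name, are handled the same way, since only the RVA is used.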

We are now populating the queue correctly using LIEF for PE executables. It's not merged yet, but we can close this now.