Problems with function recognition
nofiv opened this issue · 15 comments
Hello! I wanted to process an OpenSSL library and noticed that the latest version of binlex recognized only a negligible number of functions: 7, whereas IDA recognized 1636. I used the command:
binlex -m pe:x86_64 -i <lib_name> | jq -r 'select(.type == ("function"))'
Am I doing something wrong, or is this a bug?
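For comparing counts like the 7 vs. 1636 above, a small script can tally the records by type instead of eyeballing the jq output. This is only a sketch: it assumes binlex writes one JSON object per line and that each object carries the "type" field used in the jq filter; count_by_type is a hypothetical helper, not part of binlex.

```python
import json


def count_by_type(lines):
    """Count records per "type" value in JSON-lines text (assumed output format)."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        kind = record.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Made-up sample records; real output would come from the binlex command above.
sample = [
    '{"type": "function", "offset": 4096}',
    '{"type": "block", "offset": 4112}',
    '{"type": "function", "offset": 8192}',
]
print(count_by_type(sample))
```

To run it on real output, the same function could be fed sys.stdin in place of the sample list.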
What version are you using atm?
Can you show the output of:
binlex --version
Cool, yeah, we are likely missing some of the CALL instructions that we collect function addresses from in capstone :)
I'll have a look :)
Great! Thanks
v1.1.1 is not officially released yet :) so added it to the milestone
Are you also able to provide the hash of the file you were testing?
It's the one in the archive (https://curl.se/windows/dl-7.77.0_1/openssl-1.1.1k_1-win64-mingw.zip), SHA1: ef406228f7694359c5f87e2ee7b4f760dcf160f6
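To confirm that both sides are testing the same file, the SHA1 can be recomputed locally and compared with the one quoted above. A minimal sketch using only the standard library; sha1_of_file is a hypothetical helper name, and the path passed to it is whichever file the hash refers to.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat, which matters little for a DLL but makes the helper safe for large archives too.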
It will be hard to reach 100% parity with IDA with only two developers working on this, but we can try to do a little better :)
I understand that matching IDA is out of the question. I just wanted to pinpoint the problem and unfortunately don't have the time to help out with it right now
Well when you have some time to help we would welcome it! 😄
So, having had a quick look: we need to parse out the exported functions and add them to the queue, for shared libs on Linux and DLLs on Windows.
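Why exports matter for a library can be shown with a toy worklist. This is not binlex's code: the call graph below is made-up data standing in for "disassemble a function and collect its CALL targets", and discover_functions is a hypothetical name. In a DLL, most exported functions are never called from the entry point, so a traversal seeded only with the entry point never reaches them.

```python
from collections import deque


def discover_functions(call_graph, seeds):
    """Return every address reachable from the seeds by following call edges."""
    found = set()
    queue = deque(seeds)
    while queue:
        address = queue.popleft()
        if address in found:
            continue  # already analyzed
        found.add(address)
        # Stand-in for disassembling the function and queuing its CALL targets.
        queue.extend(call_graph.get(address, ()))
    return found


# Toy DLL: the entry point calls one helper; the exports are never called internally.
CALL_GRAPH = {
    0x1000: [0x1100],  # entry point -> helper
    0x1100: [],
    0x2000: [0x2100],  # export A -> its own helper
    0x2100: [],
    0x3000: [],        # export B
}
ENTRY_POINT = 0x1000
EXPORTS = [0x2000, 0x3000]

entry_only = discover_functions(CALL_GRAPH, [ENTRY_POINT])
with_exports = discover_functions(CALL_GRAPH, [ENTRY_POINT, *EXPORTS])
print(len(entry_only), len(with_exports))  # prints "2 5"
```

Seeding the queue with the export addresses is what lets the traversal reach the remaining three functions in this toy example.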
So we are now sitting at ~800 recognized functions. We still have to validate further that we are populating the export queue correctly, but this is much better parity than before. Switched over to using the lief library and will do the same with ELF shared libs as well.
We investigated using radare2 rlib and sleigh (from ghidra) as disassemblers, compared to capstone.
We discovered that making these multi-threaded is not easily possible, and we already have decent parity compared to the others, except on DLLs or shared libs in Linux.
We concluded from this testing that, for the binlex project, it makes more sense to continue using the capstone disassembler.
We will also leverage a library like lief, which gives us much easier cross-platform binary format parsing for collecting exports and other properties to populate our analysis queue.
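As a rough illustration of collecting exports with lief, here is a sketch using lief's Python bindings (binlex itself is C++, so this is not the project's implementation). It assumes the format-independent exported_functions property, whose entries expose name and address; whether the address is relative or absolute depends on the format and should be checked before queuing. export_seeds and load_export_seeds are hypothetical helper names.

```python
def export_seeds(binary):
    """Map exported function addresses to names for an already parsed binary."""
    seeds = {}
    for function in binary.exported_functions:
        # Keep the first name seen when several exports alias one address.
        seeds.setdefault(function.address, function.name)
    return dict(sorted(seeds.items()))


def load_export_seeds(path):
    """Parse a PE or ELF file with lief and return its export seeds."""
    import lief  # third-party package, imported lazily so export_seeds works without it

    binary = lief.parse(path)
    if binary is None:  # recent lief versions return None for unsupported input
        raise ValueError(f"unsupported or malformed binary: {path}")
    return export_seeds(binary)
```

The returned addresses could then seed the analysis queue alongside the entry point, for DLLs and ELF shared libs alike.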
The queue is now populated correctly using the lief library for PE executables. This is not merged yet, but we can close this now.