e-m-b-a/emba

Improve speed of component detection

jblu42 opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
Running the component identification module (S09) takes multiple hours on large firmware. For the firmware I tested, P99 detects 5171 files in total, of which 1229 are unique executables (I am assuming the latter is what is used for component identification?). Of these 1229, only 50 files are identified in the end.

Describe the solution you'd like
Still assuming the detection with "strings" and "grep" is run over all of these 1229 files, I am wondering if there are faster methods, e.g.:
First run a single grep with all identifiers, and only if there is a match run the detailed identification, possibly only on the output of that first grep.
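To make the two-stage idea concrete, here is a minimal Python sketch. It is not how S09 is implemented (S09 is a shell module and its rule set is not shown in this thread). The `SIGNATURES` table and the `identify` helper are hypothetical and only illustrate a cheap combined pre-filter followed by the detailed per-signature pass.

```python
import re

# Hypothetical version signatures, for illustration only.
SIGNATURES = {
    "busybox": r"BusyBox v[0-9]+\.[0-9]+(\.[0-9]+)?",
    "openssl": r"OpenSSL [0-9]+\.[0-9]+\.[0-9]+[a-z]?",
    "dropbear": r"dropbear_[0-9]+\.[0-9]+",
}

# Stage 1 pattern: one alternation over all signatures, searched once per file.
COMBINED = re.compile("|".join("(?:%s)" % p for p in SIGNATURES.values()))
# Stage 2 patterns: one compiled regex per component.
COMPILED = {name: re.compile(p) for name, p in SIGNATURES.items()}


def identify(strings_output):
    """Return {component: matched text} for the strings output of one file."""
    # Stage 1: a single cheap search; files without any hit stop here.
    if COMBINED.search(strings_output) is None:
        return {}
    # Stage 2: detailed identification, only for files that passed stage 1.
    hits = {}
    for name, rx in COMPILED.items():
        match = rx.search(strings_output)
        if match:
            hits[name] = match.group(0)
    return hits
```

With only 50 of 1229 files matching in the end, most files would be rejected after the single stage 1 search. How much that saves in practice depends on how expensive the detailed checks are.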

Describe alternatives you've considered
Perhaps there are other speedups possible, like avoiding a call to "file" for every entry in the array of files to detect the binary type, or similar.
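As a rough sketch of that alternative, an ELF binary can be recognised by its first four bytes (`0x7f 'E' 'L' 'F'`) without spawning an external process per file. The helper names `has_elf_magic` and `is_elf` are hypothetical. This check only answers "is it ELF?" and does not replace the architecture and type details that `file` reports.

```python
ELF_MAGIC = b"\x7fELF"  # the four magic bytes at the start of every ELF file


def has_elf_magic(header):
    """Return True if the given leading bytes start with the ELF magic."""
    return header[:4] == ELF_MAGIC


def is_elf(path):
    """Read only the first four bytes of the file at `path` and check them."""
    with open(path, "rb") as fh:
        return has_elf_magic(fh.read(4))
```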

Priority issue
Are you already a Sponsor? - [N]

Additional context
I don't know if a speedup is possible or makes sense. On an older server the detection above runs for about 3 hours and is currently the longest-running module, so I would at least be happy about a speedup.
I also have not spent a lot of time investigating this. I first wanted to throw this in as a suggestion and see if you see any room for improvement.

Probably one speedup would be to pre-generate the strings output for all ELF files once. With this we do not need to generate the strings output again for every check.
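A minimal Python sketch of that caching idea, under stated assumptions: `extract_strings` only approximates the default behaviour of the `strings` tool (runs of at least four printable ASCII characters), and `build_cache` is a hypothetical helper, not the code from the PR mentioned in this thread.

```python
import re

# Runs of 4 or more printable ASCII bytes, roughly what `strings` prints by default.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data):
    """Return all printable runs found in a bytes object, in order."""
    return [run.decode("ascii") for run in PRINTABLE.findall(data)]


def build_cache(paths):
    """Read every file once and keep its extracted strings in memory."""
    cache = {}
    for path in paths:
        with open(path, "rb") as fh:
            cache[path] = extract_strings(fh.read())
    return cache
```

Every later identification rule can then search `cache[path]` instead of extracting strings from the same binary again. For very large firmware the cache may be better written to disk, one text file per binary, to keep memory use bounded.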

@jblu42 please test #1006. This PR decreases the runtime of s09 (at least during my tests) massively.

Did a few test runs on Friday (could not do many large test runs, as they run for a long time).

The run for a large image decreased from about 3:30 (hours:minutes) to 2:50. For a smaller image there was only a slight decrease in time.

Overall already very good, thanks for the quick help.

With a further rewrite of S09 we could probably find more speed in the future. This was a quick shot and it did not work out badly :-)