Paint traced code that doesn't belong to IDA defined functions

Question

gdbinit opened this issue 3 years ago · 3 comments

Hi,

Lighthouse seems unable to paint instructions that were executed but don't belong to functions IDA managed to create.

From what I can see, _optimize_coverage_data computes the set intersection between the basic block addresses read from the trace file (drcov from DynamoRIO in my case) and the metadata cache built from IDA's function list.
So any valid address that doesn't fall inside an IDA function is dropped from the Lighthouse coverage data and will not be painted in the disassembly.
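
To illustrate the effect described above, here is a minimal sketch of how a set intersection drops traced addresses that have no matching metadata. The names (`traced_blocks`, `known_function_blocks`, `filter_to_known`) and all addresses are made up for illustration; this is not Lighthouse's actual code.

```python
# Hypothetical data: not read from a real drcov log or IDA database.
traced_blocks = {0x401000, 0x401010, 0x402000, 0x405000}

# Addresses the disassembler metadata knows about (inside defined functions).
known_function_blocks = {0x401000, 0x401010, 0x401020, 0x402000}


def filter_to_known(traced, known):
    """Keep only traced addresses that also appear in the metadata set."""
    return traced & known


mapped = filter_to_known(traced_blocks, known_function_blocks)

# Anything executed but absent from the metadata is silently lost.
dropped = traced_blocks - mapped

print("kept:   ", sorted(hex(a) for a in mapped))
print("dropped:", sorted(hex(a) for a in dropped))
```

In this toy example 0x405000 was executed but sits outside every known function, so it never reaches the painted set.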

Being able to paint such code would be super useful against obfuscated, crypted, and other similar targets.

Is there an easy way to achieve this in the current code base, or does it need some hacks such as creating pseudo functions or something else? There seems to be a TODO in _optimize_coverage_data somewhat related to this.

Thanks for this plugin, it's super useful!

fG!

Answer 1 · 2021-09-29T01:37:44.000Z

> Lighthouse seems unable to paint instructions that were executed but don't belong to functions IDA managed to create.
> ...
> So any valid addresses that don't exist in an IDA function don't belong to the Lighthouse coverage data and will not be painted in the disassembly.

This is correct, unfortunately.

It's probably one of the sillier 'sharp edges' in lighthouse and is something I have been putting off 'fixing' (supporting?) for a long time. I think it's time to finally do the necessary refactoring to support it.

> This is super useful against obfuscated, crypted and other targets.

To be completely honest, I actually do not recommend Lighthouse for malware/obfuscated/unpacking/metamorphic RE tasks for exactly these kinds of reasons.

I think it's too easy for 'coverage' to become confusing or even misleading unless you have a good idea of what the malicious runtime was doing. That's assuming your coverage data even makes sense anymore against the static context you see within IDA.

At the end of the day, I wrote lighthouse to improve introspection on opaque binary-only fuzzing tasks. Its ability to handle anything more nefarious is a bonus :-P

> Is there an easy way to achieve this in current code base or needs some hacks such as creating pseudo functions or something else? There seems to be a TODO in _optimize_coverage_data somewhat related to this.
>
> Thanks for this plugin, it's super useful!

Honestly, I think it'll take a bunch of small strategic changes to do this correctly (again, part of why I was putting it off). I'll see if I can whip up something tonight. Hopefully the changes don't spiral out of control, but we'll see.

Thanks again for the kind words, and for carefully articulating your interest in this issue.

Answer 2 · 2021-09-29T21:22:48.000Z

Yes, it's totally an "I know what I'm doing and what I want" option given the possible problems with that kind of code and dynamic tracing. But it can be quite useful as a first approach to it.

I removed the set intersection with instructions and it kinda worked as an ugly hack for this target. It's faster than my Lighthouse clone that just paints directly from the trace log (it's just ugly to parse that drcov header in C, I can't even understand why that format, hehe).
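
For comparison, the drcov log is less painful to handle in Python. The following is a minimal sketch, assuming the binary (non-text) drcov layout: a text header ending in a `BB Table: N bbs` line, followed by N packed 8-byte entries (32-bit module-relative start, 16-bit size, 16-bit module id, little-endian). The function name `parse_bb_table` and the sample log are made up for illustration and are not taken from Lighthouse.

```python
import re
import struct


def parse_bb_table(data: bytes):
    """Return (start_offset, size, module_id) tuples from a drcov BB table."""
    match = re.search(rb"BB Table: (\d+) bbs\n", data)
    if match is None:
        raise ValueError("no BB table header found")
    count = int(match.group(1))
    body = data[match.end():match.end() + count * 8]
    if len(body) != count * 8:
        raise ValueError("truncated BB table")
    # Assumed entry layout: uint32 start, uint16 size, uint16 module id.
    return list(struct.iter_unpack("<IHH", body))


# Synthetic log built in memory; the module line is a made-up placeholder.
sample = (
    b"DRCOV VERSION: 2\n"
    b"DRCOV FLAVOR: drcov\n"
    b"Module Table: version 2, count 1\n"
    b"Columns: id, base, end, entry, checksum, timestamp, path\n"
    b" 0, 0x400000, 0x410000, 0x401000, 0x0, 0x0, /tmp/target\n"
    b"BB Table: 2 bbs\n"
    + struct.pack("<IHH", 0x1000, 16, 0)
    + struct.pack("<IHH", 0x5000, 4, 0)
)

blocks = parse_bb_table(sample)
for start, size, module_id in blocks:
    print(f"module {module_id}: +{start:#x} ({size} bytes)")
```

The start values are offsets from the module base, so they would still need the base address from the module table added before they could be compared against addresses in a database.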

Thanks!

Answer 3 · 2021-10-06T03:08:17.000Z

> Yes, it's totally an "I know what I'm doing and what I want" option given the possible problems with that kind of code and dynamic tracing. But it can be quite useful as a first approach to it.

Sorry, I didn't mean to imply that you didn't know what you were doing.

Lighthouse can definitely be useful when studying execution in certain obfuscated binaries (I've used it for that several times myself). I just generally don't recommend it to anyone because it can take a trained eye to know when the 'coverage' doesn't seem to make sense.

> I removed the set intersection with instructions and it kinda worked as an ugly hack for this target. It's faster than my Lighthouse clone that just paints directly from the trace log (it's just ugly to parse that drcov header in C, I can't even understand why that format, hehe).

I think that's a reasonable hack. I didn't want to recommend one myself because truthfully I wasn't sure it would be sufficient. I've never been too happy with the coverage loading/mapping logic; it needs a better rewrite.

I went ahead and added 'more correct' support for your request with commit e3d636a, which is available on the develop branch here on GitHub. As you can see, it's a bit more involved than meets the eye, but I also did a small bit of refactoring. There may also be bugs; I did not test it thoroughly.

There should also be an 'Orphan Coverage' entry at the end of the Coverage Table. You can right click it -> dump addresses to get the explicit addresses of the coverage that does not fall within a function.
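
As a rough illustration of the idea of coverage that does not fall within a function, here is a minimal sketch that splits traced addresses into per-function coverage and an orphan bucket. It is not the code from the commit above; `split_coverage`, the function ranges, and the addresses are all hypothetical.

```python
from bisect import bisect_right


def split_coverage(addresses, function_ranges):
    """Split addresses into per-function coverage and orphan coverage.

    function_ranges must be sorted, non-overlapping (start, end) pairs,
    with end exclusive.
    """
    starts = [start for start, _ in function_ranges]
    by_function = {}
    orphans = []
    for address in sorted(addresses):
        # Find the last function that starts at or before this address.
        index = bisect_right(starts, address) - 1
        if index >= 0 and address < function_ranges[index][1]:
            by_function.setdefault(function_ranges[index][0], []).append(address)
        else:
            orphans.append(address)
    return by_function, orphans


# Hypothetical function ranges and traced addresses.
functions = [(0x401000, 0x401040), (0x402000, 0x402080)]
trace = {0x400FF0, 0x401000, 0x401010, 0x402000, 0x403000}

by_function, orphans = split_coverage(trace, functions)
print("orphans:", [hex(a) for a in orphans])
```

Keeping the orphans in their own bucket, instead of discarding them, is what lets them be painted or dumped separately from function coverage.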