m4b/bingrep

Disassemble support

bjorn3 opened this issue · 10 comments

For example capstone could be used.

m4b commented

@bjorn3 If you want to sketch out a general API for what the CLI for disassembly would look like, I’d be interested.

Maybe just a few brief examples of the proposed CLI, along with the expected output?

The LLVM disassembler is also an option here, instead of (or as an alternative to) capstone.

That requires an LLVM installation when building and running.

That only requires LLVM shared libraries, not a full Clang toolchain. It's basically the same requirement as for libcapstone.

capstone-rs builds libcapstone.a as a static library itself, without you having to install anything, and that only takes about a minute. The LLVM shared libraries take much longer to build (just cloning llvm-project can take a minute, depending on your internet connection) and are way bigger. If you don't want to build them yourself, you have to link LLVM dynamically, which adds a runtime dependency on LLVM, unlike with capstone-rs.

FYI, disassembling code correctly requires support for relocating code sections (e.g., ELF's .text section) before starting the disassembly. Relocation is a considerable amount of work.

Objdump doesn't relocate. Instead it provides an option to show relocation entries after the instruction that used them.
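To make the objdump-style approach concrete, here is a minimal sketch of interleaving relocation entries with the instructions they patch. The `Insn` and `Reloc` types and the `annotate` function are hypothetical, made up for this illustration; they are not bingrep, goblin, or capstone APIs, and a real tool would fill them from the decoder and from the object's relocation section.

```rust
/// A decoded instruction (hypothetical type for this sketch).
struct Insn {
    offset: u64,        // offset of the instruction within the section
    len: u64,           // encoded length in bytes
    text: &'static str, // disassembly of the unrelocated bytes
}

/// A relocation entry that applies to the same section (hypothetical type).
struct Reloc {
    offset: u64,          // offset of the patched field within the section
    kind: &'static str,   // e.g. "R_X86_64_PLT32"
    symbol: &'static str, // symbol the relocation refers to
    addend: i64,
}

/// Emit one line per instruction, followed by one line for every
/// relocation whose patched field lies inside that instruction.
fn annotate(insns: &[Insn], relocs: &[Reloc]) -> Vec<String> {
    let mut out = Vec::new();
    for insn in insns {
        out.push(format!("{:4x}:\t{}", insn.offset, insn.text));
        let end = insn.offset + insn.len;
        for r in relocs.iter().filter(|r| r.offset >= insn.offset && r.offset < end) {
            // Print the addend with an explicit sign, e.g. "puts-0x4".
            let addend = if r.addend < 0 {
                format!("-0x{:x}", r.addend.unsigned_abs())
            } else {
                format!("+0x{:x}", r.addend)
            };
            out.push(format!("\t\t{:x}: {}\t{}{}", r.offset, r.kind, r.symbol, addend));
        }
    }
    out
}
```

With a `call` at offset 0 whose 4-byte displacement at offset 1 carries a relocation against `puts`, the relocation line lands directly under the `call`, so the reader can see the real target even though the instruction bytes themselves are left unrelocated.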

But that makes the disassembly way less useful and sometimes even confusing, especially when compared to the disassembly of the debugger.

It only makes it a bit less useful IMHO. It is nice to have relocation support built in, but as you said, this is a considerable amount of effort. What you see in a debugger won't work for bingrep. In a debugger you see the disassembly relocated for the specific address at which this instance of the program was loaded. Bingrep, however, would need to work with symbolic locations, so that the disassembly is correct no matter where the object file or executable is loaded. I don't think many disassemblers support this.

Providing a useful ELF disassembly also requires parsing the procedure linkage table (PLT), in order to show which function many of the calls/jumps actually target. PLTs are platform-specific, and parsing them requires some poking and assumptions about the code sequences generated by common compilers and linkers.

This is why the LLVM implementation of objdump, for instance, only parses PLTs for AMD64, x86, and AArch64.
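As an illustration of the kind of code-sequence assumption involved, here is a rough sketch for one common x86-64 layout, where a PLT stub starts with `jmp *disp32(%rip)` (bytes `FF 25`) through a GOT slot. The function names and the `jump_slots` map (GOT slot address to symbol name, which a real tool would build from the jump-slot relocations) are hypothetical and not taken from any existing library. Stubs laid out differently, for example ones that begin with `endbr64`, are simply not recognized by this sketch.

```rust
use std::collections::HashMap;

/// If `stub` starts with `jmp *disp32(%rip)` (bytes FF 25), return the
/// address of the GOT slot that the stub jumps through.
/// Assumption: `stub_addr` is the virtual address of the first stub byte.
fn plt_stub_got_slot(stub_addr: u64, stub: &[u8]) -> Option<u64> {
    if stub.len() < 6 || stub[0] != 0xff || stub[1] != 0x25 {
        return None; // not the code sequence this sketch knows about
    }
    let disp = i32::from_le_bytes([stub[2], stub[3], stub[4], stub[5]]);
    // RIP-relative operands are relative to the next instruction,
    // which starts 6 bytes after the start of this one.
    Some(stub_addr.wrapping_add(6).wrapping_add(disp as i64 as u64))
}

/// Resolve a PLT stub to a symbol name using a map from GOT slot
/// address to symbol name (hypothetical input for this sketch).
fn plt_stub_symbol<'a>(
    stub_addr: u64,
    stub: &[u8],
    jump_slots: &HashMap<u64, &'a str>,
) -> Option<&'a str> {
    let slot = plt_stub_got_slot(stub_addr, stub)?;
    jump_slots.get(&slot).copied()
}
```

A disassembler could then print something like `call puts@plt` instead of a bare address for calls that land on a recognized stub. Every other architecture, and every alternative stub layout, needs its own matcher, which is where the effort goes.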