ARM/Thumb disassembly is wrong
japaric opened this issue · 5 comments
Objdump
$ arm-none-eabi-objdump -Cd target/thumbv7m-none-eabi/debug/app
target/thumbv7m-none-eabi/debug/app: file format elf32-littlearm
Disassembly of section .text:
00000000 <_reset-0x8>:
0: 20010000 .word 0x20010000
4: 00000009 .word 0x00000009
00000008 <_reset>:
8: b083 sub sp, #12
a: e7ff b.n c <_reset+0x4>
c: 202a movs r0, #42 ; 0x2a
e: 9001 str r0, [sp, #4]
10: 9002 str r0, [sp, #8]
12: e7ff b.n 14 <_reset+0xc>
14: e7fe b.n 14 <_reset+0xc>
cargo-sym
$ cargo sym -Cd target/thumbv7m-none-eabi/debug/app
Disassembly of section .text
0000000000000009 _reset:
10009: b0 ff e7 2a bhs #0xffa0fed1
1000d: 20 01 90 02 addseq r0, r0, #8
10011: 90 ff e7 fe mcr2 p15, #7, pc, c7, c0, #4
There are some funny things going on here:
- 0000000000000009 has too many zeroes for a 32-bit hexadecimal. ARMv7-M is a 32-bit architecture.
- 0x9 is the "THUMB address" of _reset (the bit 0 is set to 1) but the disassembly should start at 0x8. It seems that cargo-sym is starting to disassemble at address 0xA (see the value of the instructions: for objdump is 83 b0 ff e7 2a ..., for cargo-sym is b0 ff e7 2a ...).
- The disassembly should show THUMB instructions and those are 16-bit instructions. cargo-sym is interpreting the values as THUMB-2 instructions (32-bit instructions).
- On the cargo sym output, there's the address 10009 right below _reset. The address seems to be off by 0x10000.
I will post the binary used in this report in a bit.
fix
fixed some formatting issues, in addition to checking whether the thumb bit is set and disassembling in thumb mode if so.
now generates:
target/debug/cargo-sym sym -Cd 01-qemu.thumbv7m-none-eabi
Disassembly of section .text
00000008 <_reset>
8: b083 sub sp, #0xc
a: e7ff b #0xc
c: 202a movs r0, #0x2a
e: 9001 str r0, [sp, #4]
10: 9002 str r0, [sp, #8]
12: e7ff b #0x14
14: e7fe b #0x14
and:
target/debug/cargo-sym sym -Cd 04-led.thumbv7em-none-eabihf
Disassembly of section .text
08000008 <_EXCEPTIONS>
8000008: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
800000c: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
8000010: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
8000014: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
8000018: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
800001c: 00000000 andeq r0, r0, r0
8000020: 00000000 andeq r0, r0, r0
8000024: 00000000 andeq r0, r0, r0
8000028: 00000000 andeq r0, r0, r0
800002c: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
8000030: 00000000 andeq r0, r0, r0
8000034: 00000000 andeq r0, r0, r0
8000038: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
800003c: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
08000040 <_reset>
8000040: b580 push {r7, lr}
8000042: 466f mov r7, sp
8000044: b082 sub sp, #8
8000046: e7ff b #0x8000048
8000048: f80cf000 bl #0x8000064
800004c: e7ff b #0x800004e
800004e: f817f000 bl #0x8000080
8000052: e7ff b #0x8000054
8000054: f82bf000 bl #0x80000ae
8000058: e7ff b #0x800005a
800005a: f837f000 bl #0x80000cc
800005e: e7ff b #0x8000060
8000060: e7ff b #0x8000062
8000062: e7fe b #0x8000062
08000064 <app::power_on_gpioe::h78af44f23a22f67e>
8000064: b082 sub sp, #8
8000066: e7ff b #0x8000068
8000068: e7ff b #0x800006a
800006a: 0014f241 movw r0, #0x1014
800006e: 0002f2c4 movt r0, #0x4002
8000072: 9000 str r0, [sp]
8000074: 6801 ldr r1, [r0]
8000076: 1100f441 orr r1, r1, #0x200000
800007a: 6001 str r1, [r0]
800007c: b002 add sp, #8
800007e: 4770 bx lr
08000080 <app::put_pe9_in_output_mode::h6338568d6b3f648b>
8000080: b083 sub sp, #0xc
8000082: e7ff b #0x8000084
8000084: e7ff b #0x8000086
8000086: 0000f241 movw r0, #0x1000
800008a: 0000f6c4 movt r0, #0x4800
800008e: 9002 str r0, [sp, #8]
8000090: 6800 ldr r0, [r0]
8000092: 9001 str r0, [sp, #4]
8000094: e7ff b #0x8000096
8000096: 9801 ldr r0, [sp, #4]
8000098: 2140f420 bic r1, r0, #0xc0000
800009c: 9100 str r1, [sp]
800009e: e7ff b #0x80000a0
80000a0: 9802 ldr r0, [sp, #8]
80000a2: 9900 ldr r1, [sp]
80000a4: 2280f441 orr r2, r1, #0x40000
80000a8: 6002 str r2, [r0]
80000aa: b003 add sp, #0xc
80000ac: 4770 bx lr
080000ae <app::set_pe9_high::h14fcedcfc4b06dbb>
80000ae: b082 sub sp, #8
80000b0: e7ff b #0x80000b2
80000b2: e7ff b #0x80000b4
80000b4: 0018f241 movw r0, #0x1018
80000b8: 0000f6c4 movt r0, #0x4800
80000bc: 9000 str r0, [sp]
80000be: e7ff b #0x80000c0
80000c0: 9800 ldr r0, [sp]
80000c2: 7100f44f mov.w r1, #0x200
80000c6: 6001 str r1, [r0]
80000c8: b002 add sp, #8
80000ca: 4770 bx lr
080000cc <app::set_pe9_low::h5d9c159fa5571658>
80000cc: b082 sub sp, #8
80000ce: e7ff b #0x80000d0
80000d0: e7ff b #0x80000d2
80000d2: 0018f241 movw r0, #0x1018
80000d6: 0000f6c4 movt r0, #0x4800
80000da: 9000 str r0, [sp]
80000dc: e7ff b #0x80000de
80000de: e7ff b #0x80000e0
80000e0: 9800 ldr r0, [sp]
80000e2: 7100f04f mov.w r1, #0x2000000
80000e6: 6001 str r1, [r0]
80000e8: b002 add sp, #8
80000ea: 4770 bx lr
080000ec <app::exception::handler::hac6f2ae6b7dd2702>
80000ec: b083 sub sp, #0xc
80000ee: e7ff b #0x80000f0
80000f0: be00 bkpt #0
80000f2: e7ff b #0x80000f4
80000f4: e7fe b #0x80000f4
I also added a --dump flag to output the debug format of the binary file it read, which will be nice(r) for bug reports.
Thanks for the detailed bug report(s)!
discussion
The thumb bit flag blew my mind. For anyone reading this, the problem was that I was disassembling using the offset given by the symbol's st_value field. In the above example, this is (correctly/incorrectly) 9 (i.e., if you inspect the raw st_value given by the ELF binary, it is 9, not 8). But because all arm assembly instructions are even (that is, they sit at even addresses), the odd bit in an instruction address was able to be repurposed to signify to the processor (or disassembler in our case) that the instruction is a thumb instruction, and not a regular arm32 instruction. At least, that's the gist of what I understood.
Consequently, one must essentially check whether the address is odd, and if so, switch to thumb disassembly mode, and subtract 1 from both the offset and virtual memory address to disassemble at the right location and to get the correct instruction display. crazy! :)
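For anyone who wants the check described above in code form, here is a minimal sketch. The names CodeLocation and decode_arm_symbol are made up for illustration and are not taken from cargo-sym; the sketch only assumes the raw st_value of a function symbol is available as an integer.

```rust
/// Where a function really starts and how to decode it.
/// (Hypothetical type for illustration, not cargo-sym's own.)
#[derive(Debug, PartialEq)]
pub struct CodeLocation {
    /// Address of the first instruction, with the Thumb bit cleared.
    pub address: u64,
    /// True when bit 0 of the symbol value was set, i.e. Thumb code.
    pub thumb: bool,
}

/// Split an ARM function symbol's raw `st_value` into address and mode.
pub fn decode_arm_symbol(st_value: u64) -> CodeLocation {
    CodeLocation {
        // Clearing bit 0 is the same as subtracting 1 from an odd value
        // and leaves an even value untouched.
        address: st_value & !1,
        thumb: st_value & 1 == 1,
    }
}
```

With the _reset symbol from the report, decode_arm_symbol(9) gives address 8 and thumb set to true. The same adjustment would have to be applied to the file offset derived from the symbol value.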
misc
Do you like the <> around symbol names? I copied objdump because i could, but I don't know if i like it.
Also it currently incorrectly displays (i think) 4-byte instructions like:
80000e2: 7100f04f mov.w r1, #0x2000000
which should be rendered:
80000e2: f04f 7100 mov.w r1, #0x2000000
for whatever reason?
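A possible explanation, which is a guess and not confirmed anywhere in this thread: the four bytes are being read as one little-endian 32-bit word, while a 32-bit Thumb-2 instruction is stored as two consecutive little-endian 16-bit halfwords. A rough sketch of halfword-wise formatting follows; format_thumb_halfwords is a made-up helper name, not something from cargo-sym or capstone.

```rust
/// Format raw Thumb instruction bytes as space-separated 16-bit halfwords.
/// Each halfword is read little-endian, in memory order.
/// A trailing odd byte, if any, is ignored by `chunks_exact`.
pub fn format_thumb_halfwords(bytes: &[u8]) -> String {
    bytes
        .chunks_exact(2)
        .map(|pair| format!("{:04x}", u16::from_le_bytes([pair[0], pair[1]])))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For the instruction at 80000e2 the bytes in memory would be 4f f0 00 71, which this renders as "f04f 7100". Reading the same bytes as a single little-endian u32 gives 7100f04f, matching the current output.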
because all arm assembly instructions are even
To be pedantic: pointers to code are actually 4-byte aligned. (See Section 4.1 of the AAPCS). And, yes, bit 0 is used to indicate "thumb mode" (that is, the subroutine contains thumb instructions) when it's set to 1.
This output:
08000008 <_EXCEPTIONS>
8000008: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
800000c: 080000ed stmdaeq r0, {r0, r2, r3, r5, r6, r7}
cargo-sym shouldn't show instructions in this case because this _EXCEPTIONS symbol is actually a static variable (an array of function pointers: [fn(); 14]) and was originally in the .rodata section (I used a linker script to move it to .text) so it's "data" not "code". I don't know how ELF represents this ... the section is marked as not executable in the ELF, maybe?
Do you like the <> around symbol names?
Actually, sometimes I use it as an "anchor" when I'm viewing the disassembly with less. I search for >: and it takes me to the next symbol.
To be pedantic: pointers to code are actually 4-byte aligned. (See Section 4.1 of the AAPCS).
To be pedantic back, the pointers are all still even :P
Yea the _EXCEPTIONS is easily fixed; those symbols are usually tagged <LOCAL|GLOBAL> OBJECT. So in the printing routine if it's not tagged as a function, i'll print it as data. (e.g., offset: 4|8 byte chunks of 4 ). Unfortunately if they're not tagged as OBJECT, no way I can know they're data or code without more heavy weight analysis. (blame Von Neumann for this awful state of affairs in binary program analysis ;))
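To make the tagging concrete: in ELF the symbol type lives in the low four bits of st_info (STT_OBJECT is 1, STT_FUNC is 2) and the binding in the high four bits. A small sketch of the "not a function means data" rule described above; the Render enum and render_mode function are hypothetical names, not cargo-sym's actual code.

```rust
/// ELF symbol type values (low 4 bits of `st_info`).
pub const STT_OBJECT: u8 = 1;
pub const STT_FUNC: u8 = 2;

/// How a symbol's bytes should be printed (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Render {
    Code,
    Data,
}

/// Extract the symbol type from a raw `st_info` byte.
pub fn symbol_type(st_info: u8) -> u8 {
    st_info & 0xf
}

/// Disassemble only symbols tagged as functions; dump everything else as data.
pub fn render_mode(st_info: u8) -> Render {
    if symbol_type(st_info) == STT_FUNC {
        Render::Code
    } else {
        Render::Data
    }
}
```

A GLOBAL FUNC symbol has st_info 0x12 and is rendered as code, while a GLOBAL OBJECT symbol such as _EXCEPTIONS has st_info 0x11 and is rendered as data. Untyped symbols (type 0) also fall into the data branch under this rule, which is exactly the case where the tool cannot really know.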
And I'll keep the > then.
@japaric this is fixed in latest git version. will publish a crate version asap, need to publish goblin and fixup the capstone-rs situation, since the PR isn't being merged which it requires :/ may need to publish another crate.
But anyway, the arm printer for objects should be working:
target : "04-led.thumbv7em-none-eabihf"
Disassembly of section .text
08000008 <_EXCEPTIONS>:
8000008: 080000ed 080000ed 00000000 00000000 ...í...í........
8000018: 080000ed 00000000 00000000 00000000 ...í............
8000028: 00000000 080000ed 080000ed 080000ed .......í...í...í
8000038: 080000ed 080000ed ...í...í
i'm sure it has some minor bugs with edge cases. but that code was horrible and i don't feel like messing with it. someone else can work on it if they're bored and feel like writing column-based printer code :P i'm sure they'll do a much better job than me
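If someone does pick this up, a bare-bones starting point for a column-based word dump might look like the sketch below. It is not the printer cargo-sym uses: dump_words is a made-up name, the sketch assumes little-endian 32-bit words printed in memory order, four per row, and it leaves out the ASCII column.

```rust
/// Render `bytes` as rows of up to four little-endian 32-bit words,
/// each row prefixed with its address in hex.
/// Trailing bytes that do not fill a whole word are skipped.
pub fn dump_words(base: u64, bytes: &[u8]) -> Vec<String> {
    bytes
        .chunks(16)
        .enumerate()
        .map(|(row, chunk)| {
            let words: Vec<String> = chunk
                .chunks_exact(4)
                .map(|w| format!("{:08x}", u32::from_le_bytes([w[0], w[1], w[2], w[3]])))
                .collect();
            format!("{:x}: {}", base + (row as u64) * 16, words.join(" "))
        })
        .collect()
}
```

Calling dump_words(0x8000008, &[0xed, 0, 0, 0x08, 0, 0, 0, 0]) returns a single row, "8000008: 080000ed 00000000".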
There will be other bugs, but with the new target api, I think you can be lazier than ever!
Let me know how it goes :)