Reassembly Errors
miksh opened this issue · 21 comments
I am trying to use Ramblr for my research, but I ran into several errors.
I wrote a simple patching (recompilation) script, shown below.
(For this test, I did not add any instrumentation.)
from patcherex.backends.reassembler_backend import ReassemblerBackend
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("input")
    parser.add_argument("output")
    args = parser.parse_args()
    backend = ReassemblerBackend(args.input, debugging=False)
    backend.save(args.output)
I am using the latest angr versions, as shown below.
$ pip3 list| grep angr
angr 9.1.11508
$ pip list | grep angr
angr 7.8.9.26
However, I ran into several errors when I ran it.
Error #1
I created a toy program (hello.c) as follows.
$ cat hello.c
#include <stdio.h>
int main()
{
printf("hello world\n");
return 0;
}
$ gcc hello.c -no-pie -fno-pie -o hello
$ strip hello
Then I ran my script on it, but the assembler reported syntax errors.
$ python3 ramblr/test/demo.py hello64 hello64_2
Deprecation warning: Use self.model.nodes() instead of nodes
Traceback (most recent call last):
File "ramblr/test/demo.py", line 11, in <module>
backend.save(args.output)
File "/test/ramblr/patcherex/patcherex/backends/reassembler_backend.py", line 145, in save
raise CompilationError("File: %s Error: %s" % (tmp_file_path,res))
patcherex.errors.CompilationError: File: /tmp/hello6428_s8lf9.s Error: (b'', b"/tmp/hello6428_s8lf9.s: Assembler messages:\n/tmp/hello6428_s8lf9.s: Warning: end of file not at end of a line; newline inserted\n/tmp/hello6428_s8lf9.s:9: Error: too many memory references for `sub'\n/tmp/hello6428_s8lf9.s:11: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:13: Error: too many memory references for `test'\n/tmp/hello6428_s8lf9.s:20: Error: too many memory references for `add'\n/tmp/hello6428_s8lf9.s:40: Error: junk `ptr [rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:58: Error: junk `ptr cs:[rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:60: Error: junk `ptr [rax]' after expression\n/tmp/hello6428_s8lf9.s:71: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:73: Error: too many memory references for `cmp'\n/tmp/hello6428_s8lf9.s:75: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:79: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:81: Error: too many memory references for `test'\n/tmp/hello6428_s8lf9.s:87: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:91: Error: junk `ptr [rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:105: Error: junk `ptr [rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:108: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:112: Error: too many memory references for `sub'\n/tmp/hello6428_s8lf9.s:114: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:116: Error: too many memory references for `sar'\n/tmp/hello6428_s8lf9.s:118: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:120: Error: too many memory references for `shr'\n/tmp/hello6428_s8lf9.s:122: Error: too many memory references for `add'\n/tmp/hello6428_s8lf9.s:124: Error: too many memory references for `sar'\n/tmp/hello6428_s8lf9.s:128: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:130: Error: too many 
memory references for `test'\n/tmp/hello6428_s8lf9.s:136: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:147: Error: junk `ptr [rax]' after expression\n/tmp/hello6428_s8lf9.s:161: Error: junk `ptr [rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:170: Error: junk `ptr [rip+label_3]' after expression\n/tmp/hello6428_s8lf9.s:170: Error: too many memory references for `cmp'\n/tmp/hello6428_s8lf9.s:183: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:187: Error: junk `ptr [rip+label_3]' after expression\n/tmp/hello6428_s8lf9.s:187: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:200: Error: junk `ptr [rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:212: Error: junk `ptr [rax]' after expression\n/tmp/hello6428_s8lf9.s:214: Error: junk `ptr cs:[rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:225: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:240: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:242: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:255: Error: junk `ptr cs:[rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:257: Error: junk `ptr [rax]' after expression\n/tmp/hello6428_s8lf9.s:270: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:276: Error: too many memory references for `lea'\n/tmp/hello6428_s8lf9.s:280: Error: too many memory references for `lea'\n/tmp/hello6428_s8lf9.s:284: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:286: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:288: Error: too many memory references for `sub'\n/tmp/hello6428_s8lf9.s:290: Error: too many memory references for `sub'\n/tmp/hello6428_s8lf9.s:292: Error: too many memory references for `sar'\n/tmp/hello6428_s8lf9.s:296: Error: too many memory references for `test'\n/tmp/hello6428_s8lf9.s:300: Error: too many memory references for `xor'\n/tmp/hello6428_s8lf9.s:302: Error: junk `ptr [rax+rax]' 
after expression\n/tmp/hello6428_s8lf9.s:305: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:307: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:309: Error: too many memory references for `mov'\n/tmp/hello6428_s8lf9.s:311: Error: junk `ptr [r12+rbx*8]' after expression\n/tmp/hello6428_s8lf9.s:313: Error: too many memory references for `add'\n/tmp/hello6428_s8lf9.s:315: Error: too many memory references for `cmp'\n/tmp/hello6428_s8lf9.s:320: Error: too many memory references for `add'\n/tmp/hello6428_s8lf9.s:345: Error: junk `ptr cs:[rax+rax]' after expression\n/tmp/hello6428_s8lf9.s:363: Error: too many memory references for `sub'\n/tmp/hello6428_s8lf9.s:365: Error: too many memory references for `add'\n")
After debugging, I found a mistake in the following code.
#src: ~/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py: 2109
2070 def assembly(self, comments=False, symbolized=True):
...
2109 s = "\n".join(all_assembly_lines)
I fixed it as follows.
2109 s += "\n".join(all_assembly_lines)
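A possible reading of why this one-character change matters, offered as an assumption because the surrounding lines of assembly() are not shown here: if s already holds a header such as the .intel_syntax noprefix directive, plain assignment throws it away, and GAS then parses the Intel-syntax operands as AT&T, which would match the "too many memory references" messages above. The function names below are hypothetical and only illustrate the difference between the two statements.

```python
# Hypothetical stand-ins for the tail of an assembly-emitting method.
# They are NOT angr's actual code; HEADER is an assumed prologue.
HEADER = "\t.intel_syntax noprefix\n"


def emit_overwriting(lines):
    s = HEADER
    s = "\n".join(lines)   # plain assignment discards the header
    return s


def emit_appending(lines):
    s = HEADER
    s += "\n".join(lines)  # augmented assignment keeps the header
    return s


lines = ["\tsub rsp, 8", "\tmov rax, qword ptr [rip+label_3]"]
print(".intel_syntax" in emit_overwriting(lines))  # False
print(".intel_syntax" in emit_appending(lines))    # True
```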
After fixing the aforementioned error, I got another error (still Error #1), as follows.
$ python3 ramblr/test/demo.py hello64 hello64_2
Deprecation warning: Use self.model.nodes() instead of nodes
Traceback (most recent call last):
File "ramblr/test/demo.py", line 11, in <module>
backend.save(args.output)
File "/data2/tools/sok_script/ramblr/patcherex/patcherex/backends/reassembler_backend.py", line 145, in save
raise CompilationError("File: %s Error: %s" % (tmp_file_path,res))
patcherex.errors.CompilationError: File: /tmp/hello64fxruy6ow.s Error: (b'', b"/tmp/hello64fxruy6ow.s: Assembler messages:\n/tmp/hello64fxruy6ow.s: Warning: end of file not at end of a line; newline inserted\n/tmp/ccLbEVvb.o: In function `init':\n(.text+0x166): undefined reference to `label_9'\n/tmp/ccLbEVvb.o: In function `sub_400390':\n(.init+0x7): undefined reference to `label_0'\ncollect2: error: ld returned 1 exit status\n")
We examined the assembly file that Ramblr emitted and found that the errors are related to missing symbols.
$ gcc /tmp/hello64fxruy6ow.s -no-pie -fno-pie
/tmp/hello64fxruy6ow.s: Assembler messages:
/tmp/hello64fxruy6ow.s: Warning: end of file not at end of a line; newline inserted
/tmp/cc2KjYEb.o: In function `init':
(.text+0x166): undefined reference to `label_9'
/tmp/cc2KjYEb.o: In function `sub_400390':
(.init+0x7): undefined reference to `label_0'
collect2: error: ld returned 1 exit status
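To see at a glance which labels are missing, the linker output can be filtered for its "undefined reference" lines. This is a minimal sketch; the helper name undefined_symbols is made up for illustration and the sample text is copied from the output above.

```python
import re

# Matches the linker's "undefined reference to `name'" diagnostics.
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(linker_output):
    """Return the undefined symbol names in order of first appearance."""
    seen = []
    for name in UNDEFINED_REF.findall(linker_output):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "/tmp/cc2KjYEb.o: In function `init':\n"
    "(.text+0x166): undefined reference to `label_9'\n"
    "/tmp/cc2KjYEb.o: In function `sub_400390':\n"
    "(.init+0x7): undefined reference to `label_0'\n"
    "collect2: error: ld returned 1 exit status\n"
)
print(undefined_symbols(sample))  # ['label_9', 'label_0']
```

Searching the emitted .s file for each reported name then shows whether the label is referenced but never defined.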
Error #2
Next, I tested (recompiled) the 'ls' binary and found a different error.
First, I ran the Python 3 version.
python3 ramblr/test/demo.py coreutils-8.30_x64_nopie_ls ls.s
Deprecation warning: Use self.model.nodes() instead of nodes
Traceback (most recent call last):
File "/test/ramblr/patcherex/patcherex/backends/reassembler_backend.py", line 115, in save
assembly = self._binary.assembly(comments=True, symbolized=True) # type: str
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 2087, in assembly
addr_and_assembly.extend(proc.assembly(comments=comments, symbolized=symbolized))
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 1097, in assembly
s = b.assembly(comments=comments, symbolized=symbolized)
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 912, in assembly
s = "\n".join([ins.assembly(comments=comments, symbolized=symbolized) for ins in self.instructions])
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 912, in <listcomp>
s = "\n".join([ins.assembly(comments=comments, symbolized=symbolized) for ins in self.instructions])
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 805, in assembly
op_asm = op.assembly()
File "/home/test/.local/lib/python3.6/site-packages/angr/analyses/reassembler.py", line 546, in assembly
raise BinaryError('Unsupported memory operand size for operand "%s"' % self.operand_str)
angr.analyses.reassembler.BinaryError: Unsupported memory operand size for operand "xword ptr [rip + 0xf217]"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ramblr/test/demo.py", line 11, in <module>
backend.save(args.output)
File "/test/ramblr/patcherex/patcherex/backends/reassembler_backend.py", line 119, in save
str(ex)
patcherex.errors.ReassemblerError: Reassembler failed to reassemble the binary. Here is the exception we caught: Unsupported memory operand size for operand "xword ptr [rip + 0xf217]"
I think Ramblr could not properly handle the following instruction.
objdump -d -M intel /data2/benchmark/coreutils-8.30/x64/gcc/nopie/o0-bfd/stripbin/ls | grep 0xf217
4100b3: db 2d 17 f2 00 00 fld TBYTE PTR [rip+0xf217] # 41f2d0 <_fini@@Base+0x5698>
4100c3: db 2d 17 f2 00 00 fld TBYTE PTR [rip+0xf217] # 41f2e0 <_fini@@Base+0x56a8>
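The exception text spells the 80-bit x87 operand as "xword ptr", while objdump prints the same operand as TBYTE PTR. One way to handle this, sketched below under the assumption that the assembler accepts the "tbyte ptr" spelling, is to treat both keywords as 10-byte operands and rewrite one to the other. The table and function names are hypothetical and this is not how angr handles it.

```python
import re

# Assumed size table (bytes). "xword" and "tbyte" both denote the
# 80-bit x87 extended-precision operand.
OPERAND_SIZES = {
    "byte": 1, "word": 2, "dword": 4, "qword": 8,
    "tbyte": 10, "xword": 10, "xmmword": 16,
}

_SIZE_PREFIX = re.compile(r"^(\w+) ptr\b")


def operand_size(operand_str):
    """Return the memory operand size in bytes, or None without a size prefix."""
    m = _SIZE_PREFIX.match(operand_str.strip().lower())
    if m is None:
        return None
    keyword = m.group(1)
    if keyword not in OPERAND_SIZES:
        raise ValueError("unsupported memory operand size: %r" % operand_str)
    return OPERAND_SIZES[keyword]


def normalize_operand(operand_str):
    """Rewrite the 'xword ptr' spelling to 'tbyte ptr'."""
    return re.sub(r"\bxword ptr\b", "tbyte ptr", operand_str)


print(operand_size("xword ptr [rip + 0xf217]"))       # 10
print(normalize_operand("xword ptr [rip + 0xf217]"))  # tbyte ptr [rip + 0xf217]
```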
Error #3
The Python 2 version emits a different error, shown below.
I think the cause is similar to that of Error #1.
python2 ramblr/test/demo.py coreutils-8.30_x64_nopie_ls ls.s
WARNING | 2022-01-21 19:55:39,425 | angr.analyses.disassembly_utils | Your version of capstone does not support MIPS instruction groups.
WARNING | 2022-01-21 19:55:42,556 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000001_11_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:42,625 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000009_19_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:42,767 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00001a_28_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:42,837 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00002b_37_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:42,892 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000033_44_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:42,948 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00003b_51_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,015 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000043_58_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,081 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000054_67_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,143 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00005c_74_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,209 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000065_84_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,306 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000076_96_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,361 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00007e_104_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,891 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000087_117_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,928 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c00008f_121_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:43,979 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c000098_127_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:44,066 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c0000a1_143_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:44,102 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c0000a9_147_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:44,158 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c0000b2_153_64{UNINITIALIZED}>
WARNING | 2022-01-21 19:55:44,294 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 global_c0000bc_164_64{UNINITIALIZED}>
ERROR | 2022-01-21 19:55:44,477 | angr.analyses.cfg.cfg_fast | Decoding error occurred at basic block address 0x402a4c of function 0x402a4c.
Traceback (most recent call last):
File "ramblr/test/demo.py", line 11, in <module>
backend.save(args.output)
File "/test/ramblr/patcherex/patcherex/backends/reassembler_backend.py", line 145, in save
raise CompilationError("File: %s Error: %s" % (tmp_file_path,res))
patcherex.errors.CompilationError: File: /tmp/lssl1_qt.s Error: ('', "/tmp/lssl1_qt.s: Assembler messages:\n/tmp/lssl1_qt.s:30871: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30873: Error: junk `ptr [word ptr [rip+label_1563]]' after expression\n/tmp/lssl1_qt.s:30881: Error: junk `ptr [word ptr [rip+label_1558]]' after expression\n/tmp/lssl1_qt.s:30883: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30891: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30910: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30912: Error: junk `ptr [word ptr [rip+label_1558]]' after expression\n/tmp/lssl1_qt.s:30949: Error: junk `ptr [word ptr [rip+label_1568]]' after expression\n/tmp/lssl1_qt.s:30954: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30960: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:30996: Error: junk `ptr [word ptr [rip+label_1568]]' after expression\n/tmp/lssl1_qt.s:31001: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:31004: Error: junk `ptr [rbp+0x10]' after expression\n/tmp/lssl1_qt.s:31465: Error: junk `ptr [word ptr [rip+label_1568]]' after expression\n/tmp/lssl1_qt.s:31470: Error: junk `ptr [rbp - 0x20]' after expression\n/tmp/lssl1_qt.s:31478: Error: junk `ptr [word ptr [rip+label_1568]]' after expression\n/tmp/lssl1_qt.s:31489: Error: junk `ptr [word ptr [rip+label_1568]]' after expression\n/tmp/lssl1_qt.s:31494: Error: junk `ptr [rbp - 0x20]' after expression\n/tmp/lssl1_qt.s:31500: Error: junk `ptr [rbp - 0x10]' after expression\n/tmp/lssl1_qt.s:31526: Error: junk `ptr [rsp]' after expression\n/tmp/lssl1_qt.s:31553: Error: junk `ptr [rbp - 0x30]' after expression\n/tmp/lssl1_qt.s:31564: Error: junk `ptr [rbp - 0x30]' after expression\n/tmp/lssl1_qt.s:31568: Error: junk `ptr [rbp - 0x30]' after expression\n/tmp/lssl1_qt.s:31578: Error: junk `ptr [rbp - 0x30]' after expression\n/tmp/lssl1_qt.s:31582: Error: junk `ptr [rbp - 0x10]' after 
expression\n/tmp/lssl1_qt.s:31597: Error: junk `ptr [rbp - 0x10]' after expression\n/tmp/lssl1_qt.s:31599: Error: junk `ptr [rbp - 0x30]' after expression\n/tmp/lssl1_qt.s:31603: Error: junk `ptr [rbp - 0x10]' after expression\n/tmp/lssl1_qt.s:31621: Error: junk `ptr [rsp]' after expression\n/tmp/lssl1_qt.s:31690: Error: junk `ptr [rbp - 0x10]' after expression\n/tmp/lssl1_qt.s:31692: Error: junk `ptr [word ptr [rip+label_1618]]' after expression\n/tmp/lssl1_qt.s:31700: Error: junk `ptr [rsp]' after expression\n/tmp/lssl1_qt.s:31708: Error: junk `ptr [word ptr [rip+label_1618]]' after expression\n/tmp/lssl1_qt.s:31716: Error: junk `ptr [rsp]' after expression\n/tmp/lssl1_qt.s:63025: Warning: end of file not at end of a line; newline inserted\n")
I feel this is either caused by recent changes in angr’s memory data analysis or by the disassembler engine (capstone). I’ll debug it today and see what is going on.
Can you run test cases here and make sure they run in your environment?
Somehow I have a feeling that you are using GCC to assemble Intel syntax assembly files that reassembler generates… Give nasm a try?
It seems that these issues are caused by newer versions of GCC (?) changing the names of the init and fini sections. Tiny changes like these are causing trouble in the reassembler's GCC library function removal logic. angr/angr#3099 should solve this problem.
Also do not try the Python 2 version of angr. It is no longer maintained.
By the way, here is a better test.py that I use (so that you can get AT&T syntax that GCC likes):
import argparse
import subprocess

import angr
from patcherex.backends.reassembler_backend import ReassemblerBackend

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("input")
    parser.add_argument("output")
    args = parser.parse_args()

    p = angr.Project(args.input, auto_load_libs=False)
    r = p.analyses.Reassembler(syntax="at&t")
    r.symbolize()
    r.remove_unnecessary_stuff()
    assembly = r.assembly(comments=True, symbolized=True)

    with open(args.output + ".s", "w") as f:
        f.write(assembly)

    subprocess.check_call(["gcc", "-no-pie", args.output + ".s", "-o", args.output],
                          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
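Once a reassembled binary links, a quick sanity check is to run it next to the original and compare what they do. The sketch below is not part of the script above; same_behavior is a hypothetical helper, and the two commands are stand-ins so the example runs anywhere.

```python
import subprocess
import sys


def same_behavior(cmd_a, cmd_b, timeout=10):
    """Run two commands and compare their exit status and stdout."""
    a = subprocess.run(cmd_a, stdout=subprocess.PIPE,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    b = subprocess.run(cmd_b, stdout=subprocess.PIPE,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    return (a.returncode, a.stdout) == (b.returncode, b.stdout)


# Stand-in commands. In practice these would be the original binary and
# the reassembled one, for example ["./hello"] and ["./hello_2"].
original = [sys.executable, "-c", "print('hello world')"]
reassembled = [sys.executable, "-c", "print('hello world')"]
print(same_behavior(original, reassembled))
```

Matching output on one input does not prove the reassembly is correct, but a mismatch is a cheap early signal.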
Thank you for your prompt response.
I was able to solve Error #2 with your help. I didn't know about the 'at&t' option, thanks.
I hope the other issue will be resolved soon.
Also, I have another question.
Does Ramblr officially support x86/x86-64 PIE binaries?
I ran into another issue when I tested a toy program compiled with the PIE option.
The PR is merged. Error 1 and 2 should have both been solved.
Does Ramblr officially support x86/x86-64 PIE binaries?
Nope. In theory it's extremely easy to do since you no longer need to heuristically symbolize pointers on PIE binaries. I am not interested in implementing the support for angr's reassembler. You can do it by yourself (and send us a PR) if you want to :)
By the way, an example solution that leverages explicit pointers (or relocation information) in PIE binaries is RetroWrite. I bet you know that paper.
I appreciate your effort.
Fortunately, all of the above errors seem to be resolved. :)
I got another error when I reassembled x86 binaries.
To be specific, Ramblr emits the reassembled assembly files, but recompiling failed for all of them.
I found that Ramblr emits duplicate symbols, as follows.
# data @ 0x806b394
.label_1:
.section .tm_clone_table
.align 4
# data @ 0x806b394
.label_1:
.section .bss
I think Ramblr mishandles some sections.
Here is the relevant section info.
readelf -S ls_x86 | grep '\.data' -A1
[24] .data PROGBITS 0806b1e0 0221e0 0001b4 00 WA 0 0 32
[25] .tm_clone_table PROGBITS 0806b394 022394 000000 00 WA 0 0 4
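Duplicate definitions like the two .label_1 lines above can be found mechanically before handing the file to the assembler. This is a minimal sketch with a hypothetical helper name. It assumes '#' starts a comment and that labels are not hidden inside string literals.

```python
import re
from collections import Counter

# A label definition: optional indentation, a name, then a colon.
LABEL_DEF = re.compile(r"^\s*([.\w$]+):")


def duplicate_labels(assembly_text):
    """Return labels defined more than once, sorted by name."""
    counts = Counter()
    for line in assembly_text.splitlines():
        code = line.split("#", 1)[0]  # drop '#' comments
        m = LABEL_DEF.match(code)
        if m:
            counts[m.group(1)] += 1
    return sorted(name for name, n in counts.items() if n > 1)


sample = """\
# data @ 0x806b394
.label_1:
.section .tm_clone_table
.align 4
# data @ 0x806b394
.label_1:
.section .bss
"""
print(duplicate_labels(sample))  # ['.label_1']
```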
I’ve never seen .tm_clone_table. Can you share the binary?
Sorry for the late reply. I'm taking a sick leave and will take a look at the binary when situation permits.
I'm sorry to hear that. I hope you get well soon.
@miksh Did you notice that your ls_x86.run has GCC-specific PIE code, specifically getpc calls (0x8049cc0)? Your binary is not full PIE, but partial PIE. I feel it's because your GCC is too new (which probably always links against PIE libraries regardless of your compilation settings), or the build process has some issues.
Reassembler does not officially support PIE binaries. I'm working on a quick fix, but I am not interested in testing it on a large corpus of PIE binaries.
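For readers who want to check their own 32-bit binaries for such getpc helpers, one rough heuristic is to scan the code bytes for the "mov reg, [esp]; ret" pattern that GCC's PC thunks typically consist of. The sketch below is an illustration only, not something the reassembler does. The function name is hypothetical, and a raw byte scan can report false positives.

```python
# ModRM bytes for `mov reg, [esp]` (opcode 0x8B, SIB byte 0x24), by register.
THUNK_MODRM = {
    0x04: "eax", 0x0C: "ecx", 0x14: "edx", 0x1C: "ebx",
    0x2C: "ebp", 0x34: "esi", 0x3C: "edi",
}


def find_getpc_thunks(code, base_addr=0):
    """Return (address, register) pairs for `mov reg, [esp]; ret` sequences."""
    hits = []
    for i in range(len(code) - 3):
        if code[i] == 0x8B and code[i + 2] == 0x24 and code[i + 3] == 0xC3:
            reg = THUNK_MODRM.get(code[i + 1])
            if reg is not None:
                hits.append((base_addr + i, reg))
    return hits


# Synthetic bytes: nop; mov ebx, [esp]; ret; nop
blob = bytes([0x90, 0x8B, 0x1C, 0x24, 0xC3, 0x90])
for addr, reg in find_getpc_thunks(blob, base_addr=0x1000):
    print(hex(addr), reg)
```

In practice the bytes would come from the binary's executable sections, with base_addr set to the section's load address.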
angr/angr#3171 works on your ls_x86.run.
Two other issues that I fixed in this PR: Reassembler did not support xword ptr. It also did not support empty sections (like tm_clone_table).
Thank you for your kind cooperation. I have just checked that ramblr properly reassembles the sample binary. 👍
@miksh Did you notice that your ls_x86.run has GCC-specific PIE code, specifically getpc calls (0x8049cc0)? Your binary is not full PIE, but partial PIE. I feel it's because your GCC is too new (which probably always links against PIE libraries regardless of your compilation settings), or the build process has some issues.
Reassembler does not officially support PIE binaries. I'm working on a quick fix, but I am not interested in testing it on a large corpus of PIE binaries.
I compiled the sample binary with GCC v7.5.0. I found that some intrinsic functions use getpc calls even when I use the no-pie options.
I found that some intrinsic functions use getpc calls even when I use the no-pie options.
I believe it's because some libraries on your system that GCC statically links against have getpc calls inside.
I'm closing this issue. Feel free to reopen or open a new one if you have more questions about reassembler!