BinaryAnalysisPlatform/bap

ARM Big-endian targets are not supported

Closed this issue · 5 comments

Could you please help me troubleshoot using bap to derive bil output from an armv1 binary?

I was trying to analyse some big endian armv1 files last week (which I don't have at hand), and with those I couldn't get a sensible assembly output which matched what I was seeing in ghidra. For example, if I run this command:

bap mc --show-insn=asm --show-bil --arch=armeb -- 00 b0 a0 e3
mov r11, #0
{
  R11 := 0
}

then you get the same assembler interpretation as ghidra shows, so all is good. In its disassembly listing, ghidra shows this order e3 a0 b0 00, and when I use that order in bap mc I get this:

bap mc --show-insn=asm --show-bil --arch=armeb -- e3 a0 b0 00
adcseq r10, r0, r3, ror #1
{
  #1 := R3
  #2 := R0
  #3 := #1 >> 1 | #1 << 0x1F
  R10 := if ZF then R0 + (#1 >> 1 | #1 << 0x1F) + pad:32[CF] else R10
  CF := if ZF then high:1[pad:33[#2] + pad:33[#3] + pad:33[CF]] else CF
  VF := if ZF then high:1[~(#2 ^ #3) & (#2 ^ R10)] else VF
  NF := if ZF then high:1[R10] else NF
  ZF := if ZF then R10 = 0 else ZF
}

the assembler and bil are quite different. I cannot get the first mov r11, #0 output on the files I was analysing. I thought that switching between --arch=arm and --arch=armeb would swap the endianness of each 32-bit word and give me the correct mov r11, #0 assembly, but that didn't happen: the bil output is incomplete, sometimes empty, and callgraphs do not show any child functions at all. I can improve the bil by physically changing the byte order within the input executables, but then objdump cannot parse the file because it cannot make sense of the binary's offsets, and so bap ./file -recipe=dumps.scm won't complete.
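To see why the two byte orders decode so differently, the fields of an ARM data-processing instruction can be pulled apart by hand. The sketch below is not bap code: the helper name `decode_dp` and the small lookup tables are made up for illustration, and it assumes the word really is a data-processing instruction (bits 27-26 are 00) with either an immediate operand or an immediate-shifted register operand.

```python
import struct

# Lookup tables for the ARM data-processing encoding.
COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"]
OPCODE = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
          "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]
SHIFT = ["lsl", "lsr", "asr", "ror"]


def decode_dp(word):
    """Split a 32-bit data-processing word into named fields (hypothetical helper)."""
    fields = {
        "cond": COND[(word >> 28) & 0xF],      # bits 31-28
        "imm_form": bool((word >> 25) & 1),    # bit 25: operand 2 is an immediate
        "op": OPCODE[(word >> 21) & 0xF],      # bits 24-21
        "s": bool((word >> 20) & 1),           # bit 20: update flags
        "rn": (word >> 16) & 0xF,              # bits 19-16
        "rd": (word >> 12) & 0xF,              # bits 15-12
    }
    if fields["imm_form"]:
        # 8-bit immediate rotated right by twice the 4-bit rotate field
        rot = 2 * ((word >> 8) & 0xF)
        imm8 = word & 0xFF
        fields["imm"] = ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF
    else:
        # register operand with an immediate shift amount
        fields["rm"] = word & 0xF
        fields["shift"] = SHIFT[(word >> 5) & 0x3]
        fields["shift_imm"] = (word >> 7) & 0x1F
    return fields


raw = bytes.fromhex("e3a0b000")          # byte order as listed by ghidra
(be_word,) = struct.unpack(">I", raw)    # bytes read as a big-endian word
(le_word,) = struct.unpack("<I", raw)    # same bytes read as a little-endian word

be = decode_dp(be_word)
le = decode_dp(le_word)
print(be)
print(le)
```

Read as a big-endian word, the fields are those of mov r11, #0; read as a little-endian word, the same bytes give the adc with the eq condition, the S bit set, and r3 rotated right by 1, which matches the adcseq output above.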

To experiment at home, I dug up echo from a uclib repository, since it is big endian and armv1, and ran bap on it to get various outputs, like asm and bil. I'm still having problems, though: this time I am unable to generate either bil or asm output. I figure that I must be doing some trivial thing incorrectly, which you might be able to point out to me. Is armv1 too old for current bap?

I've attached the echo file here, and if you have the time, could you please have a shot at getting bil output from it? I've tried using bap 2.2.0 and also bap 2.3.0-alpha+97fb7fa. I'm using llvm-10 on ubuntu 18.04. I don't have ida.

To process echo I'm running bap ./echo --recipe=dumps.scm with the dumps.scm below. The (option raw-*) values for the attached file were derived using bap specification ./echo:

(option optimization-level 0)
(option dump bil:echo.bil)
(option dump asm:echo.asm)

(option no-cache)
;;(option loader llvm)
;;(option llvm-base 32768)    
(option loader raw)
(option raw-arch armeb)
(option raw-base 32768)
(option raw-entry-point 35528)
(option raw-length 0x2b78)

;;(option target bap:armv7+eb)   

(option report-progress)

Thank you for your time.

echo.zip

ivg commented

I am looking into this, but so far it looks like the culprit is llvm. Basically, without any bap involved, we get the same result with llvm alone.
The plain GNU objdump gives us the correct decoding:

$ objdump -d echo | grep 9f98
9f98:	e3a00040 	mov	r0, #64	; 0x40

where llvm-objdump gives us a decoding of a reversed instruction

llvm-objdump -d echo | grep 9f98
9f98: e3 a0 00 40  	andmi	r10, r0, r3, ror #1

Yes, the order of the displayed bytes is correct, but the disassembly actually corresponds to the 0x4000a0e3 word.
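The mismatch can be reproduced without any disassembler: reading the four displayed bytes as a little-endian word yields exactly the value quoted above. A minimal check using only the Python standard library (the variable names are illustrative):

```python
import struct

# The four bytes at 0x9f98, in the order they are stored in the file.
raw = bytes.fromhex("e3a00040")

(as_big,) = struct.unpack(">I", raw)     # the word as GNU objdump reads it
(as_little,) = struct.unpack("<I", raw)  # the word a little-endian decoder sees

print(hex(as_big))
print(hex(as_little))
```

The first value is the word GNU objdump prints; the second is the reversed word that the llvm-objdump disassembly corresponds to.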

ivg commented

Here is the reference to the corresponding bug in LLVM. And here is the fix. It looks like it is just ignored and abandoned :(

To summarize, LLVM doesn't support big-endian ARM targets. Possible solutions:

  1. try to work around it by reversing the bytes before they reach the LLVM disassembler;
  2. switch to some other disassembler;
  3. wait for LLVM to fix it;
  4. accept the fact and at least disable llvm for big-endian ARM targets.
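As a rough illustration of option 1, the byte reversal amounts to flipping each aligned 4-byte group before the bytes are handed to a little-endian decoder. The sketch below is plain Python, not bap or LLVM code; the function name `swap_words` is hypothetical, and it assumes the buffer holds only 32-bit ARM-mode instructions (16-bit Thumb code would need different handling).

```python
def swap_words(code: bytes) -> bytes:
    """Reverse the byte order inside every 4-byte word of `code` (hypothetical helper)."""
    if len(code) % 4 != 0:
        raise ValueError("expected a whole number of 32-bit words")
    # Slice out each aligned word and reverse its four bytes.
    return b"".join(code[i:i + 4][::-1] for i in range(0, len(code), 4))


# Two big-endian instruction words taken from this thread.
big_endian = bytes.fromhex("e3a0b000" "e3a00040")
swapped = swap_words(big_endian)
print(swapped.hex())
```

Applying the function twice returns the original buffer, so the same helper converts in either direction.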

I am currently investigating the options and will keep you posted. If you have any ideas, I will be happy to hear them :)

Thank you, Ivan, for your most complete response. It's a shame, and a surprise, that LLVM ignores big-endian, especially since that fix in 2018. As for ideas - I did wonder if point (1) in your reply might be easy to set up - reverse the bytes before they go into LLVM. Or (2) - find another disassembler. Thanks again.

ivg commented

I did wonder if point (1) in your reply might be easy to set up - reverse the bytes before they go into LLVM.

It is doable, but it is such a kludge that it makes me feel uneasy :)

Or (2) - find another disassembler.

Yes, we can at least try capstone. It is a bit of work, but it will give us armeb.

So far I am trying to push the fix into the LLVM trunk. I posted it on Twitter and got some attention from the original author, but still no attention from the LLVM community.

With that said, option (1) is still an option. While I am reluctant to introduce such a kludge and would prefer to get it fixed upstream, I can implement it if you need it right now and don't want to wait. (In the best-case scenario, we will have to wait for months before it gets into LLVM's trunk, and for at least 6 months before it reaches a release.)

With that said, option (1) is still an option. While I am reluctant to introduce such a kludge and would prefer to get it fixed upstream, I can implement it if you need it right now and don't want to wait.

I appreciate your offer, and would happily use any quick-fix that you can come up with. If you don't wish to add such a fix to the source repository at this time, then I'd be happy to drop any source files which you edit into my local bap repository, where I can build them locally. Just attach them here and I'll download them.