Support uImage format and/or manual arch specification
skochinsky opened this issue · 3 comments
First, congrats on the awesome tool.
I decided to try it out and went to the OpenWRT release archive. The first one alphabetically was ARC and it failed:
File "C:\Work\git\vmlinux-to-elf\vmlinux_to_elf\architecture_detecter.py", line 157, in guess_architecture
raise ValueError('The architecture could not be guessed successfully')
ValueError: The architecture could not be guessed successfully
In fact, the uImage header already includes the architecture, load address and even entrypoint:
openwrt-18.06.4-arc770-generic-uImage: u-boot legacy uImage, ARC OpenWrt Linux-4.9.184, Linux/DesignWare ARC, OS Kernel Image (Not compressed), 4522192 bytes, Thu Jun 27 12:18:52 2019, Load Address: 0x80000000, Entry Point: 0x8000A000, Header CRC: 0xA11EF4A4, Data CRC: 0xAC4BE39B
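For reference, the fields shown above can be read straight out of the 64-byte legacy uImage header. The following is only a rough sketch, not code from vmlinux-to-elf: the names `parse_uimage_header`, `UImageHeader` and `ARCH_NAMES` are made up for illustration, and the architecture codes are a small subset recalled from U-Boot's `image.h` that should be checked against that file before relying on them.

```python
import struct
from typing import NamedTuple

UIMAGE_MAGIC = 0x27051956
# Legacy header layout: seven big-endian 32-bit words, four bytes, 32-byte name.
HEADER_FORMAT = ">7I4B32s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes

# Assumption: partial list of IH_ARCH_* codes, to be verified against image.h.
ARCH_NAMES = {2: "ARM", 3: "x86", 5: "MIPS", 7: "PowerPC", 22: "ARM64", 23: "ARC"}


class UImageHeader(NamedTuple):
    magic: int
    header_crc: int
    timestamp: int
    data_size: int
    load_address: int
    entry_point: int
    data_crc: int
    os_id: int
    arch_id: int
    image_type: int
    compression: int
    name: str


def parse_uimage_header(data: bytes) -> UImageHeader:
    """Decode the legacy uImage header found at the start of `data`."""
    if len(data) < HEADER_SIZE:
        raise ValueError("input is shorter than a uImage header")
    fields = struct.unpack_from(HEADER_FORMAT, data)
    # The image name is NUL-padded to 32 bytes.
    name = fields[-1].rstrip(b"\x00").decode("ascii", "replace")
    header = UImageHeader(*fields[:-1], name)
    if header.magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage (bad magic)")
    return header
```

With such a header in hand, the kernel payload would start at file offset `HEADER_SIZE` and span `data_size` bytes, and `load_address` could serve as a first guess for the base address.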
Additionally, there is no need to know the architecture if not writing out the ELF file (e.g. when just dumping symbols), so this step could be skipped until required. You could also let the user specify it manually or just write 0 to e_machine.
Note: the uImage format may employ its own compression (I have seen at least gzip used).
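To illustrate the compression point: the header carries a one-byte compression code, so the payload could be unpacked before any further analysis. This is a hedged sketch using only the Python standard library; `decompress_payload` is a hypothetical helper, the codes follow U-Boot's `IH_COMP_*` numbering as I understand it, and LZO/LZ4/Zstandard are left out because the standard library has no decoder for them.

```python
import bz2
import lzma
import zlib

# Assumption: IH_COMP_* codes from U-Boot's image.h.
COMP_NONE, COMP_GZIP, COMP_BZIP2, COMP_LZMA = 0, 1, 2, 3


def decompress_payload(payload: bytes, compression: int) -> bytes:
    """Return the raw kernel bytes for a uImage payload."""
    if compression == COMP_NONE:
        return payload
    if compression == COMP_GZIP:
        # wbits = 16 + MAX_WBITS selects the gzip container; a decompress
        # object tolerates trailing bytes after the end of the stream.
        return zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(payload)
    if compression == COMP_BZIP2:
        return bz2.decompress(payload)
    if compression == COMP_LZMA:
        # FORMAT_AUTO accepts both .xz and legacy .lzma streams.
        return lzma.decompress(payload)
    raise ValueError(f"unsupported uImage compression code: {compression}")
```

The payload passed in would be the `data_size` bytes that follow the 64-byte header.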
Hello,
Kudos for your work on IDA too.
I can see multiple things that I could improve from your post:
- Supporting parsing of the uImage header. Indeed, this could be useful for the architecture, but also for correcting the offset corresponding to the base address of the kernel: currently, it is assumed to be either the start of the raw input file, the start of the original ELF section contents, or the start of the compressed stream. However, I would note that I have rarely seen uncompressed uImage kernels in the wild.
- I should add prologue detection for ARC (because why not).
- Supporting customization of the output e_machine field of the ELF header, based on the command line arguments to vmlinux-to-elf. I could also set the detected architecture to a dummy value for the kallsyms-finder utility, which provides a text representation of the symbols. However, it is not entirely correct that this would be without consequence: in recent kernels, the size of all fields except kallsyms_addresses, kallsyms_offsets and kallsyms_relative_base has been trimmed to 4 bytes (the size of a GNU Assembler .long, even on x64) for optimization purposes, and as the addresses/offsets fields lie at the edge of the kallsyms table, it is complicated to guess their size (except through pattern frequency matching or something similar). Thus I rely in part on the known addressing bit size of the detected architecture. I guess that I could default to 32 bits in the absence of a detected architecture or extra flags (or require an explicit bit size).
In the end, it is possible that the best option would be to add generic flags for information that is not 100% sure to be inferred exactly by the tool (--kernel-offset, --base-address, --e-machine, --bit-size), even though the detection works well with my corpus of kernels.
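As a rough illustration of how such override flags could be wired up, here is a minimal argparse sketch. It is hypothetical and not the tool's actual command line parser: the flag names are taken from the message above, while `build_parser`, `resolve_bit_size` and the fallback order (explicit flag, then detected value, then 32 bits) are assumptions made for the example.

```python
import argparse
from typing import Optional


def any_base_int(text: str) -> int:
    """Accept decimal or prefixed values such as 0x80000000."""
    return int(text, 0)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical override flags")
    parser.add_argument("--kernel-offset", type=any_base_int,
                        help="file offset at which the kernel image starts")
    parser.add_argument("--base-address", type=any_base_int,
                        help="virtual address the kernel is loaded at")
    parser.add_argument("--e-machine", type=any_base_int,
                        help="value to write into the ELF e_machine field")
    parser.add_argument("--bit-size", type=int, choices=(32, 64),
                        help="addressing width used to size the kallsyms tables")
    return parser


def resolve_bit_size(explicit: Optional[int], detected: Optional[int]) -> int:
    """Prefer the user's flag, then the detected width, then 32 bits."""
    if explicit is not None:
        return explicit
    if detected is not None:
        return detected
    return 32
```

For example, `build_parser().parse_args(["--base-address", "0x80000000", "--bit-size", "32"])` would yield the load address reported in the uImage header above as an integer.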
I should get back to this soon. Other ideas are welcome.
Regards,
Hello,
For your information, your kernel now reconstructs well without extra arguments. Also, I have added support for the extra arguments that I mentioned in the previous message. These have been documented in the README.md.
Regards,
Thanks!
FYI, I found an example of a compressed uImage which seems not to be handled out of the box: openwrt-18.06.4-lantiq-falcon-lantiq_easy98000-nand-squashfs-sysupgrade.bin
Also openwrt-18.06.4-ramips-rt305x-3g-6200n-initramfs-kernel.bin
However, no symbols were found even after manual decompression :( Making an ELF with just a code section may be useful, although without .bss the analysis will not be too great...