Bitfield support in decompiler
nik0sc opened this issue · 23 comments
Is your feature request related to a problem? Please describe.
Right now the decompiler shows bitfield access simply as shift and mask (in other words, it is unaware of bitfields).
For example, consider:
- a big-endian bitfield that is one byte long, and
- a member 3 bits long starting at bit 2 (counting from the most significant bit, starting at 0).
A member read might look like (bitfield >> 3) & 0x7, and a member write like bitfield = (bitfield & 0xc7) | ((member << 3) & 0x38). This makes understanding decompiler output difficult.
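For reference, here is a minimal C sketch of the two idioms quoted above. It assumes the one-byte container and the 3-bit member at mask 0x38 from the example. The names read_member, write_member, MEMBER_SHIFT and MEMBER_MASK are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout from the example: a 3-bit member occupying
 * bits 5..3 (LSB numbering) of a one-byte container. */
#define MEMBER_SHIFT 3u
#define MEMBER_MASK  0x38u

/* Read: shift the member down to bit 0, then keep its 3 bits. */
static uint8_t read_member(uint8_t bitfield) {
    return (uint8_t)((bitfield >> MEMBER_SHIFT) & 0x7u);
}

/* Write: clear the member's bits (0xc7 == ~0x38 within a byte),
 * then OR in the new value shifted into place and masked. */
static uint8_t write_member(uint8_t bitfield, uint8_t member) {
    return (uint8_t)((bitfield & 0xC7u) | ((member << MEMBER_SHIFT) & MEMBER_MASK));
}
```

This raw shift-and-mask form is what the decompiler currently shows.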
The data type manager allows bitfields to be declared only by importing them through the "Parse C Source" menu item (great if you have a header file for your platform); however, the decompiler does not make use of this information.
Describe the solution you'd like
- Ability to declare bitfields in the data type manager
- Control over implementation-specific details like member allocation order
- The decompiler recognizes shift-and-mask pcode on data/variables typed as a bitfield, where the shift and mask match a member's defined offset and length, as a bitfield member access, and shows the member access instead of the shift and mask
The above example would then look like var1 = bitfield.member and bitfield.member = var1 for the read and write cases.
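At the C level, the desired output corresponds to an ordinary bitfield struct. The sketch below uses a hypothetical struct example_bits with 2 bits, then the 3-bit member, then 3 more bits. Bit allocation order and padding are implementation-defined in C, so this layout only lines up with the 0x38 mask when the compiler allocates bits from the most significant end.

```c
/* Hypothetical struct mirroring the example. Allocation order is
 * implementation-defined: with MSB-first allocation, high is bits 7..6,
 * member is bits 5..3 and low is bits 2..0 of the first byte. */
struct example_bits {
    unsigned int high   : 2;
    unsigned int member : 3;
    unsigned int low    : 3;
};

/* What the decompiler output would ideally look like for a read. */
static unsigned int get_member(const struct example_bits *bitfield) {
    return bitfield->member;          /* var1 = bitfield.member */
}

/* ...and for a write; the value is reduced modulo 8 by the 3-bit field. */
static void set_member(struct example_bits *bitfield, unsigned int var1) {
    bitfield->member = var1;          /* bitfield.member = var1 */
}
```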
Describe alternatives you've considered
No real alternative besides the current situation of consulting datasheets and my own notes for bitfield layout.
- Bitfield layout will depend on architecture and endianness
- There is no single definitive way for a function to access a bitfield member: it could shift first and then mask, or mask first and then shift. Recognizing member access, even at the pcode level, might not be trivial.
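To illustrate the point about access patterns, here are three equivalent C expressions for reading the same 3-bit member at bits 5..3. The function names are hypothetical. A decompiler would have to recognize all of these shapes (and more) as the same member read.

```c
#include <stdint.h>

/* Shift first, then mask with the member's width. */
static uint32_t extract_shift_then_mask(uint32_t x) {
    return (x >> 3) & 0x7u;
}

/* Mask in place first, then shift down. */
static uint32_t extract_mask_then_shift(uint32_t x) {
    return (x & 0x38u) >> 3;
}

/* Shift left to drop the higher bits, then shift right to drop the lower ones. */
static uint32_t extract_shift_left_right(uint32_t x) {
    return (uint32_t)(x << 26) >> 29;
}
```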
Additional context
This is mainly for embedded systems that pack many short parameters into registers.
This is something I'd really like to see implemented, both in the decompiler and in the disassembly listing view. I feel like a lot of good additions could be made to the enumerations feature. In addition, the ability to specify values within bitmasks in an enum would be great: systems that use their own flag registers may group multiple independent sets of values into a single register, each with a different mask.
Separating enumerations from the overall "data types" in some way would make navigating them easier as well.
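The "values within bitmasks" idea can be sketched in C with a hypothetical control register (all CTRL_* names are invented for illustration). Each group has its own mask, and a group value such as CTRL_MODE_IDLE == 0 only has meaning relative to its mask. That is why a plain flag enum cannot express it.

```c
#include <stdint.h>

/* Hypothetical control register holding several independent groups. */
#define CTRL_MODE_MASK   0x03u   /* bits 1..0: one of the CTRL_MODE_* values */
#define CTRL_MODE_IDLE   0x00u
#define CTRL_MODE_RUN    0x01u
#define CTRL_MODE_SLEEP  0x02u

#define CTRL_CLK_MASK    0x0Cu   /* bits 3..2: clock divider selection */
#define CTRL_CLK_DIV1    0x00u
#define CTRL_CLK_DIV8    0x04u
#define CTRL_CLK_DIV64   0x08u

#define CTRL_ENABLE      0x80u   /* independent single-bit flag */

/* Build a register value from one value per group plus flags. */
static uint8_t ctrl_compose(uint8_t mode, uint8_t clk, uint8_t flags) {
    return (uint8_t)((mode & CTRL_MODE_MASK) | (clk & CTRL_CLK_MASK) | flags);
}

/* Test one group: mask first, then compare against that group's constant. */
static int ctrl_is_mode(uint8_t reg, uint8_t mode) {
    return (reg & CTRL_MODE_MASK) == mode;
}
```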
Are you by any chance trying to decompile MIPS binaries? In recent ISA revisions (Release 2 and above) there are dedicated instructions for accessing bit fields (EXT and INS), which could be decompiled straight into C bitfield operations if you have the type defined.
@nihilus Don't know about MIPS, but I'm working on a PowerPC binary right now. Most bitfield access is done with the rlwinm and rlwimi instructions, which make it very clear which range of a register is being read and written. But of course this doesn't translate into the decompiled output.
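The following C model of the two PowerPC instructions shows why they expose the field range so directly. It is an illustrative sketch, not Ghidra's SLEIGH semantics. It assumes PowerPC bit numbering (bit 0 is the most significant bit) and mb, me in 0..31. The operands rotate and mask are exactly a field's position and extent.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by sh bits. */
static uint32_t rotl32(uint32_t x, unsigned sh) {
    sh &= 31u;
    return sh ? (x << sh) | (x >> (32u - sh)) : x;
}

/* Ones from bit mb through bit me inclusive (bit 0 = MSB); wraps if mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me) {
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);  /* bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm rA,rS,SH,MB,ME: rotate left, then AND with the mask (field read). */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,SH,MB,ME: insert the rotated bits under the mask into rA (field write). */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

For example, reading the 3-bit field at mask 0x38 (PowerPC bits 26..28) is rlwinm with SH=29, MB=29, ME=31, and writing it is rlwimi with SH=3, MB=26, ME=28.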
On x86 it's a mess of shifting and masking.
I'm on ARM currently, and there are a ton of processor-specific SFRs (special function registers), as well as flags within the user firmware, that would drastically benefit from this.
The ability to represent bitfields within Structures has just been added to the master branch. Support for bitfields has been added to the CParser, PDB parser and DWARF. The PDB XML file format has changed for bitfields: any retained PDB XML files will need to be regenerated to benefit from the bitfield improvements (bitfield bit-offset information was missing from the XML). Note that "aligned" bitfield packing support is currently limited to MSB filled first for big-endian data and LSB filled first for little-endian data. These bitfield component definitions are currently not conveyed to the decompiler, and there is currently no bitfield reference mechanism. Structure Data instances in memory will reflect bitfield data. See the Structure Editor help content for some additional information.
I am closing this ticket since no immediate action is required. We are investigating bitfield support for the decompiler.
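The MSB-first vs LSB-first distinction above changes where a member lands. Here is a small C sketch, assuming a single container with no straddling or padding and width < 32; the helpers field_shift and field_mask are hypothetical. A 3-bit member declared after a 2-bit member in a byte gets mask 0x38 (the mask in the original example) under MSB-first filling, but 0x1C under LSB-first filling.

```c
#include <stdint.h>

/* Right-shift that brings a field down to bit 0, given its bit offset in
 * declaration order, its width, and the container size in bits. */
static unsigned field_shift(unsigned container_bits, unsigned offset,
                            unsigned width, int msb_first) {
    return msb_first ? container_bits - offset - width : offset;
}

/* In-place mask of the field (assumes width < 32). */
static uint32_t field_mask(unsigned container_bits, unsigned offset,
                           unsigned width, int msb_first) {
    return ((1u << width) - 1u) << field_shift(container_bits, offset, width, msb_first);
}
```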
@ghidra1 What's the prognosis here??? We can currently define bitfields, but the decompiler support is still missing!!
This is a feature request that has been neither implemented nor rejected for future support. I will reopen it and put it through our triage and prioritization process.
Support for bitfields in the decompiler is planned, but we have no timeline yet.
Don't forget, a bitfield may span more than one register. For example, in early x86 assembly a long (32 bits) has to use two 16-bit registers! This is currently an issue, as we end up treating the result as two 16-bit values.
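To make the register-spanning problem concrete, here is a C sketch assuming a 32-bit value held with its high word in DX and low word in AX, plus a hypothetical 8-bit field at bits 12..19. Viewed as one 32-bit value, the field is a single shift and mask. Viewed as two separate 16-bit values, it becomes a recombination of pieces from both halves.

```c
#include <stdint.h>

/* Join the two 16-bit halves into one 32-bit value (DX high, AX low). */
static uint32_t join_dx_ax(uint16_t dx, uint16_t ax) {
    return ((uint32_t)dx << 16) | ax;
}

/* Hypothetical 8-bit field at bits 12..19, read from the joined value. */
static uint32_t field_from_joined(uint16_t dx, uint16_t ax) {
    return (join_dx_ax(dx, ax) >> 12) & 0xFFu;
}

/* The same field when the halves are treated separately:
 * the top 4 bits of AX plus the bottom 4 bits of DX. */
static uint32_t field_from_halves(uint16_t dx, uint16_t ax) {
    return ((uint32_t)(ax >> 12) & 0xFu) | (((uint32_t)dx & 0xFu) << 4);
}
```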
@Wall-AF there are all kinds of conventions when you consider all processors/compilers and the resulting pcode for bitfield manipulations. Reversing this in the decompiler is what makes it so hard. The pcode can also be ambiguous.
Understood. Just being hopeful! Maybe there could be a manual way to tell the decompiler to treat 2 registers as one longer register (at some future point).
treat 2 registers as one longer register
This is a double-edged sword and is done only with adjacent registers in the language implementation. Doing this can encourage the decompiler to always treat them as a single varnode, even in cases where they should be separate.
Only by manual say so.
The reason behind this is twofold:
- In 16-bit processors/compilers, 32-bit numeric values are (90+% of the time in my app) manipulated through two 16-bit registers, using different register combinations (sometimes DX:AX or AX:DX, or other combinations that may include BX and CX). (I'm sure this will be similar for 64-bit processors needing to represent 128- or 256-bit numbers.) In these cases, provided the stack variable or (pointed-to) structure has the location/member defined as a 32-bit type, the registers are loaded from that single location with the correct endianness, using +2 on the named variable/member for the high word (in little-endian). I believe this should enable the decompiler to understand the concept.
- Similar functionality already exists for defining custom calling conventions, as demonstrated in the x86-16.cspec file, which ensures that 32-bit returns populate the DX:AX register combination.
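The "+2 for the high word" point can be sketched in C. The sketch assumes a 32-bit little-endian variable in memory, and the helper names load_le16 and load_le32_as_words are hypothetical. The low word sits at offset 0 and the high word at offset +2, so a pair of 16-bit loads from var and var+2 reconstructs the single 32-bit value.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian word starting at p. */
static uint16_t load_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Load a 32-bit little-endian variable as two words, the way a
 * 16-bit compiler fills AX (offset 0) and DX (offset +2). */
static uint32_t load_le32_as_words(const uint8_t *p) {
    uint16_t ax = load_le16(p);      /* low word at offset 0 */
    uint16_t dx = load_le16(p + 2);  /* high word at offset +2 */
    return ((uint32_t)dx << 16) | ax;
}
```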
Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?
I hope this is something that is still being worked on.
What parts of the decompiler would need to be modified and/or what work would need to be done to support this?
Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?
Something you can do is create an enum datatype for each bitfield value and then continuously add bitfield permutations to the enum as you come across them in the decompilation.
For example, you could start out with the following enum:
1 = READ
2 = WRITE
4 = EXECUTE
Then add the following after you come across certain permutations:
3 = READ_and_WRITE
5 = READ_and_EXECUTE
7 = READ_and_WRITE_and_EXECUTE
This is tedious, but it's way better than setting one-time equates. At least you'll only have to define each permutation once.
@spicydll I think I've seen recent versions of Ghidra automatically take care of the permutations, e.g. showing 3 as READ|WRITE, so you should only need to define the individual flags.
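Here is a rough C sketch of that kind of automatic decomposition, so that only the individual flags need defining. It reuses the READ/WRITE/EXECUTE values from the workaround. The helpers format_flags and append are hypothetical and are not Ghidra's implementation. Any bits without a name are printed in hex.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Flag values from the enum workaround above. */
enum perm { READ = 1, WRITE = 2, EXECUTE = 4 };

static const struct { uint32_t bit; const char *name; } perm_names[] = {
    { READ, "READ" }, { WRITE, "WRITE" }, { EXECUTE, "EXECUTE" },
};

/* Append s to the NUL-terminated string in out (capacity n), truncating safely. */
static void append(char *out, size_t n, const char *s) {
    size_t len = strlen(out);
    if (len < n) snprintf(out + len, n - len, "%s", s);
}

/* Render value as NAME|NAME|...; leftover unnamed bits (or zero) in hex. n > 0. */
static char *format_flags(uint32_t value, char *out, size_t n) {
    out[0] = '\0';
    for (size_t i = 0; i < sizeof perm_names / sizeof perm_names[0]; i++) {
        if (value & perm_names[i].bit) {
            if (out[0] != '\0') append(out, n, "|");
            append(out, n, perm_names[i].name);
            value &= ~perm_names[i].bit;
        }
    }
    if (value != 0 || out[0] == '\0') {
        char hex[16];
        snprintf(hex, sizeof hex, "0x%x", (unsigned)value);
        if (out[0] != '\0') append(out, n, "|");
        append(out, n, hex);
    }
    return out;
}
```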
This is useful, though it runs into limitations when you pit it against something like masking out bits. Something like foo = foo & ~(SOME|BITS) ends up as an & with every bit that's being kept ORed together. If automatic permuting could use ~ appropriately, that'd be a huge usability boon.
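One possible heuristic for the ~ case is sketched below in C. The flag names SOME, BITS and OTHER and the helpers join_flags and format_and_mask are hypothetical. When an AND operand's complement is exactly a combination of known flags, the sketch prints it as ~(A|B). Otherwise it falls back to the plain flag names, or to hex.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag set. */
static const struct { uint32_t bit; const char *name; } flag_names[] = {
    { 0x1u, "SOME" }, { 0x2u, "BITS" }, { 0x4u, "OTHER" },
};
#define NFLAGS (sizeof flag_names / sizeof flag_names[0])

/* Join the names of the flags in v with '|'; returns 1 only if every set bit has a name. */
static int join_flags(uint32_t v, char *out, size_t n) {
    out[0] = '\0';
    for (size_t i = 0; i < NFLAGS; i++) {
        if (v & flag_names[i].bit) {
            size_t len = strlen(out);
            snprintf(out + len, n - len, "%s%s", len ? "|" : "", flag_names[i].name);
            v &= ~flag_names[i].bit;
        }
    }
    return v == 0;
}

/* Render the constant operand of an AND: prefer ~(A|B) when the cleared
 * bits are exactly known flags, else plain flags, else hex. */
static void format_and_mask(uint32_t mask, char *out, size_t n) {
    char names[64];
    uint32_t cleared = ~mask;
    if (cleared != 0 && join_flags(cleared, names, sizeof names)) {
        snprintf(out, n, "~(%s)", names);
    } else if (mask != 0 && join_flags(mask, names, sizeof names)) {
        snprintf(out, n, "%s", names);
    } else {
        snprintf(out, n, "0x%x", (unsigned)mask);
    }
}
```

With this heuristic, foo & 0xFFFFFFFC would be shown as foo & ~(SOME|BITS) rather than as an OR of every kept bit.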
For small bitfields the enum workaround works fine, but many structs that were optimized for space are accessed or packed as a short/int, which would need enums with 2^16 and 2^32 entries respectively. That isn't really an option.