Bitfield support in decompiler

Question

nik0sc opened this issue 6 years ago · 23 comments

Is your feature request related to a problem? Please describe.
Right now the decompiler shows bitfield access simply as shift and mask (in other words, it is unaware of bitfields).

For example, consider:

a big-endian bitfield that is a byte long, and
a member 3 bits long starting at the 2nd bit.

A member read might look like (bitfield >> 3) & 0x7, and a member write like bitfield = (bitfield & 0xc7) | ((member << 3) & 0x38). This makes decompiler output hard to understand.
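The shift-and-mask pattern in this example can be written out as a small sketch. The names MEMBER_SHIFT, MEMBER_WIDTH, MEMBER_MASK, read_member and write_member are made up for illustration, and the sketch assumes the 3-bit member sits at bits 3..5 of the byte (counting from the least significant bit), as the 0x38 mask implies.

```python
# Hypothetical constants for the example member: 3 bits wide, at bits 3..5 (LSB = bit 0).
MEMBER_SHIFT = 3
MEMBER_WIDTH = 3
MEMBER_MASK = ((1 << MEMBER_WIDTH) - 1) << MEMBER_SHIFT  # 0x38


def read_member(bitfield: int) -> int:
    """Extract the member: shift it down to bit 0, then mask to its width."""
    return (bitfield >> MEMBER_SHIFT) & ((1 << MEMBER_WIDTH) - 1)


def write_member(bitfield: int, member: int) -> int:
    """Clear the member's bits, then OR in the new value (result stays one byte wide)."""
    cleared = bitfield & (~MEMBER_MASK & 0xFF)  # same as bitfield & 0xc7
    return cleared | ((member << MEMBER_SHIFT) & MEMBER_MASK)
```

With bitfield support in the decompiler, read_member(bitfield) would ideally show up as bitfield.member and write_member as an assignment to bitfield.member.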

The data type manager allows the declaration of bitfields only by importing them through the "Parse C Source" menu item (great if you have a header file for your platform); however, the decompiler does not make use of this information.

Describe the solution you'd like

Ability to declare bitfields in the data type manager
Control over implementation-specific details like member allocation order
Decompiler recognizes shift-and-mask pcode on data/variables typed as a bitfield, where the shifts and masks match the defined offsets and lengths, as a bitfield member access, and shows the member access instead of the shift and mask

The above example would then look like var1 = bitfield.member and bitfield.member = var1 for the read and write cases.

Describe alternatives you've considered
No real alternative besides the current situation of consulting datasheets and my own notes for bitfield layout.

Bitfield layout will depend on architecture and endianness
There is no definitive way for a function to access a bitfield member. It could shift first then mask, or mask then shift. Recognizing member access, even by pcode, might not be trivial.
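To illustrate why recognition is not trivial: the two orders mentioned here produce the same value even though the operations and constants differ. A minimal sketch with hypothetical function names, reusing the 3-bit member at bits 3..5 from the example earlier in this request:

```python
def read_shift_then_mask(bitfield: int) -> int:
    # Shift the member down to bit 0, then keep the low 3 bits.
    return (bitfield >> 3) & 0x7


def read_mask_then_shift(bitfield: int) -> int:
    # Isolate the member in place with 0x38, then shift it down.
    return (bitfield & 0x38) >> 3


# Both forms agree for every possible byte value.
assert all(read_shift_then_mask(b) == read_mask_then_shift(b) for b in range(256))
```

A matcher would have to treat both shapes (and whatever else a given compiler emits) as the same member access.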

Additional context
This is mainly for embedded systems that pack many short parameters into registers.

Answer 1 · 2019-06-06T17:56:11.000Z

This is something I'd really like to see implemented, both in the decompiler and just in the disassembly list view. I feel like a lot of good additions could be done to the enumerations feature. In addition to this, the ability to specify values within bitmasks within the enum would be great. Systems that use their own flag registers may group multiple independent sets into a single register each with a different mask.

Separating enumerations from the overall "data types" in some way would make navigating them easier as well.

Answer 2 · 2019-06-10T21:22:11.000Z

Are you by any chance trying to decompile MIPS binaries? Recent ISA revisions (R2 and above) have dedicated instructions for accessing fields, which could be decompiled as a C bitfield operation if you have the type set up correctly.

Answer 3 · 2019-06-11T09:28:38.000Z

@nihilus Don't know about MIPS, but I'm working on a PowerPC binary right now. Most bitfield access is done with the rlwinm and rlwimi instructions, which make it very clear which range of a register is being read and written. But of course this doesn't translate into decompiled output.
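For readers who don't know these two instructions, here is a rough sketch of their 32-bit semantics as I understand them (rotate left, then AND with a mask, or insert under a mask). The helper names are hypothetical. Bit numbering in the mask follows the PowerPC convention where bit 0 is the most significant bit.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by shift bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def ppc_mask(mb: int, me: int) -> int:
    """Ones from bit mb through bit me (bit 0 = MSB); wraps around when mb > me."""
    start = MASK32 >> mb                  # ones from bit mb down to bit 31
    end = (MASK32 << (31 - me)) & MASK32  # ones from bit 0 down to bit me
    return (start & end) if mb <= me else (start | end)


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate Left Word Immediate then AND with Mask: typical bitfield read."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


def rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate Left Word Immediate then Mask Insert: typical bitfield write."""
    m = ppc_mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)
```

Under these assumptions, rlwinm(reg, 29, 29, 31) reads a 3-bit field at LSB bits 3..5, and rlwimi(reg, value, 3, 26, 28) writes it, so the field boundaries are spelled out directly in the operands.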

Answer 4 · 2019-06-11T11:17:36.000Z

On x86 it's a mess of shifting and masking.

Answer 5 · 2019-06-12T14:46:25.000Z

I'm on ARM currently, and there are a ton of processor-specific SFRs, as well as flags within the user firmware, that would drastically benefit from this.

Answer 6 · 2019-07-18T22:17:46.000Z

The ability to represent bitfields within Structures has just been added to the master branch. Support for bitfields has been added to the CParser, PDB parser and DWARF. The PDB XML file format has changed for bitfields: any retained PDB XML files will need to be regenerated to benefit from the bitfield improvements (bitfield bit-offset information was missing from the XML). Note that "aligned" bitfield packing support is currently limited to MSB-filled-first for big-endian and LSB-filled-first for little-endian data. These bitfield component definitions are currently not conveyed to the decompiler, and there is currently no bitfield reference mechanism. Structure Data instances in memory will reflect bitfield data. See the Structure Editor help content for some additional information.
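As a rough illustration of the two fill orders mentioned here, the sketch below computes the shift and mask for each field packed into a single storage unit. This is a simplified model of my own, not Ghidra's actual packing code: layout_bitfields is a hypothetical helper, and it ignores alignment, zero-width fields and fields that straddle storage units.

```python
def layout_bitfields(fields, unit_bits=8, msb_first=True):
    """Return {name: (shift, mask)} for (name, width) fields packed into one unit.

    msb_first=True fills from the most significant bit (big-endian style);
    msb_first=False fills from the least significant bit (little-endian style).
    """
    layout = {}
    used = 0  # number of bits already allocated
    for name, width in fields:
        if used + width > unit_bits:
            raise ValueError("fields do not fit in one storage unit")
        shift = unit_bits - used - width if msb_first else used
        layout[name] = (shift, ((1 << width) - 1) << shift)
        used += width
    return layout


# Hypothetical byte-sized bitfield: 2 bits, then a 3-bit member, then 3 bits.
fields = [("pad", 2), ("member", 3), ("rest", 3)]
big = layout_bitfields(fields, msb_first=True)
little = layout_bitfields(fields, msb_first=False)
```

With MSB-first filling, member ends up with shift 3 and mask 0x38, which matches the example in the original request; with LSB-first filling the same declaration gives shift 2 and mask 0x1c.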

Answer 7 · 2019-07-22T23:05:22.000Z

I am closing this ticket since no immediate action is required. We are investigating bitfield support for the decompiler.

Answer 8 · 2022-08-24T13:04:24.000Z

@ghidra1 What's the prognosis here??? We can currently define bitfields, but the decompiler support is still missing!!

Answer 9 · 2022-08-24T14:02:09.000Z

This is a feature request that has neither been implemented nor rejected for future support. I will reopen it and put it through our triage and prioritization process.

Answer 10 · 2022-08-24T16:30:09.000Z

Support for bitfields in the decompiler is planned, but we have no timeline yet.

Answer 11 · 2022-08-25T17:21:20.000Z

Don't forget, a bitfield may span more than one register. E.g., in early x86 assembly a long (being 32 bits) has to use two 16-bit registers! This is currently an issue, as we end up treating the result as two 16-bit values at present.

Answer 12 · 2022-08-25T17:45:37.000Z

@Wall-AF there are all kinds of conventions when you consider all processors/compilers and the resulting pcode for bitfield manipulations. Reversing this in the decompiler is what makes it so hard. It can also be ambiguous pcode.

Answer 13 · 2022-08-25T17:52:28.000Z

> @Wall-AF there are all kinds of conventions when you consider all processors/compilers and the resulting pcode for bitfield manipulations. Reversing this in the decompiler is what makes it so hard. It can also be ambiguous pcode.

Understood. Just being hopeful! Maybe there could be a manual way to tell the decompiler to treat 2 registers as one longer register (at some future point).

Answer 14 · 2022-08-26T15:27:05.000Z

> treat 2 registers as one longer register

This is a double-edged sword and is done only with adjacent registers in the language implementation. Doing this can encourage the decompiler to always treat them as a single varnode, even in cases where they should be separate.

Answer 15 · 2022-08-26T17:21:40.000Z

> treat 2 registers as one longer register

Only by manual say-so.

Answer 16 · 2022-08-26T18:31:44.000Z

> treat 2 registers as one longer register

The reason behind this is twofold:

In 16-bit processors/compilers, 32-bit numeric values are (90+% of the time in my app) manipulated through two 16-bit registers using different register combinations (sometimes DX:AX or AX:DX, or other combinations that may include BX and CX). (I'm sure this will be similar for 64-bit processors needing to represent 128- or 256-bit numbers.) In these cases, provided the stack variable or pointed-to structure has the location/member defined as a 32-bit type, the register loads follow the endianness of that single location, using a +2 offset on the named variable/member for the high word (in little-endian). I believe this should be enough for the decompiler to understand the concept.
Similar functionality already exists for defining custom calling conventions, as demonstrated in the x86-16.cspec file, which ensures that 32-bit returns populate the DX:AX register combination.
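To make the DX:AX case concrete, here is a small sketch of what 16-bit code effectively does when it loads a little-endian 32-bit value as two words. The function names are hypothetical; the +2 offset for the high word is the one described above.

```python
import struct


def load_dx_ax(memory: bytes, offset: int):
    """Mimic 16-bit code loading a little-endian 32-bit value as two words."""
    (ax,) = struct.unpack_from("<H", memory, offset)      # low word at +0
    (dx,) = struct.unpack_from("<H", memory, offset + 2)  # high word at +2
    return dx, ax


def join_words(high: int, low: int) -> int:
    """Combine a register pair such as DX:AX into the 32-bit value it represents."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


memory = struct.pack("<I", 0x12345678)
dx, ax = load_dx_ax(memory, 0)
value = join_words(dx, ax)  # the single 32-bit value one would like to see
```

The decompiler sees the two 16-bit loads; the request is for a way to present them as the single 32-bit value that join_words reconstructs.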

Answer 17 · 2023-09-05T15:26:10.000Z

Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?

Answer 18 · 2024-02-12T14:04:41.000Z

I hope this is something that is still being worked on.

Answer 19 · 2024-04-21T17:18:29.000Z

What parts of the decompiler would need to be modified and/or what work would need to be done to support this?

Answer 20 · 2024-05-23T17:44:43.000Z

> Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?

Something you can do is create an enum datatype for each bitfield value and then continuously add bitfield permutations to the enum as you come across them in the decompilation.

For example, you could start out with the following enum:

1 = READ
2 = WRITE
4 = EXECUTE

Then add the following after you come across certain permutations:

3 = READ_and_WRITE
5 = READ_and_EXECUTE
7 = READ_and_WRITE_and_EXECUTE

This is tedious, but it's way better than setting one-time equates. At least you'll only have to define each permutation once.

Answer 21 · 2024-05-23T17:59:30.000Z

@spicydll I think I've seen recent versions of Ghidra automatically take care of the permutations, e.g. showing 3 as READ|WRITE, so you should only need to define the individual flags.
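The idea of deriving the combined name from the individual flags can be sketched as follows. This is only an illustration of the principle, not how Ghidra implements it; FLAGS and render_flags are hypothetical names.

```python
# Hypothetical enum with only the individual flags defined.
FLAGS = {1: "READ", 2: "WRITE", 4: "EXECUTE"}


def render_flags(value: int, flags=FLAGS) -> str:
    """Render value as an OR of known flag names, plus hex for any unknown bits."""
    names = []
    remaining = value
    for bit in sorted(flags):
        if (remaining & bit) == bit:
            names.append(flags[bit])
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))  # bits with no matching flag
    return "|".join(names) if names else "0"
```

With this, render_flags(3) gives "READ|WRITE" and render_flags(7) gives "READ|WRITE|EXECUTE" without either permutation being defined explicitly.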

Answer 22 · 2024-06-19T06:05:49.000Z

> @spicydll I think I've seen recent versions of Ghidra automatically take care of the permutations, e.g. showing 3 as READ|WRITE, so you should only need to define the individual flags.

This is useful, though it runs into limitations when you pit it against something like masking out bits. Something like foo = foo & ~(SOME|BITS) will end up being shown as a bitwise & with every bit that's being kept ORed together. If automatic permuting could use ~ appropriately then that'd be a huge usability boon.
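One possible heuristic for this, sketched under my own assumptions (it is not an existing Ghidra feature): render the mask through its complement whenever the complement is fully covered by known flags and is the shorter description. FLAGS8, split_flags and render_mask are hypothetical names, and the register width is assumed to be 8 bits.

```python
# Hypothetical enum: one named flag per bit of an 8-bit register.
FLAGS8 = {1 << i: "F%d" % i for i in range(8)}


def split_flags(value: int, flags):
    """Return (names of flags set in value, leftover bits not covered by any flag)."""
    names = [name for bit, name in sorted(flags.items()) if value & bit]
    leftover = value
    for bit in flags:
        leftover &= ~bit
    return names, leftover


def render_mask(value: int, flags=FLAGS8, width_bits: int = 8) -> str:
    """Render a mask directly, or as ~(...) when the complement is shorter."""
    full = (1 << width_bits) - 1
    names, leftover = split_flags(value & full, flags)
    inv_names, inv_leftover = split_flags(~value & full, flags)
    # Prefer the complement if it is fully named and shorter (or the direct form is not fully named).
    if inv_names and inv_leftover == 0 and (leftover or len(inv_names) < len(names)):
        return "~(" + "|".join(inv_names) + ")"
    parts = names + ([hex(leftover)] if leftover else [])
    return "|".join(parts) if parts else "0"
```

Here render_mask(0xFC) yields "~(F0|F1)" instead of "F2|F3|F4|F5|F6|F7", while render_mask(0x03) stays "F0|F1".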

Answer 23 · 2024-09-08T15:33:03.000Z

> Is there a provisional workaround to get a prettier/more information rich decompilation for bitfields?
>
> Something you can do is create an enum datatype for each bitfield value and then continuously add bitfield permutations to the enum as you come across them in the decompilation.
>
> For example, you could start out with the following enum:
>
> 1 = READ
> 2 = WRITE
> 4 = EXECUTE
>
> Then add the following after you come across certain permutations:
>
> 3 = READ_and_WRITE
> 5 = READ_and_EXECUTE
> 7 = READ_and_WRITE_and_EXECUTE
>
> This is tedious, but it's way better than setting one-time equates. At least you'll only have to define each permutation once.

For small bitfields this solution works fine, but a lot of structs were optimized for space and are often accessed or packed as short/int, which would need enums with up to 2^16 or 2^32 entries respectively. That isn't really an option.