/ghidra_kernelcache

a Ghidra framework for iOS kernelcache reverse engineering

Primary LanguagePythonApache License 2.0Apache-2.0

ghidra_kernelcache: a Ghidra iOS kernelcache framework for reverse engineering

This framework is the end product of my experience in Kernelcache reverse engineering , I usually look for vulnerabilities by manually auditing the kernel and its extensions and have automated most of the things that I really wanted to see in Ghidra to speed up the process of reversing, and this has proven to be effective and saves a lot of time. The framework works on iOS 12/13/14/15 and on macOS 11/12 (both kernelcache and single KEXT) and has been made to the public with the intention to help people to start researching on iOS kernel without the struggle of preparing their own environment. As I believe, this framework (including the toolset it provides and with some basic knowledge of IOKit) is sufficient to start hacking into the Kernelcache.

The framework is entirely written in Python, and can be extended to build other tools, it provides some basic APIs which you can use in almost any project and save time from reading the verbose manual, you're welcome to read the core functionalities in utils/ directory.

Ghidra is good when it comes to analyzing Kernelcaches, but like other RE tools, it requires some manual work, ghidra_kernelcache provides a good entry point to fix things up at the start and even while doing reverse engineering, hence providing a good-looking decompiler output.

There is a similar project made by @_bazad in IDAPro called ida_kernelcache which provides a good entry point for researchers wanting to work with the kernel image in IDA, my framework looks a bit similar to Brandon's work, and goes beyond by providing much more features to make the process of working with the kernelcache less painful.

Features :

  • *OS kernelcache symbolication.
  • C++ class hierarchy reconstruction and virtual tables.
  • Virtual method call references.
  • Auto fixing external method's dispatch table for both ::externalMethod() and ::getTargetAndMethodForIndex().
  • Applying namespaces to class methods.
  • Symbol name and type propagation over a function arguments.
  • Applying function signatures for known kernel functions.
  • Import old structures and classes from old project to a new project.

These features are made as a separated tools which can be executed either by key shortcuts or by clicking on their icons in the toolbar.

Installation

Clone the repository :

git clone https://github.com/0x36/ghidra_kernelcache.git

Important note: The project has been tested on Ghidra 10.1_PUBLIC and 10.2_DEV and it is not backward compatible.

Go to Windows → Script Manager, click on script Directory , then add ghidra_kernelcache to the directory path list. Go to Windows → Script Manager, in scripts listing, go to iOS→kernel category and check the plugins seen there, they will appear in GHIDRA Toolbar .

in logos/ directory, you can put you own logos for each tool.

iOS kernelcache symbolication

ghidra_kernelcache requires at the first stage iometa (made by @s1guza), a powerful tool providing C++ class information in the kernel binary, the great thing is that it works as a standalone binary, so the output can be imported to your favorite RE framework by just parsing it. My framework takes iometa's output and parses it to symbolicate and fix virtual tables.

Usage

After decompressing the kernel, run the following commands :

$ iometa -n -A /tmp/kernel A10-legacy.txt > /tmp/kernel.txt
# if you want also to symbolicate using jtool2
$ jtool2 --analyze /tmp/kernel

Load the kernelcache in Ghidra, DO NOT USE BATCH import , load it as Mach-O image. After the Kernelcache being loaded and auto-analyzed, click on the icon shown in the toolbar or just press Meta-Shift-K, then put the full path of iometa output which is /tmp/kernel.txt in our case.

if you want to use jtool2 symbols, you can use jsymbol.py located iOS→kernel category as well.

iOS kernelcache API

Full API examples are in ghidra_kernelcache/kc.py

→ Here are some examples of manipulating class objects :

from utils.helpers import *
from utils.class import *
from utils.iometa import ParseIOMeta

ff = "/Users/mg/ghidra_ios/kernel.txt"
iom = ParseIOMeta(ff)
Obj = iom.getObjects()
kc = kernelCache(Obj)


# symbolicate the kernel 
kc.process_all_classes()

# symbolicate the classes under com.apple.iokit.IOSurface bundle
kc.process_classes_for_bundle("com.apple.iokit.IOSurface")

# symbolicate the classes under __kernel__ bundle
kc.process_classes_for_bundle("__kernel__")

# Process one class (including its parents)
kc.process_class("IOGraphicsAccelerator2")

# Clears the content of the class structures (vtables are excluded)
kc.clear_class_structures()

# Overwrite the old vtable structure definition and resymbolicate it again
kc.update_classes_vtable()

# Reconstructing function call trees by enumerating all pac references and find their corresponding virtual method call
kc.explore_pac()

As you can see, you can fully or partially symbolicate the kernelcache, if partial symbolication was chosen, ghidra_kernelcache will automatically construct all class dependencies before proceeding. If you run the script against the whole kernelcache (full symbolication), ghidra_kernelcache will take several minutes to analyze the kernel image.

One finished, Ghidra will provide the following :

→ A new category has been added in Bookmark Filter called "iOS":

image1

→ IOKit class virtual tables are added to 'iOS' Bookmark for better and faster virtual table lookup, you can just look for a kext or a class by providing letters, words or kext bundle in the search bar.

image2

→ Fixing the virtual table : disassembles/compiles unknown code, fixes namespaces, re-symbolicates the class methods and applies function definition to each method:

image3

→ Creation of class namespaces and put each method to its own corresponding namespace:

image4

→ Creating class structure with the respect of class hierarchy :

image5

→ Creating class vtables, and each method has its own method definition for better decompilation output:

image6

Full implementation can be found in utils/class.py.

Here are some screenshots of before/after symbolicating using ghidra_kernelcache :

image7

image8

macOS Kext symbolication


ghidra_kernelcache macOS support is for both kernelcache and single KEXT symbolication for ARM64e and x86_64 architectures. IMPORTANT: At the time of writing Ghidra is unable to parse the whole macOS kernelcache, but it's possible to load it in IDA for initial analysis then import the database (idb to xml) to Ghidra later, but this is out of scope. If you manage to do so, ghidra_kernelcache will take care of the rest.

There are some few steps which have to be taken before symbolicating any macOS Kernel Extension, as ghidra_kernelcache's main aim is to reconstruct the class hierarchy and to manage all class structures into a single database, a kernel extension does not fulfill those requirements, which means that the symbolication of a single KEXT does require a symbolication of the kernel and perhaps other Kernel Extensions that depends on, hence some extra work needs to be done here. ghidra_kernelcache now provides a powerful way to symbolicate Kernel Extensions including the kernel by managing and sharing class structures and virtual method definitions via Ghidra's powerful DataType Project Archive

Steps to symbolicate a Kernel Extension

  • Create a new folder in your Ghidra project, then load /System/Library/Kernels/kernel.release.XXXXX into that folder and let Ghidra analyzes it.
  • Create a new Project Archive : Go to DataType Provider → Click on the arrow in the top right of the window → New Project Archive → Place it inside the newly created Folder → Name it to something (i.e macOS_12.1).
  • Now symbolicate the kernel using ghidra_kernelcache, the process is quite similar to iOS kernelcache symbolication.
$ iometa -n -A /System/Library/Kernels/kernel.release.t8101 > /tmp/kernel.txt
  • In python console, or you can find the full script implementation in KM.py script :
>>> from utils.helpers import *
>>> from utils.kext import *
>>> iom = ParseIOMeta("/tmp/kernel.txt")
>>> Obj = iom.getObjects()
>>> kc = Kext(Obj,shared_p="macOS_12.1")
>>> kc.process_kernel_kext()
  • Once finished, a database association has been created between kernel's archive and macOS_12.1 archive. Now, right click on the kernel.release.t8001Commit DataTypes TomacOS_12.1.
  • Then Right ClickSelect AllCommit.
  • Save the project archive : Right clickSave Archive. We've just created a project archive which can be shared across all Kernel Extensions. Let's take an example of IOSurface Kext for Apple Silicon :
$ lipo /System/Library/Extensions/IOSurface.kext/Contents/MacOS/IOSurface -thin arm64e -output /tmp/iosurface.arm64e
$ iometa -n -A /tmp/iosurface.arm64e > /tmp/iosurface.txt
  • Load the Kext into the same folder path where the kernel database and the project archive are located and let Ghidra finishes the analysis
  • Load the project archive we've created earlier macOS_12.1 : go to Data Type ManagerOpen Project Archive , then select the macOS_12.1
  • Run the following methods, full script can be found in KM.py:
from utils.helpers import *
from utils.kext import *

kc = Kext(Obj,shared_p="macOS_12.1")

# This method fixes LC_DYLD_CHAINED_FIXUPS for M1 Kernel extension
kc.depac()

# This method reconstructs class hierarchy and builds virtual table for each class
kc.process_kernel_kext()

Important Note : sometimes kc.process_kernel_kext() fails because Ghidra was not able to demangle some C++ symbols. To fix this, go to the script manager and run DemangleAllScript.java script then restart kc.process_kernel_kext() again.

Custom classes

There are some cases where some C++ classes where ghidra_kernelcache and iometa cannot symbolicate, so a new feature has been added to handle this. Custom() class reconstruction iterates through all the ::vtable symbols and checks wether the class is already defined or not, if not, it automatically creates a class structure, function definitions for each identified class method, a namespace and a virtual table for each class.

Custom class creation is supported on macOS only at the moment.

$ iometa -n -A /System/Library/Kernels/kernel.release.t8101 > /tmp/kernel.txt
$ iometa -n -A <kext_path> >> /tmp/kernel.txt
from utils.helpers import *
from utils.custom_kc import *

if __name__ == "__main__":
    default = "/tmp/kernel.txt"
    ff = askString("iometa symbol file","Symbol file: ",default)
    iom = ParseIOMeta(ff)
    Obj = iom.getObjects()

    kc = Custom(Obj)

    kc.process_all_classes()
    kc.explore_pac()

Miscellaneous scripts


Importing KDK's Dwarf4

Ghidra somehow fails to load the corresponding .dsym directory, I made a small script to fix this. It can be found here. Usage : load the kernel from your KDK path, let Ghidra finishes the analysis, then run dwarf_fix.py, it will load the symbols and the process may take several minutes. image12

Resolving virtual method calls references

ghidra_kernelcache provides two ways to resolve virtual calls by kernelCache.explore_pac or fix_extra_refs

kernelCache.explore_pac() If you're working on arm64e binary, ghidra_kernelcache can recognize virtual method calls by looking for the Pointer Authentication Code value. The process is straightforward and unlike fix_extra_refs(), kernelCache.explore_pac does not rely on the Pcode or varnode identification, it just iterates through all the instructions in the program, searches for MOVK instructions, fetches the second operand and looks for its corresponding value in the database. Usage : Create a KernelCache instance via kernelCache , Kext or Custom, then call explore_pac() method.

from utils.helpers import *
from utils.kext import *

if __name__ == "__main__":
    default = "/tmp/kernel.txt"
    ff = askString("iometa symbol file","Symbol file: ",default)
    iom = ParseIOMeta(ff)
    Obj = iom.getObjects()

    kc = Kext(Obj)

   	 kc.explore_pac()

fix_extra_refs() This function is based on a basic data flow analysis to find all virtual call methods and to resolve their implementations automatically, it works across all architectures and has the ability to recognize the source data type from the decompiler output and resolve all the virtual call references inside the function, so the user is able to jump forward/backward directly to/from the implementation without manually looking for it. The most useful feature provided by fix_extra_refs is that it keeps the references synchronized on each execution. For example, you changed a variable data type to a class data type, fix_extra_refs will automatically recognize the change, and will recursively go through all call sites to resolve their references, and it will stop only when the call site queue became empty.

There are some other features provided by fix_extra_refs such as:

  • It auto-detects _ptmf2ptf() calls and resolves their call method for both offsets and full function address
  • It identifies the namespace of an unresolved function name (functions which start with FUN_ ), and resolve it by putting the target function into its own namespace (e.g adding this pointer of the corresponding class).

You can find the implementation in utils/references.py, fix_extra_refs parses the pcode operations and looks for CALLIND and CALL opcodes, then gets all involved varnodes in the operation, once a Varnode definition was identified, it retrieves its HighVariable to identify the class object type, if the type is unknown (i.e does not appear to be a class structure) it just ignores it, otherwise it will take the class name, looks up its virtual call table, using the offset provided by the Varnode, it can fetch the right virtual method call and puts a reference on the call instruction.

fix_extra_refs(toAddr(address))

Here is an output example of using fix_extra_refs :

image9

Note that it has successfully resolved IOService::isOpen(), OSArray:getNextIndexOfObject() and IOStream::removeBuffer() virtual calls without any manual modification.

Next, fix_extra_refs will decompile IOStream::removeBuffer(), gets all HighVariables of this method, then it resolves their references like the previous method ... and so on. image10

Auto fixing external method tables

I believe that every researcher has some script to deal with this part, as it is the main attack surface of IOKit, doing it manually is a burden, and it must be automated in a way the researcher wants to dig into multiple external method tables. There are two scripts provided by theghidra_kernelcache : fix_methodForIndex.py and fix_extMethod.py. You can enable them like the other scripts as shown above.

Usage: Put the cursor at the start of the external dispatch table, run the script: provide the target, and the number of selectors. Example for IOStreamUserClient::getTargetAndMethodForIndex() :

image11

namespace.py : fix method namespaces …

This is a useful script to populate the class type to all encountered methods, and it's a dependency for extra_refs.py script in order to recursively explore the callee functions and to resolve their references.

Usage: Put the cursor in the decompiler output of the wanted function, run the script from the toolbar or press Meta-Shift-N .

Symbol name and type propagation

ghidra_kernelcache provides type propagation support for basic Pcode operations, but it will likely fails for some variables that using complex casting . If someone wants to help, or wants to start working with low level stuff in Ghidra, this is the opportunity to do so. Implementation can be found in ghidra_kernelcache/propagate.py

Loading function signatures

Parsing C++ header files in Ghidra is not possible, and having kernel function signatures in kernelcache can improve many things in the decompiler output. For example, let's say we have added virtual IOMemoryMap * map(IOOptionBits options = 0 );, Ghidra will automatically re-type the return value into IOMemoryMap pointer automatically for both function definition and function signatures. You can add any C++ symbol into signatures/ directory with the respect of the syntax and you can find defined function signatures in this directory.

// Defining an instance class method
IOMemoryDescriptor * withPersistentMemoryDescriptor(IOMemoryDescriptor *originalMD);

// Defining a virtual method, it must start with "virtual" keyword
virtual IOMemoryMap * createMappingInTask(task_t intoTask,  mach_vm_address_t atAddress,  IOOptionBits options,  mach_vm_size_t offset = 0,  mach_vm_size_t length = 0);

// Defining a structure
struct task_t;

// typedef'ing a type
typedef typedef uint IOOptionBits; 

// Lines begining with '//' are ignored 

Usage: After symbolicating the kernel, it is highly recommended to run the script load_sigatnures.py to load all available function signatures. As most of the previous tools, run this script by adding it in the toolbar or from the Plugin manager or just press Meta-Shift-S .

Loading old structures:

This script is straight-froward, it imports all structures, classes, typedefs, and function definitions and everything with SourceType.USER_DEFINED from an old project to a new one.

Usage: Open the old and the new Ghidra projects on the same tool, go to load_structs.py script, put the old program name to src_prog_string variable, and the new one to dst_prog_string variable, then run the script.

Contribute

If you see the project interesting and want to contribute, just do a PR and I will review it, meanwhile, I would like to see some contribution in the following areas:

Credit

I would like to thank @s1guza for his awesome iometa which ghidra_kernelcache depends on.