
Binary Parser Generator for Go

Primary language: Go. License: Apache License 2.0 (Apache-2.0).

Binary Parser Generator

This is a code generator for Go that creates binary parsers. The generated parsers are loosely based on Rekall's VTypes language, and the generator specifically aims to support the JSON profile files used and produced by Rekall.

Overview

This parser generator creates Go objects that represent C structs. The layout of each struct is typically stored in a JSON file that specifies the offset and the type of each field (a field's type can itself be another struct).

The C struct definition does not need to be complete, i.e. not every field needs to be defined. Each field is parsed independently at its specified offset.
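Because each field is located purely by its offset, a partial struct definition still parses correctly. Here is a minimal sketch of that idea. It uses a hypothetical helper, fieldAt, and assumes little-endian 32-bit fields. It illustrates the principle only and is not binparsergen's generated code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fieldAt (hypothetical helper) decodes a little-endian uint32 stored at a
// fixed offset in the struct's bytes. Bytes belonging to fields we never
// declare are simply never read.
func fieldAt(data []byte, offset int) (uint32, error) {
	if offset < 0 || offset+4 > len(data) {
		return 0, fmt.Errorf("offset %d out of range for %d-byte buffer", offset, len(data))
	}
	return binary.LittleEndian.Uint32(data[offset : offset+4]), nil
}

func main() {
	// A 16-byte struct of which we only know two fields.
	data := []byte{
		0x01, 0x00, 0x00, 0x00, // known field at offset 0
		0xff, 0xff, 0xff, 0xff, // undeclared bytes, skipped
		0x2a, 0x00, 0x00, 0x00, // known field at offset 8
		0x00, 0x00, 0x00, 0x00, // undeclared bytes, skipped
	}
	a, _ := fieldAt(data, 0)
	b, _ := fieldAt(data, 8)
	fmt.Println(a, b) // prints: 1 42
}
```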

The JSON files can be obtained automatically from debugging symbols (e.g. Microsoft PDB files), so Go parsers for data structures can be created automatically from debugging symbols.

How do I use it?

The first step is to create a vtypes JSON definition file. You can obtain one from the Rekall project or write one by hand.

Here is an example:

{
    "_GUID": [16, {
        "Data1": [0, ["unsigned long", {}]],
        "Data2": [4, ["unsigned short", {}]],
        "Data3": [6, ["unsigned short", {}]],
        "Data4": [8, ["Array", {
            "count": 8,
            "target": "unsigned char"
        }]]
    }]
}
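To make this layout concrete, here is a hand-written Go sketch of a parser for the _GUID struct above. It is not the code binparsergen emits. The names GUID and ParseGUID are hypothetical. The sketch also assumes little-endian data and a 4-byte "unsigned long" (as on Windows):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// GUID mirrors the 16-byte _GUID vtype (hand-written illustration only).
type GUID struct {
	Data1 uint32  // offset 0, "unsigned long" (assumed 4 bytes)
	Data2 uint16  // offset 4, "unsigned short"
	Data3 uint16  // offset 6, "unsigned short"
	Data4 [8]byte // offset 8, Array of 8 "unsigned char"
}

// ParseGUID decodes a _GUID from the first 16 bytes of data,
// assuming little-endian byte order.
func ParseGUID(data []byte) (GUID, error) {
	var g GUID
	if len(data) < 16 {
		return g, fmt.Errorf("need 16 bytes, got %d", len(data))
	}
	g.Data1 = binary.LittleEndian.Uint32(data[0:4])
	g.Data2 = binary.LittleEndian.Uint16(data[4:6])
	g.Data3 = binary.LittleEndian.Uint16(data[6:8])
	copy(g.Data4[:], data[8:16])
	return g, nil
}

// String formats the GUID in the familiar {XXXXXXXX-XXXX-...} style.
func (g GUID) String() string {
	return fmt.Sprintf("{%08X-%04X-%04X-%X-%X}",
		g.Data1, g.Data2, g.Data3, g.Data4[:2], g.Data4[2:])
}

func main() {
	raw := []byte{
		0x33, 0x22, 0x11, 0x00, // Data1 = 0x00112233
		0x55, 0x44, // Data2 = 0x4455
		0x77, 0x66, // Data3 = 0x6677
		0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, // Data4
	}
	g, err := ParseGUID(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(g) // prints: {00112233-4455-6677-8899-AABBCCDDEEFF}
}
```

Each field is read at the offset given in the JSON, which is exactly the information a generator needs to emit code of this shape.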

The JSON structure is as follows:

  1. The file is an object whose keys are struct names and whose values are struct definitions.

  2. The struct definition is a list whose first item is the size of the struct in bytes.

  3. The second item is an object whose keys are field names and whose values are field definitions.

  4. The field definition is a list whose first item is the field's offset within the struct and whose second item is a type definition.

  5. The type definition is a list whose first item is a type name and whose second item is a parameters object.

  6. Depending on the type, the parameters object may contain different parameters that control how the field is parsed (for example, "count" and "target" for an Array).

Note that vtype files generated automatically from debugging symbols usually contain far more information than we need, for example structs or fields we don't care about. To prevent the generator from producing a huge amount of useless code, we tell it explicitly which structs to generate and, optionally, which fields to filter out.

The spec is a YAML file (the example below uses JSON syntax, which YAML parsers also accept):

{
    "Module": "main",
    "Profile": "RegistryProfile",
    "Filename": "profile_vtypes.json",
    "Structs": ["_HBASE_BLOCK", "_GUID", "_LARGE_INTEGER",
                "_HBIN", "_HCELL", "_CM_KEY_NODE", "_CM_KEY_INDEX",
                "_CHILD_LIST", "_HHIVE", "_CM_KEY_VALUE",
                "_CM_KEY_INDEX_FAST", "_CM_KEY_INDEX_FAST_ELEMENT",
                "_CM_BIG_DATA"
               ],
    "FieldBlackList": {
        "_LARGE_INTEGER": ["u"]
    }
}

The spec specifies:

  1. Module: The Go package name for the generated code.
  2. Profile: The name of the profile type that will be generated.
  3. Filename: The path to the vtype JSON file.
  4. Structs: A list of structs to generate parsers for. All of these structs belong to the same profile.
  5. FieldBlackList: A mapping from struct name to the list of fields in that struct that will be ignored.

Now we can generate the code:

$ binparsergen myspecfile.yaml > mygenerated_code.go