
Generate a C parser for a structured file format.

Primary language: Python. License: MIT.

faultreiber


faultreiber generates a parser library in C for a structured (binary) file format. The input is an XML file that describes the format.
By default, the generated C code is split across multiple source and header files.
The headers have header guards and are already wrapped in extern "C", so they can also be included from C++.

Demo

For a practical example, look at the example XML file under resources, which describes the format of a WASM object file.
To run the demo, run run.sh, go to the test directory, and run make test to run the executable.
To run it under valgrind --leak-check=yes, run make valgrind.

Memory Leaks

Code generated by faultreiber should not leak any memory if everything went according to plan during code generation. If that's not the case, let me know.

How to Use

faultreiber generates a function named read_aggr_{name} that takes an int _fd, the file descriptor of the file it will read. {name} is the value you pass to faultreiber with the --name option.
The function returns a pointer to a C structure of type {name}_lib_ret_t, defined as:

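#include <stdint.h>

/* "name" stands for the value passed to --name, e.g. wasm_lib_ret_t */
typedef struct {
  name_obj_t* obj;              /* all the read modules (see aggregate.h) */
  void** void_train;            /* memory allocated while reading */
  uint64_t* current_void_size;  /* capacity of void_train */
  uint64_t* current_void_count; /* number of entries in void_train */
} name_lib_ret_t;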

{name}_obj_t is a C structure defined in aggregate.h that holds all the read modules.
A function named release_all_{name} is generated in aggregate.c. It releases the memory tracked in void_train, but not the {name}_obj_t structure or the returned structure itself; the client has to free those.
Assuming the return value of read_aggr_{name} is stored in lib_ret and --name was passed the value wasm, the client code should release the memory in this order:

release_all_wasm(lib_ret->void_train, lib_ret->current_void_count); /* everything in void_train */
free(lib_ret->obj); /* the wasm_obj_t */
free(lib_ret);      /* the wasm_lib_ret_t itself */
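The signature of release_all_{name} (it takes void_train and current_void_count) suggests that the generated reader records every heap block it allocates in a growable array of pointers and frees them in one pass. The sketch below models that bookkeeping under that assumption. The names void_train_t, train_init, train_malloc and train_release_all are hypothetical and are not part of faultreiber's output; the initial size and grow factor only stand in for what --voidbuffersize and --voidbuffgrowfactor appear to control.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a "void train": a growable array that remembers
 * every block allocated while parsing so all of them can be freed at once. */
typedef struct {
  void** void_train;           /* recorded allocations */
  uint64_t current_void_size;  /* capacity of void_train */
  uint64_t current_void_count; /* number of recorded allocations */
  uint64_t grow_factor;        /* stands in for --voidbuffgrowfactor */
} void_train_t;

/* initial_size stands in for --voidbuffersize. Returns 0 on success. */
int train_init(void_train_t* t, uint64_t initial_size, uint64_t grow_factor) {
  assert(t != NULL && initial_size > 0 && grow_factor > 1);
  t->void_train = malloc(initial_size * sizeof(void*));
  if (t->void_train == NULL) return -1;
  t->current_void_size = initial_size;
  t->current_void_count = 0;
  t->grow_factor = grow_factor;
  return 0;
}

/* Allocate n bytes and record the block, growing the train when it is full. */
void* train_malloc(void_train_t* t, size_t n) {
  if (t->current_void_count == t->current_void_size) {
    uint64_t new_size = t->current_void_size * t->grow_factor;
    void** grown = realloc(t->void_train, new_size * sizeof(void*));
    if (grown == NULL) return NULL;
    t->void_train = grown;
    t->current_void_size = new_size;
  }
  void* p = malloc(n);
  if (p == NULL) return NULL;
  t->void_train[t->current_void_count++] = p;
  return p;
}

/* Rough counterpart of release_all_{name}: free every recorded block,
 * then the train itself. */
void train_release_all(void_train_t* t) {
  for (uint64_t i = 0; i < t->current_void_count; ++i) free(t->void_train[i]);
  free(t->void_train);
  t->void_train = NULL;
  t->current_void_size = 0;
  t->current_void_count = 0;
}
```

With an initial size of 2 and a grow factor of 2, recording five blocks grows the train twice (2 to 4 to 8). Keeping the parsed data as pointers into blocks recorded this way is what makes a single release call sufficient for everything except the top-level structures.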

faultreiber XML file

The root node should have exactly two children, named READ and DEFINITION (in any order).
The READ node contains the structures that the parser will actually read and can return.
The DEFINITION node contains the definitions of the aggregate structures.

Rules:

Every child node of DEFINITION or READ must have at least the name and type attributes. The count attribute is optional; if it is absent, faultreiber assumes a count of one.
The isaggregate attribute signifies that the data structure is composed of other, smaller parts. faultreiber only reads the children of a node that is itself a child of DEFINITION or READ (unless a child node has the conditional attribute set). If a child of a data structure needs children of its own, add a new node under DEFINITION and reference that node from its parent. In other words, an aggregate node can't itself have child nodes that are aggregate.

The count, size, type and condition attributes can reference a child node of the DEFINITION node by using self::TAG, where TAG is the tag name of the referenced node.
The tag names of nodes on the same level must be unique, and so must their name attributes.
The order of the children of the DEFINITION node is unimportant to faultreiber, even when they reference each other.
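When a count attribute refers to another node instead of holding a literal number, the element count is only known once the referenced value has been read. The sketch below shows what reading such a node amounts to in C, assuming a hypothetical aggregate made of a uint32 count followed by that many uint16 entries, decoded little-endian from a byte buffer. The struct and function names are made up for illustration and do not reflect faultreiber's generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aggregate: a uint32 count followed by count uint16 entries,
 * i.e. the entries node would have its count attribute refer to the count. */
typedef struct {
  uint32_t count;
  uint16_t* entries;
} entry_list_t;

/* Little-endian decoders. */
static uint32_t le32(const uint8_t* p) {
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
         (uint32_t)p[3] << 24;
}
static uint16_t le16(const uint8_t* p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Returns the number of bytes consumed, or 0 on short input or allocation
 * failure. On success the caller owns out->entries. */
size_t read_entry_list(const uint8_t* buf, size_t len, entry_list_t* out) {
  assert(buf != NULL && out != NULL);
  out->entries = NULL;
  if (len < 4) return 0;
  out->count = le32(buf); /* the value the count attribute refers to */
  if ((len - 4) / 2 < out->count) return 0;
  if (out->count > 0) {
    out->entries = malloc((size_t)out->count * sizeof *out->entries);
    if (out->entries == NULL) return 0;
  }
  for (uint32_t i = 0; i < out->count; ++i)
    out->entries[i] = le16(buf + 4 + 2 * (size_t)i);
  return 4 + 2 * (size_t)out->count;
}
```

Because the count comes from the file, the reader has to validate it against the remaining input before allocating; otherwise a corrupt count field could trigger a huge allocation or an out-of-bounds read.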

Tags should follow the naming rules for XML element names. The name attributes should be valid C identifiers (if the value of a name attribute is not a valid C identifier, the generated code won't even build).
The following values are valid for the type attribute:

  • int8
  • uint8
  • int16
  • uint16
  • int32
  • uint32
  • int64
  • uint64
  • int128
  • uint128
  • float
  • double
  • string
  • FT::conditional
  • self::TAG

A string node must have either a non-empty size attribute or a delimiter attribute. If you use a delimiter, the delimiter itself is given as the value of the delimiter attribute.
Strings read with a delimiter keep the delimiter attached to their end (whether the delimiter is a null byte or something else). Strings read with a size attribute are forcefully null-terminated even if the original string was not.
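The two string modes above can be sketched in C as follows, reading from an in-memory buffer rather than a file descriptor. read_delimited and read_sized are hypothetical helpers, not faultreiber's generated functions. The force_null_term flag is an assumption modelled on the --forcenullterm option: without it, a string ending in a non-null delimiter is returned without a terminating null. For sized reads this sketch keeps all size bytes and appends the terminator; the generated code may handle that detail differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy bytes up to and including delim. The delimiter stays attached.
 * If force_null_term is set and delim is not '\0', a '\0' is appended.
 * Returns NULL if delim is not found; *consumed gets the bytes used. */
char* read_delimited(const char* buf, size_t len, char delim,
                     int force_null_term, size_t* consumed) {
  assert(buf != NULL && consumed != NULL);
  const char* end = memchr(buf, delim, len);
  if (end == NULL) return NULL;
  size_t n = (size_t)(end - buf) + 1; /* include the delimiter */
  int add_null = force_null_term && delim != '\0';
  char* s = malloc(n + (add_null ? 1 : 0));
  if (s == NULL) return NULL;
  memcpy(s, buf, n);
  if (add_null) s[n] = '\0';
  *consumed = n;
  return s;
}

/* Copy exactly size bytes and always null-terminate the result. */
char* read_sized(const char* buf, size_t len, size_t size) {
  assert(buf != NULL);
  if (len < size) return NULL;
  char* s = malloc(size + 1);
  if (s == NULL) return NULL;
  memcpy(s, buf, size);
  s[size] = '\0';
  return s;
}
```

For example, reading "key=value" with delimiter '=' yields the four bytes "key=", and a sized read of 3 bytes from "abcdef" yields the C string "abc".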

Child nodes of the READ node that have the unordered attribute set are treated as unordered, meaning they can appear in the file sporadically. Such nodes must have a child node with a sign attribute; the value of the sign attribute is used to check whether the parent node is present in the file.
The unorderedbegin and unorderedend attributes mark the beginning and end of an unordered section in the READ node. For every unordered section, only one node needs to define the begin and end attributes. All the other nodes, including the nodes that define the unorderedbegin and unorderedend attributes, must have the unordered attribute set.
Any child of the READ node that is neither inside an unordered section nor has the unordered attribute set is treated as ordered.
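The idea of detecting unordered nodes by their sign can be sketched as a loop that peeks at the next sign, dispatches on it, and stops at the first value that matches no node. The layout here is an assumption made for illustration only (each node is a sign byte, a one-byte payload length, then the payload), as are the sign values and all names; faultreiber's generated reader is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sign values of three unordered nodes. */
enum { SIGN_TYPE = 1, SIGN_IMPORT = 2, SIGN_EXPORT = 7 };

typedef struct {
  int seen_type, seen_import, seen_export; /* which nodes were present */
} unordered_result_t;

/* Each node in this made-up layout is: sign byte, length byte, payload.
 * Returns the number of bytes consumed by the unordered section. */
size_t read_unordered(const uint8_t* buf, size_t len, unordered_result_t* r) {
  assert(buf != NULL && r != NULL);
  size_t pos = 0;
  r->seen_type = r->seen_import = r->seen_export = 0;
  while (pos + 2 <= len) {
    int* seen;
    switch (buf[pos]) { /* the sign tells us which node comes next */
      case SIGN_TYPE: seen = &r->seen_type; break;
      case SIGN_IMPORT: seen = &r->seen_import; break;
      case SIGN_EXPORT: seen = &r->seen_export; break;
      default: return pos; /* no sign matched: the section is over */
    }
    size_t payload = buf[pos + 1];
    if (pos + 2 + payload > len) return pos; /* truncated node: stop */
    *seen = 1; /* a real reader would parse the payload here */
    pos += 2 + payload;
  }
  return pos;
}
```

Here the nodes may arrive in any order (an export before a type, say), and absent nodes simply leave their flag at zero.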

Whether int128 and uint128 are available depends on the C implementation on your host. If 128-bit integers are not supported, or you need to read bigger integers, use a smaller integer type and increase the count attribute accordingly.
The FT::conditional type means that the actual content of the node depends on a value, which the condition attribute specifies. The child nodes define the possible contents, and the text of each child node gives the value of the condition for which that child is the actual content.
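In C terms, a node of type FT::conditional behaves roughly like a tagged union: a previously read value selects which layout follows. The sketch below assumes a hypothetical node whose condition refers to a kind byte, with one child for kind 0 (a uint32) and one for kind 1 (two uint8 values). The names and layout are illustrative only and are not taken from faultreiber's generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FT::conditional node: kind is the value the condition refers
 * to, and each union member corresponds to one child node. */
typedef struct {
  uint8_t kind;
  union {
    uint32_t index;                  /* content when kind == 0 */
    struct { uint8_t lo, hi; } pair; /* content when kind == 1 */
  } u;
} conditional_t;

/* Returns bytes consumed, or 0 if the kind is unknown or input is short. */
size_t read_conditional(const uint8_t* buf, size_t len, conditional_t* out) {
  assert(buf != NULL && out != NULL);
  if (len < 1) return 0;
  out->kind = buf[0];
  switch (out->kind) {
    case 0: /* little-endian uint32 */
      if (len < 5) return 0;
      out->u.index = (uint32_t)buf[1] | (uint32_t)buf[2] << 8 |
                     (uint32_t)buf[3] << 16 | (uint32_t)buf[4] << 24;
      return 5;
    case 1:
      if (len < 3) return 0;
      out->u.pair.lo = buf[1];
      out->u.pair.hi = buf[2];
      return 3;
    default:
      return 0; /* no child node matches this value */
  }
}
```

Client code then inspects kind before touching the union, since only the member selected by the condition holds meaningful data.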
The size attribute is currently only meaningful when the type attribute is string, in which case it denotes the size of the string.

Options

  -h, --help            show this help message and exit
  --targetname TARGETNAME
                        main target name
  --outdir OUTDIR       path to output dir
  --structs STRUCTS     the structs json file
  --structsinclude STRUCTSINCLUDE
                        the path to the header that's going to be included by
                        structs.h before structure declarations.
  --xml XML             path to the xml file
  --dbg                 debug
  --datetime            print date and time in autogen files
  --inline              inlines reader funcs
  --static              statics reader funcs
  --verbose             verbose
  --forcenullterm       terminate all strings with null even if they are not
                        originally null-terminated
  --strbuffersize STRBUFFERSIZE
                        the size of the buffer for string reads
  --strbuffgrowfactor STRBUFFGROWFACTOR
                        the factor by which the strbuffer will grow
  --voidbuffersize VOIDBUFFERSIZE
                        the size of the buffer for void* buffer
  --voidbuffgrowfactor VOIDBUFFGROWFACTOR
                        the factor by which the voidbuffer will grow
  --singlefile          the generated code will be put in a single file
  --singlefilename SINGLEFILENAME
                        name of the single file
  --name                will be used in generating some code identifiers

Limitations

Big-endian reads are not supported.
Non-byte-sized raw reads are not supported.

makefile

Writing the makefile is up to you, but there is an example makefile in the test directory you can look at.
You can also get generic ones from here. They're licensed under the Unlicense.

TODO

All the items under limitations.
Figure out what the license of the generated code is.

Projects

Projects that use faultreiber:

License

faultreiber is provided under the MIT License. I'm assuming (I'm not a lawyer) that the generated code is considered a "derived work". If it is, the generated code also falls under the MIT License.
