
CGEN LLVM-IR is a generator of binary-to-LLVM-IR translators. Just provide the CPU architecture description.

Primary language: C++. License: MIT.

CGEN LLVM IR generator

CGEN LLVM IR generator is an extension of the CGEN framework that attempts to generate C++ translators which emit LLVM IR code semantically equivalent to an input binary file.

Ideally, given an RTL CPU architecture description, CGEN LLVM IR generator produces a C++ program that accepts a stream containing a binary program compiled for that architecture. The program disassembles the binary input and translates the code into a semantically equivalent LLVM IR program.
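The following minimal C++ sketch illustrates the kind of output such a translator produces. It targets a hypothetical ISA, not one derived from any .cpu file. Each CPU register is modeled as an LLVM global variable, and an `add rd, rs1, rs2` instruction becomes a load/add/store sequence on those globals. The class name `IREmitter` and the register naming scheme `@r0`, `@r1`, ... are assumptions made for this example. To stay self-contained, the sketch builds textual IR in LLVM 3.8 syntax (typed pointers such as `i32*`). A real translator would more likely use LLVM's C++ API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical emitter: CPU registers are modeled as LLVM globals @r0..@rN-1.
class IREmitter {
public:
    explicit IREmitter(unsigned numRegs) : numRegs_(numRegs) {}

    // One zero-initialized 32-bit global per CPU register.
    std::string registerGlobals() const {
        std::ostringstream out;
        for (unsigned i = 0; i < numRegs_; ++i)
            out << "@r" << i << " = global i32 0\n";
        return out.str();
    }

    // Semantics of a hypothetical "add rd, rs1, rs2": load both source
    // registers, add them, and store the sum into the destination register.
    std::string add(unsigned rd, unsigned rs1, unsigned rs2) {
        assert(rd < numRegs_ && rs1 < numRegs_ && rs2 < numRegs_);
        std::string a = fresh();
        std::string b = fresh();
        std::string s = fresh();
        std::ostringstream out;
        out << "  " << a << " = load i32, i32* @r" << rs1 << "\n"
            << "  " << b << " = load i32, i32* @r" << rs2 << "\n"
            << "  " << s << " = add i32 " << a << ", " << b << "\n"
            << "  store i32 " << s << ", i32* @r" << rd << "\n";
        return out.str();
    }

private:
    // Returns a fresh SSA value name: %t0, %t1, %t2, ...
    std::string fresh() { return "%t" + std::to_string(next_++); }

    unsigned numRegs_;
    unsigned next_ = 0;
};
```

To get a complete module, place the globals first and wrap the instruction sequences in a function body such as `define void @block() { entry: ... ret void }`. Newer LLVM releases spell pointer types as `ptr` instead of `i32*`.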

Roadmap

Here's a brief list of tasks to be accomplished for a working prototype of the generator.

  • CPU register allocation
    • Global variable allocation
    • Test correctness against .cpu files
  • Disassembler
    • Read an instruction word from a byte stream
    • Decode instruction opcode
    • Decode instruction fields into in-memory objects
    • Provide dump() facilities
    • Test against available .cpu files
  • Semantic translator
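The disassembler steps in the roadmap can be sketched as follows for a hypothetical fixed-length 32-bit little-endian encoding: opcode in bits 31..26, followed by three 5-bit register fields. The encoding, the `DecodedInsn` struct, and the `readWord`/`field`/`decode` helpers are illustrative assumptions, not the code CGEN generates. A real decoder would take instruction lengths, endianness, and field layouts from the .cpu description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical 32-bit encoding (NOT taken from any .cpu file):
//   bits 31..26 opcode | 25..21 rd | 20..16 rs1 | 15..11 rs2 | 10..0 unused
struct DecodedInsn {
    uint32_t opcode, rd, rs1, rs2;

    // dump(): human-readable rendering of the decoded fields.
    std::string dump() const {
        char buf[64];
        std::snprintf(buf, sizeof buf, "op=%u rd=%u rs1=%u rs2=%u",
                      unsigned(opcode), unsigned(rd), unsigned(rs1), unsigned(rs2));
        return buf;
    }
};

// Read a little-endian 32-bit instruction word from a byte stream.
// Returns false if fewer than 4 bytes remain at `offset`.
bool readWord(const std::vector<uint8_t>& bytes, std::size_t offset, uint32_t& word) {
    if (offset + 4 > bytes.size())
        return false;
    word = uint32_t(bytes[offset]) | uint32_t(bytes[offset + 1]) << 8 |
           uint32_t(bytes[offset + 2]) << 16 | uint32_t(bytes[offset + 3]) << 24;
    return true;
}

// Extract `width` bits starting at bit `lsb` (width must be 1..31).
uint32_t field(uint32_t word, unsigned lsb, unsigned width) {
    assert(width > 0 && width < 32);  // 1u << 32 would be undefined behavior
    return (word >> lsb) & ((1u << width) - 1u);
}

// Decode the opcode and operand fields into an in-memory object.
DecodedInsn decode(uint32_t word) {
    return DecodedInsn{field(word, 26, 6), field(word, 21, 5),
                       field(word, 16, 5), field(word, 11, 5)};
}
```

Decoding a whole buffer is then a loop that calls `readWord` at offsets 0, 4, 8, ... until it returns false, passing each word to `decode`.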

Hands-on

To run CGEN LLVM IR generator, use the provided Python script. It hides the oddities and quirks of Scheme and its Guile implementation.

Prerequisites

We assume you have a working Guile 1.8 environment on your machine, with the guile executable in your system PATH. A guide to compiling and installing it is available here.

You also need LLVM 3.8.0 installed, including its development headers. CMake must be able to find LLVM's CMake find script. Optionally, install clang-format to format the generated C++ source files.

Running CGEN-IR

$ ./cgen-ir.py --help
usage: cgen-ir.py [-h] -a ARCH -m MACHINE [-i ISA] [-t DEC_H] [-d DEC_CPP]
                  [-r REG_H]
                  dstPath

A generator of LLVM-IR generators. Yes.

positional arguments:
  dstPath               Destination path

optional arguments:
  -h, --help            show this help message and exit
  -a ARCH, --arch ARCH  .cpu description file
  -m MACHINE, --machine MACHINE
                        Variant of the architecture
  -i ISA, --isa ISA     ISA name of the architecture
  -t DEC_H, --decoder-header DEC_H
                        Decoder header filename
  -d DEC_CPP, --decoder-src DEC_CPP
                        Decoder source filename
  -r REG_H, --registers-header REG_H
                        Registers allocation source filename

You are required to provide at least:

-a ARCH       .cpu description file path
-m MACHINE    the machine you want to generate translators for (e.g. arc700)
dstPath       destination directory for the generated sources

To set the names of the generated source files manually, use the -t, -d and -r arguments.

Note: if your target .cpu file defines multiple ISAs, you must pass the -i argument to select the one you want to generate a translator for.

Compiling the generated translator

The cgen-ir.py script generates the source files, a CMakeLists.txt file, and a driver (main.cpp) that is not guaranteed to work. You can compile the generated translator with the usual CMake workflow:

$ cd dstPath
$ mkdir build
$ cd build
$ cmake ..
$ make
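The generated main.cpp driver is not guaranteed to work. If you adapt it, one thing it must do is load the input binary into memory before decoding. The following is a minimal sketch of that step. The helper name `readBinaryFile` is hypothetical and is not taken from the generated code.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical driver helper: load the whole input binary into a byte buffer
// that a decoder can then walk instruction by instruction. The file is opened
// in binary mode so that no newline translation alters the bytes.
std::vector<uint8_t> readBinaryFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}
```

Reading through `std::istreambuf_iterator` copies the raw bytes without skipping whitespace, which matters for binary input.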

Examples

An example of CGEN LLVM IR generator usage for the ARC700 architecture is available here.