amacc

Small C compiler generating ELF executables for the Arm architecture, with support for JIT execution


AMaCC = Arguably Minimalist Arm C Compiler

Introduction

AMaCC is a compiler for the 32-bit Arm architecture, built from scratch. It compiles a stripped-down version of C and is designed as a pedagogical tool for learning about compilers, linkers, and loaders.

AMaCC implements two execution modes:

  • Just-in-Time (JIT) compilation for the Arm backend.
  • Generation of valid GNU/Linux executables using the Executable and Linkable Format (ELF).

AMaCC is designed to compile the subset of C needed to self-host in both of the above execution modes. For instance, it supports global variables, particularly global arrays.

A simple stack-based Abstract Syntax Tree (AST) is generated through the cooperating stmt() and expr() parsing functions, both fed by a token-generating function. The expr() function performs some literal constant optimizations. The gen() function transforms the AST into a stack-based VM Intermediate Representation (IR), which can be examined via a command-line option. Finally, the codegen() function generates Arm32 instructions from the IR; these are either executed directly via jit() or written out as an executable via elf32().
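The lowering step described above can be illustrated with a minimal sketch. It is not AMaCC's code: the node kinds, opcodes, and the names gen_ir(), run_ir(), and demo_pipeline() are hypothetical, and the AST here is an ordinary pointer-linked tree rather than AMaCC's stack-based layout. It shows only the general idea of turning an expression tree into stack-machine instructions by a post-order walk, then interpreting them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds and opcodes; these are not AMaCC's. */
enum node_kind { NODE_NUM, NODE_ADD, NODE_MUL };
enum opcode { OP_PUSH, OP_ADD, OP_MUL };

struct node {
    enum node_kind kind;
    int value;                    /* used by NODE_NUM only */
    const struct node *lhs, *rhs; /* used by binary nodes only */
};

struct insn {
    enum opcode op;
    int operand; /* used by OP_PUSH only */
};

/* Post-order walk: emit code for both operands, then the operator.
 * Returns the number of instructions written so far. */
static size_t gen_ir(const struct node *n, struct insn *out, size_t count)
{
    if (n->kind == NODE_NUM) {
        out[count].op = OP_PUSH;
        out[count].operand = n->value;
        return count + 1;
    }
    count = gen_ir(n->lhs, out, count);
    count = gen_ir(n->rhs, out, count);
    out[count].op = (n->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    out[count].operand = 0;
    return count + 1;
}

/* Interpret the instruction list on a small operand stack. */
static int run_ir(const struct insn *code, size_t count)
{
    int stack[64];
    size_t sp = 0;

    for (size_t i = 0; i < count; i++) {
        switch (code[i].op) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = code[i].operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* Builds the tree for 2 + 3 * 4, lowers it to IR, and runs the IR. */
int demo_pipeline(void)
{
    struct node two = { NODE_NUM, 2, NULL, NULL };
    struct node three = { NODE_NUM, 3, NULL, NULL };
    struct node four = { NODE_NUM, 4, NULL, NULL };
    struct node mul = { NODE_MUL, 0, &three, &four };
    struct node add = { NODE_ADD, 0, &two, &mul };
    struct insn code[16];

    size_t count = gen_ir(&add, code, 0);
    return run_ir(code, count);
}
```

For the tree above, the walk emits PUSH 2, PUSH 3, PUSH 4, MUL, ADD. A real backend would translate each such instruction into machine code instead of interpreting it.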

AMaCC combines classical recursive descent with operator precedence parsing. For expressions, an operator precedence parser is considerably faster than a pure recursive descent parser (RDP), in which each precedence level is expressed as a grammar production that would otherwise be turned into its own function.
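As an illustration of operator precedence parsing, here is a minimal precedence-climbing sketch that evaluates integer expressions directly. It is not taken from AMaCC: the names eval(), parse_expr(), parse_primary(), and precedence() are hypothetical, only + - * / and parentheses are handled, and malformed input or division by zero is not handled gracefully.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical cursor into the expression text being parsed. */
static const char *cur;

static void skip_spaces(void)
{
    while (*cur == ' ')
        cur++;
}

/* Binding power of a binary operator; 0 means "not an operator". */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static int parse_expr(int min_prec);

/* Primary: a decimal literal or a parenthesized expression. */
static int parse_primary(void)
{
    int value = 0;

    skip_spaces();
    if (*cur == '(') {
        cur++;
        value = parse_expr(1);
        skip_spaces();
        assert(*cur == ')');
        cur++;
        return value;
    }
    assert(isdigit((unsigned char) *cur));
    while (isdigit((unsigned char) *cur))
        value = value * 10 + (*cur++ - '0');
    return value;
}

/* One loop handles every precedence level, instead of one function
 * per level as in a pure recursive descent parser. */
static int parse_expr(int min_prec)
{
    int lhs = parse_primary();

    for (;;) {
        skip_spaces();
        char op = *cur;
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        cur++;
        /* prec + 1 makes the operators left-associative. */
        int rhs = parse_expr(prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        }
    }
}

/* Evaluate a whole expression string. */
int eval(const char *src)
{
    cur = src;
    return parse_expr(1);
}
```

Adding an operator in this style means adding one entry to the precedence table and one case to the switch, rather than a new parsing function per precedence level.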

Compatibility

AMaCC can compile C source files that use the following language features:

  • support for all C89 statements except typedef.
  • support for all C89 expression operators.
  • data types: char, int, enum, struct, union, and multi-level pointers
    • type modifiers, qualifiers, and storage class specifiers are currently unsupported, though many keywords of this nature are not routinely used, and can be easily worked around with simple alternative constructs.
    • struct/union assignments are not supported at the language level in AMaCC, e.g. s1 = s2. This also applies to function return values and parameters. Passing and returning pointers is recommended. Use memcpy if you want to copy a full struct, e.g. memcpy(&s1, &s2, sizeof(struct xxx));
  • global/local variable initializations for supported data types
    • e.g., int i = [expr]
    • New variables may be declared anywhere within a function.
    • item-by-item array initialization is supported
    • but aggregate array declaration and initialization is yet to be supported e.g., int foo[2][2] = { { 1, 0 }, { 0, 1 } };
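The two workarounds mentioned in the list above (memcpy in place of struct assignment, and item-by-item array initialization in place of an aggregate initializer) might look like the following sketch. The names struct point, copy_point(), and identity_trace() are hypothetical. The code is ordinary standard C written to avoid the unsupported constructs; it has not been checked against AMaCC itself.

```c
#include <string.h>

struct point {
    int x;
    int y;
};

/* Copies a whole struct without using `*dst = *src`,
 * which AMaCC does not support at the language level. */
void copy_point(struct point *dst, struct point *src)
{
    memcpy(dst, src, sizeof(struct point));
}

/* Fills a 2x2 identity matrix element by element instead of writing
 * `int m[2][2] = { { 1, 0 }, { 0, 1 } };`, then returns its trace. */
int identity_trace(void)
{
    int m[2][2];

    m[0][0] = 1;
    m[0][1] = 0;
    m[1][0] = 0;
    m[1][1] = 1;
    return m[0][0] + m[1][1];
}
```

Passing pointers to copy_point() also follows the recommendation above to pass and return pointers rather than whole structs.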

The architecture support targets armv7hf with Linux ABI, and it has been verified on Raspberry Pi 2/3/4 with GNU/Linux.

Prerequisites

  • The code generator in AMaCC relies on several GNU/Linux behaviors, so Arm/Linux must be installed in your build environment.

  • Install GNU Toolchain for the A-profile Architecture

    • Select arm-none-linux-gnueabihf (AArch32 target with hard float)
  • Install QEMU for Arm user emulation

sudo apt-get install qemu-user

Running AMaCC

Run make check and you should see this:

[ C to IR translation          ] Passed
[ JIT compilation + execution  ] Passed
[ ELF generation               ] Passed
[ nested/self compilation      ] Passed
[ Compatibility with GCC/Arm   ] ........................................
----------------------------------------------------------------------
Ran 52 tests in 8.842s

OK

Check the messages generated by make help to learn more.

Benchmark

AMaCC generates machine code very quickly, and the code it produces delivers about 70% of the performance of code compiled with gcc -O0.

Test environment:

  • Raspberry Pi 4B (SoC: bcm2711, ARMv8-A architecture)
  • Raspbian GNU/Linux, kernel 5.10.17-v7l+, gcc 8.3.0 (armv7l userland)

Input source file: amacc.c

compiler driver                     binary size (KiB)   compile time (s)
gcc with -O0 -ldl (compile+link)    56                  0.5683
gcc with -O0 -c (compile only)      56                  0.4884
AMaCC                               100                 0.0217

Internals

Check Intermediate Representation (IR) for AMaCC Compilation.

Acknowledgements

AMaCC is based on the infrastructure of c4.

Related Materials