ShivyC

A hobby C compiler created in Python.


ShivyC is a hobby C compiler written in Python 3. It supports a subset of the C11 standard, generates reasonably efficient binaries with the help of some optimizations, and produces helpful compile-time error messages.

This implementation of a trie is an example of what ShivyC can compile today. For a more comprehensive list of features, see the feature test directory.

Quickstart

x86-64 Linux

ShivyC requires only Python 3.6 or later to compile C code. Assembling and linking are done using the GNU binutils and glibc, which you almost certainly already have installed.

To install ShivyC:

pip3 install shivyc

To create, compile, and run an example program:

$ vim hello.c
$ cat hello.c

#include <stdio.h>
int main() {
  printf("hello, world!\n");
}

$ shivyc hello.c
$ ./out
hello, world!

To run the tests:

git clone https://github.com/ShivamSarodia/ShivyC.git
cd ShivyC
python3 -m unittest discover

Other Architectures

For the convenience of those not running Linux, the docker/ directory provides a Dockerfile that sets up an x86-64 Linux Ubuntu environment with everything necessary for ShivyC. To use this, run:

git clone https://github.com/ShivamSarodia/ShivyC.git
cd ShivyC
docker build -t shivyc docker/
docker/shell

This opens a shell in an environment with ShivyC installed and ready to use:

shivyc any_c_file.c           # to compile a file
python3 -m unittest discover  # to run tests

The Docker ShivyC executable will update live with any changes made in your local ShivyC directory.

Implementation Overview

Preprocessor

ShivyC today has a very limited preprocessor that parses out comments and expands #include directives. These features are split across lexer.py and preproc.py.
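To make the comment-removal step concrete, here is a minimal sketch of how comments can be parsed out of C source while leaving string and character literals untouched. It is an illustration only, not the code in lexer.py or preproc.py; the function name strip_comments is an assumption made for this example.

```python
def strip_comments(source):
    """Replace each C comment with one space; keep string/char literals as-is.

    Hypothetical helper for illustration; not part of ShivyC.
    """
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Copy a string or character literal verbatim, skipping over
            # backslash escapes so an escaped quote does not end the literal.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("//", i):
            # Line comment: drop everything up to (not including) the newline.
            j = source.find("\n", i)
            i = n if j == -1 else j
            out.append(" ")
        elif source.startswith("/*", i):
            # Block comment: drop everything through the closing "*/".
            j = source.find("*/", i + 2)
            i = n if j == -1 else j + 2
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Replacing a comment with a single space, rather than with nothing, keeps tokens on either side of the comment from being glued together.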

Lexer

The ShivyC lexer is implemented primarily in lexer.py. Additionally, tokens.py contains definitions of the token classes used in the lexer and token_kinds.py contains instances of recognized keyword and symbol tokens.
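As a rough illustration of this split between token classes and tables of recognized keywords and symbols, the sketch below tokenizes a tiny C-like input. The names Token, KEYWORDS, SYMBOLS, and tokenize are assumptions for this example and do not mirror the actual contents of tokens.py or token_kinds.py.

```python
from collections import namedtuple

# Hypothetical token type: a kind tag plus the matched source text.
Token = namedtuple("Token", ["kind", "text"])

# Small illustrative tables; a real C lexer recognizes many more entries.
KEYWORDS = {"int", "return", "if", "else", "while"}
# Longest symbols first, so "==" is matched before "=".
SYMBOLS = sorted(
    ["==", "!=", "<=", ">=", "+", "-", "*", "/", "=", "<", ">",
     "(", ")", "{", "}", ";", ","],
    key=len, reverse=True)


def tokenize(source):
    """Split source text into keyword, identifier, number, and symbol tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(Token("number", source[i:j]))
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            text = source[i:j]
            kind = "keyword" if text in KEYWORDS else "identifier"
            tokens.append(Token(kind, text))
            i = j
        else:
            for sym in SYMBOLS:
                if source.startswith(sym, i):
                    tokens.append(Token("symbol", sym))
                    i += len(sym)
                    break
            else:
                raise ValueError(
                    "unrecognized character {!r} at index {}".format(ch, i))
    return tokens
```

For example, tokenize("int x = 42;") yields a keyword, an identifier, a symbol, a number, and a final symbol token, in that order.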

Parser

The ShivyC parser uses recursive descent techniques for all parsing. It is implemented in parser/*.py and creates a parse tree of nodes defined in tree/nodes.py and tree/expr_nodes.py.
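For readers unfamiliar with recursive descent, the following sketch parses simple arithmetic expressions, with one method per grammar rule. It is a generic illustration under assumed names (tokenize_expr, ExprParser) and a tuple-based tree; ShivyC's real parser and node classes are more involved.

```python
import re


def tokenize_expr(text):
    """Split an arithmetic expression into number and operator tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


class ExprParser:
    """Hypothetical recursive descent parser for +, -, *, / and parentheses."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expression(self):
        # expression := term (('+' | '-') term)*
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        # term := primary (('*' | '/') primary)*
        node = self.parse_primary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.parse_primary())
        return node

    def parse_primary(self):
        # primary := NUMBER | '(' expression ')'
        tok = self.advance()
        if tok == "(":
            node = self.parse_expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError("unexpected token {!r}".format(tok))
```

Operator precedence falls out of the call structure: parse_expression only sees whole terms, so multiplication binds tighter than addition without any explicit precedence table.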

IL generation

ShivyC traverses the parse tree to generate a flat custom IL (intermediate language). The commands for this IL are in il_cmds/*.py. Objects used for IL generation are in il_gen.py, but most of the IL-generating code is in the make_code function of each tree node in tree/*.py.
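The idea of each tree node emitting its own IL can be sketched as follows. The classes ILCode, Number, and BinaryOp, the command names, and the one-argument make_code signature are all assumptions made for this example; they are not the definitions found in il_gen.py, il_cmds/*.py, or tree/*.py.

```python
import itertools


class ILCode:
    """Hypothetical container that collects a flat list of IL commands."""

    def __init__(self):
        self.commands = []
        self._counter = itertools.count()

    def new_temp(self):
        # Hand out fresh temporary names: t0, t1, t2, ...
        return "t{}".format(next(self._counter))

    def add(self, *command):
        self.commands.append(command)


class Number:
    def __init__(self, value):
        self.value = value

    def make_code(self, il):
        dest = il.new_temp()
        il.add("set", dest, self.value)
        return dest


class BinaryOp:
    OPS = {"+": "add", "*": "mult"}

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def make_code(self, il):
        # Emit code for both operands first, then for the operation itself.
        left = self.left.make_code(il)
        right = self.right.make_code(il)
        dest = il.new_temp()
        il.add(self.OPS[self.op], dest, left, right)
        return dest
```

Calling make_code on the root of the tree for 1 + 2 * 3 appends five commands to the list: three "set" commands, then "mult", then "add". The nesting of the tree disappears, and only the order of commands and the temporary names record the data flow.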

ASM generation

ShivyC sequentially reads the IL commands, converting each into Intel-format x86-64 assembly code. ShivyC performs register allocation using George and Appel’s iterated register coalescing algorithm (see References below). The general ASM generation functionality is in asm_gen.py, but much of the ASM-generating code is in the make_asm function of each IL command in il_cmds/*.py.
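Iterated register coalescing is too long to reproduce here, but its core idea, coloring an interference graph, can be shown with a much simpler sketch. The code below is plain simplify-and-select graph coloring: it performs no coalescing and gives up instead of spilling, so it is not George and Appel's algorithm and not the code in asm_gen.py. The function names and the live-set input format are assumptions for this example.

```python
def build_interference(live_sets):
    """Build an undirected graph: variables live at the same time interfere."""
    graph = {}
    for live in live_sets:
        for var in live:
            graph.setdefault(var, set()).update(v for v in live if v != var)
    return graph


def color_graph(graph, registers):
    """Assign each variable a register so that no two neighbors share one."""
    k = len(registers)
    work = {v: set(neighbors) for v, neighbors in graph.items()}
    stack = []

    # Simplify: repeatedly remove a node with fewer than k remaining neighbors.
    while work:
        node = next((v for v in sorted(work) if len(work[v]) < k), None)
        if node is None:
            raise RuntimeError("a spill would be needed; not handled here")
        stack.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)

    # Select: pop nodes back in reverse order; each is guaranteed a free
    # register because it had fewer than k neighbors when it was removed.
    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in graph[node] if n in assignment}
        assignment[node] = next(r for r in registers if r not in used)
    return assignment
```

A full allocator adds the parts omitted here: merging move-related variables into a single node (coalescing) and choosing variables to keep in memory when the graph cannot be colored (spilling).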

Contributing

Pull requests to ShivyC are very welcome. A good place to start is the Issues page. All issues labeled "feature" are TODO tasks. Issues labeled "bug" are individual miscompilations in ShivyC. If you have any questions, please feel free to ask in the comments of the relevant issue or create a new issue labeled "question". Of course, please add test(s) for all new functionality.

Many thanks to our current and past contributors.

References