This project is a personal programming exercise. It consists mainly in a compiler for a subset of C, targeting 32-bit x86 (via GNU assembler). This 32-bit x86 compiler is tested on FreeBSD, GNU / Linux, and Windows (via the MinGW toolchain). There is now very rough and still buggy support for amd64 code generation, presently only tested on GNU/Linux and FreeBSD (new as of January 2018). It does not presently work on Windows/mingw64. The 32-bit x86 version (and now also amd64 version) can compile itself ! I started this in highschool in 2012 and the code has turned into a mess. I am now attending a CS program, since 2013, & if I started this project again from scratch I should follow better development principles! The code is not particularly maintainable or readable, this is what happens when an overzealous high school student wants to ``prove himself'' and writes large amounts of code like an insane madman. Programming is like a drug Corporate-quality code this ain't. but definitely i was enthusiastic and willing, i suppose that is the point Although, I think the lexer and parser are not *that* bad. The codegen_x86 module is extremely nasty in a few parts, it has some very messy stuff when it sets up symbols. And there are few things that need to be refactored... ---------------------- X86 QUICKSTART ----------------------- $ cd compiler/ $ ./build-x86.sh $ ./compile-run-x86.sh test/x86/recursive-fib.c Note that 32-bit C libraries must be installed. ------------------------------------------------------------- For further info: please go in the "compiler" directory and read "compiler/docs/docs-x86.txt" for info about the supported C features, how to build the program, etc. (There is some other useful information in the "compiler/docs" directory, for example docs/wcc.txt tells you how to build a standalone "wcc" command). The test programs "compiler/test/x86/*.c" should give an idea of the subset of C accepted at this point. Highlights include: pointers and multi-dimensional arrays of structs, ints, and chars; structs with self-referential pointers; support for several libc calls (e.g. printf); custom procedures with recursion allowed; for-loops; typedefs; string constants; switch statements compiled to jump tables; (some) function prototypes; (some) array initializers. There are still bugs, probably especially in constructs not yet tried in the test programs. Some expressions may not compile properly yet, or compile at all, for that matter. This is a work-in-progress. Big real serious professional or academic compilers likely use more sophisitcated techniques; I'm doing this project to learn because I still have a lot to learn. I haven't read the standard. I'm mostly looking at gcc's x86 output and imitating that. It's not going to give a professional-quality standard compiler but it will certainly be a good learning experience for me. This really really really isn't an optimizing compiler! ------------------------------------------------------------------------- This compiler can also target my terrible old bytecode VM, arbitrarily named "BPF" -- for details, see "README-BPF" and "compiler/docs/docs-bpf.txt" Note that the BPF compiler supports a much smaller subset of C than does the x86 compiler, and it may have some weird modifications. ------------------------------------------------------------------------- Future project ideas: MIPS codegen (i'm learning MIPS at school as of early 2014) The compiler could be improved, merged with a well-designed VM and hacked into a big REPL to create a script interpreter. Supporting a larger subset of C ? Not sure if it would be possible to do this by growing this code. Support for other real architectures than x86, say e.g. powerpc or arm (PowerPC does not look easy.) Support for popular serious VMs like JVM / .NET / LLVM ------------------------------------------------------------------------- Project started: December 2012