- About
- MinimalistiC Programming Language
- MinimalistiC Compiler
- Usage Guide
- Learning/Teaching Guide
- Example
- Development Process & Log
- Install
- Bibliography
To learn more, please read the sections below on the programming language and its compiler. Installation instructions are at the bottom.
This is a personal project of mine to...
- learn how compilers work, how to write low-level languages, and CPU architecture
- offer this as a learning experience by publicly sharing my ideas and documented source code
- create a super-simple C-like programming language as a good starting step for low-level programming
- (learn to use Vim)
I aim to develop a programming language for beginners to learn basic programming concepts & conventions.
MinC has a very limited set of features and a consistent syntax, so the learner stays focused on a) learning new techniques and b) practising learned techniques, without being distracted by calls to external libraries (as in Java/Python) or unintuitive semantics (C++). It performs well because it is a compiled language (unlike JS or Java), and since it is a 'daughter language' of C, it can perform the same tasks as C can. Find out more: Learning/Teaching Guide.
This repository contains a basic compiler which compiles my variant of C (called MinimalistiC or MinC)
into x86 assembly as a .s file in the simplest way possible, so others can learn from the source code.
MinimalistiC (MinC) is my take on an ultra-simplified, ultra-lightweight, derated version of the C programming language.
MinC aims to be...
- minimal, very easy to teach/learn as it lacks niche features and only offers essential programming concepts
- consistent, with pragmatic use of syntax that is easy to understand & adapt
- forward compatible with C - it can (almost) be treated as C code and be compiled & optimised with GCC/TCC
- lightweight, requiring a smaller library and compiler than C/C++, with faster compilation and a smaller memory footprint
- lower-level, with more finely controlled optimisations such as memory management without relying on an assembler
- Most importantly, it aims to keep learners focused on a) learning new techniques and b) practising learned techniques
- preprocessor directives (`#include`, `#define`, `#ifdef`, `#endif`)
- preprocessor macros (`__FILE__`, `__LINE__`, `__TIME__`, `__ASM`, etc.)
- comments (`//`, `/*`, `*/`)
- data types (`byte`, `int`)
- static declarations (`static`)
- pointers (`*`, `&`)
- very limited use of arrays (à la pointer math, declared with `[`, `]`)
- string literals (`"`) to char arrays
- code structures & scope (`{`, `}`, `.`)
- basic data structures (`struct`, `union`, `.`, `,`, `;`)
- functions (`return`)
- conditionals (`if`)
- control flow (`while` loops)
- arithmetic (`+`, `-`, `*`, `/`) with (`(`, `)`) (with BIDMAS)
- binary logic (`!`, `&&`, `||`)
- equality testing (`==`, `!=`, `>`)
- linked with the C standard library for portability
That's all. 8 keywords, 4 preprocessor directives, and a charset of `a..z`, `0..9` with 20 symbols `. , ; + - * / = # ! & | " > ( ) [ ] { }`
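As a rough illustration of the forward compatibility with C mentioned above, the sketch below is a C function that stays inside the MinC feature list (`int`, `byte`, `while`, `>`, `+`, `-`, `return`). The `typedef` is an assumption made for this example: it maps MinC's `byte` onto C's `char` so that GCC/TCC accept the code, and the real adjustment needed may differ. The function name `count_down` is hypothetical.

```c
/* Assumption: MinC's `byte` is treated as C's `char` when compiling as C. */
typedef char byte;

/* Counts how many steps it takes to bring `start` down to zero.
   Uses only constructs from the MinC feature list. */
int count_down(byte start)
{
    int steps = 0;
    while (start > 0)
    {
        start = start - 1;
        steps = steps + 1;
    }
    return steps;
}
```

Calling `count_down(3)` returns 3, and `count_down(0)` returns 0 because the loop body never runs.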
MinC Compiler `mcc` compiles MinC to 32-bit x86 assembly `.s` files.
- reads input char by char, being lexed into tokens
  - `file.c` and `io.c` take in the source code as a stream
  - `pp.c` preprocesses the input char stream
  - `lex.c` tokenises into one of `NULL`, `LITERAL`, `IDENTIFIER`, `KEYWORD`, `SEPARATOR`, `OPERATOR`, or `END OF FILE`
  - `pp.c` parses preprocessor directives `#` and macros
  - if `-E` flag, `dump.c` prints preprocessed code
- parses token stream in `parse.c` (handwritten parser!) into an Abstract Syntax Tree
  - `parse.c` parses tokens into abstract instructions
  - parser is a top-down recursive descent parser, a headache to write
  - parser is hand-written by me specifically for MinC as opposed to FLEX/YACC
  - if `-fd` flag, `dump.c` pretty-prints the AST in readable form
- generate an Intermediate Representation based on that AST
  - `gen_ir.c` generates linear IR with infinite registers from the AST
  - `gen_ir.c` performs a post-order traversal of the AST
  - SSA or DAG optimisation is not used, to keep it simple
  - `memalloc.c` manipulates memory on the stack from infinite registers
  - `gen_x86.c` formats linear IR as actual x86 32-bit code in AT&T formatting
  - if `-S` flag, `dump.c` outputs the human-readable IR as an `.s`
- assembly and linking are done externally
  - GNU's assembler (`as`) converts the human-readable x86 `.s` to `.o` files
  - if `-c` flag, `dump.c` outputs the object file as an `.o`
  - GNU's linker (`ld`) links all `.o` files together and with the C standard library
  - code is linked with the C standard library for portability across OSes
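The IR stage described above (a post-order traversal of the AST that produces linear IR over an unlimited set of registers) can be sketched roughly as follows. This is not the code in `gen_ir.c`: the `Node` struct, the `gen_ir` signature, the `r<N>` register naming and the textual IR format are all assumptions made for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: a leaf holds a literal, an inner node an operator. */
typedef struct Node {
    char op;                 /* '+', '-', '*', '/', or 0 for a literal leaf */
    int value;               /* used only when op == 0 */
    struct Node *lhs, *rhs;  /* children, NULL for leaves */
} Node;

static int next_reg = 0;     /* "infinite" virtual register counter */

/* Post-order walk: emit IR for both children first, then for the node.
   Appends one IR line per node to `out` (capacity `cap`) and returns
   the number of the virtual register that holds the node's result. */
int gen_ir(const Node *n, char *out, size_t cap)
{
    char line[64];
    int r;
    if (n->op == 0) {
        r = next_reg++;
        snprintf(line, sizeof line, "r%d = %d\n", r, n->value);
    } else {
        int a = gen_ir(n->lhs, out, cap);
        int b = gen_ir(n->rhs, out, cap);
        r = next_reg++;
        snprintf(line, sizeof line, "r%d = r%d %c r%d\n", r, a, n->op, b);
    }
    strncat(out, line, cap - strlen(out) - 1);
    return r;
}
```

For the tree of `1 + 2 * 3`, a first call on an empty buffer produces the lines `r0 = 1`, `r1 = 2`, `r2 = 3`, `r3 = r1 * r2`, `r4 = r0 + r3` and returns 4. A later pass (the role the list above gives to `memalloc.c`) would then map these virtual registers onto stack slots.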
- MCC is so small it can fit into a 1980s floppy disk (>160KB)
- parser rearranges binary expressions to respect precedence such as BIDMAS (insanely confusing to write)
- generator is non-optimising to keep it simple and easy to understand
- mingw is 32-bit normally, so asm is also 32 bits
- memalloc pushes and pops the stack instead of allocating r10-16 as GAS doesn't support it
- compiler memory management is absent (declares variables on heap but never frees it)
- semantic analysis is non-existent therefore allowing implicit type casting (odd quirk)
- error-checking is minimal too, so errors may be uncaught
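Operator precedence (BIDMAS) in a recursive descent parser is often handled with one function per precedence level, and the sketch below shows that approach. It is a common alternative, not mcc's method, which according to the notes above rearranges binary expressions. To stay short it evaluates the expression directly instead of building an AST, and every name in it (`eval`, `parse_expr`, `parse_term`, `parse_factor`) is hypothetical.

```c
#include <ctype.h>

static const char *cur;          /* cursor into the expression text */

static int parse_expr(void);

static void skip_spaces(void) { while (*cur == ' ') cur++; }

/* factor := number | '(' expr ')' */
static int parse_factor(void)
{
    int value = 0;
    skip_spaces();
    if (*cur == '(') {
        cur++;                   /* consume '(' */
        value = parse_expr();
        skip_spaces();
        if (*cur == ')') cur++;  /* consume ')' */
        return value;
    }
    while (isdigit((unsigned char)*cur)) {
        value = value * 10 + (*cur - '0');
        cur++;
    }
    return value;
}

/* term := factor (('*' | '/') factor)*   -- binds tighter than + and - */
static int parse_term(void)
{
    int value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cur == '*') { cur++; value *= parse_factor(); }
        else if (*cur == '/') {
            cur++;
            int d = parse_factor();
            value = d ? value / d : 0;  /* sketch choice: x / 0 yields 0 */
        }
        else return value;
    }
}

/* expr := term (('+' | '-') term)*   -- lowest precedence, left-associative */
static int parse_expr(void)
{
    int value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cur == '+') { cur++; value += parse_term(); }
        else if (*cur == '-') { cur++; value -= parse_term(); }
        else return value;
    }
}

/* Entry point: evaluates an expression of non-negative integers. */
int eval(const char *text)
{
    cur = text;
    return parse_expr();
}
```

With this layering, `eval("1 + 2 * 3")` gives 7 and `eval("(1 + 2) * 3")` gives 9, because multiplication is parsed one level deeper than addition.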
The compiler can compile MinC files `.mc` into MinC preprocessed files `.mi`, MinC AST `.md`, GAS assembly `.s`, object files `.o`, or executables `.exe`.
Feed it the source code file and arguments to control its output; it will write the corresponding output file.
To compile verbosely into x86 (PowerShell):
./mcc -g -S code.mc
To compile & execute .exe (PowerShell):
./mcc -o code code.mc ; ./code
Full instructions:
usage: mcc [-v] [-h] [arg1 arg2 ...] <infile1 infile2 ...>
mcc only accepts .mc as inpath, dir and include accepts any types
args:
  output:
    -o <filename>      write output to <filename>
    -E  .mi            print preprocessed
    -a  .md            print AST
    -S  .s             stop before assembly
    -c  .o             stop before linking
  preprocessor:
    -I <dir>           add include path <dir>
    -D <macro> <val>   set <macro> to <val>
  debugger:
    -g                 verbose compiler debugger
    -w                 suppress all warnings
    -we                treat all warnings as errors
  info:
    -v                 display version info
    -h                 display this help
MinimalistiC is a programming language I designed to simplify the C programming language for beginners.
Mission written here and here.
To simplify, it keeps learners
a) focused on learning new techniques
b) practising learned techniques to become familiar with using them
There are many programming languages which are easy for beginners, especially Python, Java and JavaScript. Some of these languages have very simple syntax, which beginners can easily pick up and understand when guided by a tutorial. The tutorials then introduce more concepts, sometimes at lightning pace, building up to more and more abstract concepts such as OOP or the use of external libraries. This is all done at the teacher's discretion, leaving the learner vulnerable to bad teaching techniques: they can be overwhelmed with concepts to the point where they see no practical purpose in learning new techniques, and forget old, useful ones as a result.
The proposed alternative, which is MinC's, is to introduce beginners to a new language where they are restricted to learning only simple concepts and are encouraged (by the design of the language) to reuse the learned techniques, similar to what Scratch does, to achieve an end goal or product. The heavily simplified nature of MinC allows beginners to understand code written by others, as everyone is limited to a very small feature set (of 8 keywords) and concepts. This lets them use code creatively to achieve results, inspiring problem-solving behaviours similar to what Whitespace or other low-level languages do. The language avoids syntax that confuses absolute beginners, such as in C++ or Pascal, and uses consistent and understandable semantics.
What is learned in MinC can be transferred to a more powerful and practical language such as C or C++. MinC is as powerful as C, since it derives from C and borrows its standard library. Because of its forward compatibility with C (it can be treated more or less as C code with minor adjustments), it can be compiled with large compilers to ensure it is optimised, and linked with existing C libraries such as cURL or libPNG. C is a go-to language for low-level software developers and is used for mission-critical applications, such as at NASA's Jet Propulsion Laboratory to develop spacecraft. Since MinC can compile as C and theoretically do what C can, it can do just as much.
MinC heavily simplifies C, and therefore limits learners to the most basic concepts of programming:
- data storage, variables, datatypes, how data is represented and stored
- arithmetic, performing unary and binary operations
- control flow, conditionals and looping and controlling code
- abstraction, functions, structs, unions, how code is simplified (basic OOP)
This aids the learning of these concepts whilst limiting their variations, for example,
- There are only 2 data types in MinC, as opposed to 12 in C (including unsigned and combinations)
- Only `+`, `-`, `*`, `/` as arithmetic operators
- Limited equality checking, only `>` exists, `<` doesn't
- `!=` doesn't exist either.

and many more.
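To show how the restricted comparison set can still express the missing operators, here is a small sketch in C that stays within the subset described above: only `>` and `if` are used, with no `<`, `>=` or `!=`. The function names are hypothetical, and the code is written as plain C so it can be checked with any C compiler.

```c
/* a < b, written by swapping the operands of `>`. */
int less_than(int a, int b)
{
    if (b > a)
    {
        return 1;
    }
    return 0;
}

/* a >= b is the negation of b > a. */
int greater_or_equal(int a, int b)
{
    if (b > a)
    {
        return 0;
    }
    return 1;
}

/* a != b without `!=`: true when either value is greater than the other. */
int not_equal(int a, int b)
{
    if (a > b)
    {
        return 1;
    }
    if (b > a)
    {
        return 1;
    }
    return 0;
}
```

The learner ends up practising the same two constructs (`if` and `>`) in three different ways, which is the kind of reuse the guide argues for.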
This code demonstrates all features of MinC, excluding preprocessor directives
helloWorld.mc
#include "standard.h" // preprocessor directives
int a; // external declaration
int func_a (byte a); // function declaration
int main () // main function definition
{
    int b = 0; // declaration and assignment
    static byte c = 1 + b; // static, of byte type & arithmetic, casting
    byte *d = &c; // indirection
    byte e[3]; // array declaration
    e[0] = *d; // array assignment
    byte f[6] = "hello"; // assignment of string literals
    byte g = func_a(c); // assignment from function call
    // conditionals and equality testing
    if (g == 0 && e[0] == 1)
    {
        g = 2 >> 1; // bitwise operations
    }
    // while loop
    while (g > 0)
    {
        g = g - 2 + 1; // binary precedence with BIDMAS
    }
    printf("%s world %d\n", f, g); // standard library call
    return 0; // return
}
// struct definition
struct struct_a
{
    // union definition
    union
    {
        int val_a = 0; // static assignment
        byte val_b;
    };
};
int func_a (byte a)
{
    struct_a s; // struct declaration
    s.val_a = 0; // struct member assignment
    s.val_b = a; // union member overwrite
    return s.val_b;
}
PowerShell:
./mcc -o helloWorld helloWorld.mc ; ./helloWorld
hello world 0
Development started on 27.08.19, and over 75 man-hours across 5 months were dedicated to this project.
All commits and changes can be viewed here.
The devlog contains all documentation of my progress in learning and developing a compiler and language - hopefully it shows the problems I faced and how I overcame them, as a good reference for anyone who starts a project similar (or not) to this.
Inspired by Nora Sandler's, Rui Ueyama's and this article to make a C compiler, I started to write a compiler. I didn't really want to write a C compiler though, I could just copy code from the hundreds of repositories there are online, so I decided to write my own language where all my ideas could be original and not a replication of others. It was initially named BitC and was drastically different to C. It would be cool though if I wrote a MinC compiler which compiled itself so I could see how inefficient it got over the iterations of self-compilation.
By this point, I was very familiar with C++ from my OpenGL ESC engine and Facial Identification AIFRED projects but relatively unaware of how featureless C was compared to C++.
I chose to write my compiler in C, seeing that it was a popular choice for most compiler development examples online, and knew that C/C++ dealt very closely with lower-level aspects such as controlling memory allocation on both the stack and heap memory and lacked complicated library callings (`printf` instead of `System.out.println`). C, as a compiled language unlike Java or JS, runs more efficiently and therefore faster - so if I were to write the compiler in C, compile-times of MinC projects will be relatively short - this only matters in large projects and I'm not very sure why someone would write large projects in MinC.
Basically, a compiler is a text converter. It is fed text as input (for example, a `.c` file) and outputs text as a `.s` assembly or an `.exe` executable. I am familiar with the GNU Compiler Collection (GCC) - it runs in a terminal, command prompt or whatever text-based shell, sans GUI - so MinC will be done pretty much the same way. A compiler doesn't require dealing with fancy I/O, rendering or any other external libraries or APIs or SDKs - it is a very clean and lightweight project and saves time on learning APIs and debugging them. For example, I had experienced horrible memory leaks with libPNG when implementing texture loading for my OpenGL renderer, which took days to debug.
Day one was configuring the only I/O I needed, which was to read a file and write one, and begin writing a preprocessor for my C code.
The compiler generates x86 32-bit assembly code for Intel CPUs.
Should work for 32-bit Unix-like OSes and 32-bit Windows.
Tested on Windows 10 and Mac OS X Mojave.
Will not work on Catalina as it has no legacy support for 32-bit applications.
MinGW is required for assembly and linking.
Uses GNU's assembler (GAS) and linker.
Links with C Standard library for portability across all OSes.
Please download the repository and cd into it using command prompt or terminal.
Make sure GNU Make is installed, then run
make all
The source code will be compiled by GNU Make into an executable, mcc, in the bin directory.
Compilation tests will be performed automatically after successful compilation by Make.
Run the executable with the -h flag for instructions (Unix terminal):
cd bin ; ./mcc -h
or give it a MinC file, dump the parser's AST (`-fd`) and verbosely compile (`-v`) into an executable
./mcc -fd -v test/file.mc
Thank you to these sources for information; almost all significant code is cited here.
- Introduction to Compilers and Language Design by Prof. Douglas Thain (book)
- http://www.cs.man.ac.uk/~pjj/farrell/compmain.html
- http://lisperator.net/pltut/
- https://github.com/rui314/8cc for inspiration
- https://github.com/rui314/9cc for inspiration
- https://github.com/nlsandler/nqcc for inspiration