/muon_lang

A simple transpiler to c

Primary LanguageCGNU General Public License v3.0GPL-3.0

alt text

About


This project is intended to be viewed as a simple demonstration for a transpiler written in < 2000 lines of c code. It was originally intended to be just a simple exercise in writing a basic Parser-Combinator. Altough it is able to produce valid c code, it is still far from being usable without a firm understanding of how the transpiler works.

It is also my first project i would really consider finished. (for myself and the public eye)

That said if you're still interessted, i invite you to have a look around. You can contact me at gerrit.proessl@gmail.com. I would certainly appreciate any feedback :)

Names that end with _t are reserved by POSIX

The Muon Language


Muon is a syntactically simple imperative language. The transpiler emits c code. It strongly relies on and heavily utilizes constructs found in the c language. It is only possible to transpile one file programs. The formal grammar for the language can be found at ./lang.grammar

Building


$ make

Usage


$ build/comp example.mn output.c
$ gcc output.c

Example:


printf() -> void;

foo_t {
  id: *char;
}

main() -> int 
foo: foo_t = (init "Hello Muon");
i: int = 0; 
{
  (printf "%s\n" (get foo id));

loop:
  (printf "i: %d\n" i); 
  (inc i);
  jmp (lt i 10) loop;
  
  ret 0;
}

produces ->

#define set(lexp, rexp)  (lexp = rexp)   
#define ref(exp)         (&exp)          
#define deref(exp)       (*exp)          
#define get(lexp, rexp)  (lexp.rexp)     
#define pget(lexp, rexp) (lexp->rexp)    
#define aget(exp, index) (exp[index])    
#define cast(exp, type)  ((type)exp)     
#define size(exp)        (sizeof(exp))   
#define lst(...)         ( __VA_ARGS__ ) 
#define init(...)        { __VA_ARGS__ } 
// UNARY_OPERATORS                       
#define inc(exp)         (exp++)         
#define dec(exp)         (exp--)         
#define pos(exp)         (+exp)          
#define neg(exp)         (-exp)          
#define bnot(exp)        (~exp)          
#define not(exp)         (!exp)          
// BINARY_OPERATORS                      
#define add(lexp, rexp)  (lexp + rexp)   
#define sub(lexp, rexp)  (lexp - rexp)   
#define mul(lexp, rexp)  (lexp * rexp)   
#define div(lexp, rexp)  (lexp / rexp)   
#define and(lexp, rexp)  (lexp && rexp)  
#define or(lexp, rexp)   (lexp || rexp)  
#define mod(lexp, rexp)  (lexp % rexp)   
#define lt(lexp, rexp)   (lexp < rexp)   
#define gt(lexp, rexp)   (lexp > rexp)   
#define eq(lexp, rexp)   (lexp == rexp)  
#define leq(lexp, rexp)  (lexp <= rexp)  
#define geq(lexp, rexp)  (lexp >= rexp)  
#define band(lexp, rexp) (lexp & rexp)   
#define bor(lexp, rexp)  (lexp | rexp)   
#define bxor(lexp, rexp) (lexp ^ rexp)   
#define ls(lexp, rexp)   (lexp << rexp)  
#define rs(lexp, rexp)   (lexp >> rexp)  

void printf();
typedef struct foo_t foo_t;
typedef struct foo_t {
char* id ;
} foo_t;
int main() {
foo_t foo  = init("Hello Muon");
int i  = 0;
printf("%s\n", get(foo, id));
loop:
printf("i: %d\n", i);
inc(i);
if(lt(i, 10)) goto loop;
return 0;
}

You can find another example program at ./example.mn.

Documentation


Types

The languages support four basic forms of types.

  • Array Types
  • Function Types
  • Pointer Types
  • Identifier Types

Identifier Types

As the transpiler does no typechecking there is no need for introducing primative types. All primative types and compound types (defined by structures) are just Identifier Types.

Array Types

Array Types are for declaring an array of variables. Array Types are declared in the following way:

[type; exp]

where type describes the type of an element and exp describes the size of the array as an expression. (Note: the size needs to be a constant expression if the array is declared outside of a function)

Function Types

Function types are for declaring a pointer to a function. They are declared in the following way:

(param_type_1, param_type_2, ...) -> return_type

where param_type_n is the type of the n-th parameter of the function and return_type is the return type of the function.

Pointer Type

Pointer Types are for declaring pointers to a variable. They are declared in the following way:

*type

where type is the type of the variable to pointer is pointing to.

Structures

id {
  var_id: type;
  ...
}

Structures pretty much act like typedefined c-structs. id is the identifier of the structure and inside the curly brackets are the variables the structure holds.

Functions

function_id (var_id: type, ...) -> return_type 
local_var: type = exp;
...
{
  statement;
}

A Function starts with its identifier followed by its parameters and return type. In front of the function body all variables that will be used additionally to the paramters get defined and initiated.

Statements

There are four kinds of Statements.

  • Expression Statement
  • Label Statement
  • Jump Statement
  • Return Statement

All Statements are terminated by a semicolon.

Expression Statement

Just a Expression terminated by a semicolon.

Label Statement

label:

A Label Statement marks a line of code where one can jump to with the Jump Statement.

Jump Statement

Conditional Jump Statement:

jmp exp label;

Jump Statement:

jmp label;

Continues execution at the specified label if the condition is met or if no condition is specified.

Return Statement

ret exp;

Returns the value of exp from the current function.

Expressions

There are sic types of Expressions

  • Integer Expression
  • Identifier Expression
  • String Expression
  • Float Expression
  • Char Expression
  • Call Expression

Integer Expression

Just a integer literal 10 2

Identifier Expression

Just a identifier var foo

String Expression

Just a ltring literal "Hello" "World"

Char Expression

Just a char literal 'c' '\n'

Call Expression

(call_id param_1 param_2 ...)

A Call Expression ist mostly just a function call, where the call_id is is the first element in a list of expressions followed by its parameters, all inside round brackets.

Certain functionality which dont directly corrospond to function calls in c are also exposed through call expressions by some predefined c macros. These macros are just prepended to the output.

Call Macro List:

  • set(lexp, rexp) -> (lexp = rexp)
  • ref(exp) -> (&exp)
  • deref(exp) -> (*exp)
  • get(lexp, rexp) -> (lexp.rexp)
  • pget(lexp, rexp) -> (lexp->rexp)
  • aget(exp, index) (exp -> [index])
  • cast(exp, type) -> ((type)exp)
  • size(exp) (sizeof -> (exp))
  • lst(...) -> ( __VA_ARGS__ )
  • init(...) -> { __VA_ARGS__ }

UNARY OPERATORS

  • inc(exp) -> (exp++)
  • dec(exp) -> (exp--)
  • pos(exp) -> (+exp)
  • neg(exp) -> (-exp)
  • bnot(exp) -> (~exp)
  • not(exp) -> (!exp)

BINARY_OPERATORS

  • add(lexp, rexp) -> (lexp + rexp)
  • sub(lexp, rexp) -> (lexp - rexp)
  • mul(lexp, rexp) -> (lexp * rexp)
  • div(lexp, rexp) -> (lexp / rexp)
  • and(lexp, rexp) -> (lexp && rexp)
  • or(lexp, rexp) -> (lexp || rexp)
  • mod(lexp, rexp) -> (lexp % rexp)
  • lt(lexp, rexp) -> (lexp < rexp)
  • gt(lexp, rexp) -> (lexp > rexp)
  • eq(lexp, rexp) -> (lexp == rexp)
  • leq(lexp, rexp) -> (lexp <= rexp)
  • geq(lexp, rexp) -> (lexp >= rexp)
  • band(lexp, rexp) -> (lexp & rexp)
  • bor(lexp, rexp) -> (lexp | rexp)
  • bxor(lexp, rexp) -> (lexp ^ rexp)
  • ls(lexp, rexp) -> (lexp << rexp)
  • rs(lexp, rexp) -> (lexp >> rexp)