/tl-v

A tiny, proof of concept, programming language.

Primary LanguageV

t(iny)l(ang)

A tiny, proof of concept, programming language.

proc main {
	r = 0;
}

The creation of tl and it's compiler were used to learn the basics of conventional AST based compilers after the completion of the stack based stas programming language.

Syntax

The syntax of of tl is a mingle of C with some alterations. For example, if statements and while loops do not need parentheses circling the condition.

You cannot declare any variables, as there are a limited amount of predefined ones.

s0, s1, s2, s3, s4, s5 | Stack variables
a0, a1, a2, a3, a4, a5 | Argument variables
r                      | Return variable

The stack variables are used to store local values. Use the argument variables to pass values to function calls, expect them to be destroyed on return. The return variable is used to pass return values on return.

Since function calls to not return values, they are not expressions. Use the call keyword to call a function.

call function;

s0 = function;
call s0;

There is only one builtin function, print. It is used to print the value of one number in a0.

Predefining an array of values is also supported using the data keyword, mainly used to test out pointer indexing.

Examples of tl programs are in the examples/ folder.

The tl compiler

The tl compiler is split into 3 equally small files, the lexer, parser and code generator.

Lexer

The lexer is a construct that performs lexical analysis on a string of characters read from a file. The Lexer.get() function is used to scan and return a single token. It scans tokens linearly, with zero backtracking. No past tokens are stored, the parser constantly requests new tokens directly with the functions Lexer.expect(), Lexer.next() and Lexer.curr().

Parser

The parser parses the stream of tokens into an AST representation. The entrypoint to the parser is the Parser.parse() function, returning the root AST node in the tree.

It is a recursive parser, starting by parsing toplevel statements such as function declaration and hardcoded arrays. Whilst inside a function body it then constantly calls the Parser.stmt() to build up all statements inside it. This function handles all language statements/constructs, like if statements and while loops. The Parser.expr() function is used to parse and build up an expression tree with respect to operator precedence. Each specific expression parser works by eating up a operator token until it encounters once with a higher precedence, then calling up the chain of expression parsing functions.

For example, this line of code will generate this AST.

proc main {
	s0 = 10 + 2 * 4;
}
s_proc
  expr
   assign
    ident `s0`
    add
     dec `10`
     mul
      dec `2`
      dec `4`

Code Generator

The code generator takes the tree representation of the program and generates x86_64 assembly of GNU dialect to be used with GCC.

The AST tree format is especially useful as it is an abstraction over the original source code, so the code generator can simply recurse over all nodes in the tree generating code. The entrypoint to the code generator is the Gen.generate_all().

The code generator takes inspiration from V's native backend, specifically with how expressions are handled. It is extremely simple, and generates a lot of useless code, however it gets the job done. Modern compiler optimisations require the AST to be converted to a lower level format first, adding a large amount of complexity and translations.

Expressions are handled in the Gen.expr() function. They are generated by traversing the left path of the current node first. On return, the left path's expression ends up in the rax register, it is then moved to the rcx register and right path of the current node is traversed. When done, the rax and rcx register contains the left path and the right path respectively and are used to perform an operation such as arithmetic or a comparison. This form of divide and conquer recursion allows the code generator to be quite lightweight.

proc main {
	2 + 4 * 10;
}
    mov rax, 10
    push rax
    mov rax, 4
    pop rcx
    imul rax, rcx
    push rax
    mov rax, 2
    pop rcx
    add rax, rcx

Building and compilation

Use the v compiler to generate the compiler. Call the compiler with the program, then pipe the output to GCC.

v -g . -o tl
./tl main.tl | gcc -x assembler -
./a.out