Yoho 🔥 is a toy compiler crafted in Mojo 🔥 and designed to generate RISC-V assembly language.
This project is currently under active development.
Please refer to test.sh
for example programs that the current version of the compiler supports.
The tokenizer for this compiler is implemented quite manually. It calls the re2
Python library
for regular expression operations at the low level.
The parser is based on Parsing Expression Grammar (PEG). It is capable of automatically generating parsers from .gram files. Below is the meta-grammar used for defining the grammar:
start: grammar ENDMARKER { grammar }
grammar: rules=rule+ { Grammar(rules) }
rule:
| NAME ':' NEWLINE INDENT rhs NL DEDENT { Rule(name.text, rhs)}
| NAME ':' rhs NL { Rule(name.text, rhs)}
rhs:
| '|' alt NEWLINE rhs { Rhs(List(alt) + rhs.args)}
| '|' alt NEWLINE { Rhs(List(alt)) }
| alt NEWLINE { Rhs(List(alt)) }
alt:
| items action { Alt(items, action) }
| items { Alt(items) }
items: items=named_item+ { Items(items) }
named_item:
| NAME '=' item { NamedItem(item, name.text)}
| item { NamedItem(item) }
item:
| atom '*' { Repeat0(atom) }
| atom '+' { Repeat1(atom) }
| sep=atom '.' node=atom '+' { Gather(sep.text, node) }
| atom
atom:
| '(' items ')' { Group(items.args) }
| NAME { Atom(name.text) }
| STRING { Atom(string.text) }
action: '{' target '}' { target }
target:
| NAME target_atoms { Action(target_atoms, name)}
| target_atom { Action(target_atom) }
target_atoms:
| target_atom target_atoms { String(target_atom + ' ' + target_atoms) }
| target_atom
target_atom:
| NAME { String(name.text) }
| NUMBER { String(number.text) }
| ',' { String(', ') }
| '+' { String(' + ') }
| '(' { String('(') }
| ')' { String(')') }
| '.' { String('.')}
This meta-grammar provides a flexible and powerful way to define and generate the parser.
The parse tree can be nicely printed out:
0:12 ┃ ┃ BinOp
0:1 ┃ '1' ┃ NUMBER ✔
1:2 ┃ '-' ┃ MINUS ✔
3:12 ┃ ┃ BinOp
3:8 ┃ ┃ BinOp
3:4 ┃ '2' ┃ NUMBER ✔
4:5 ┃ '+' ┃ PLUS ✔
5:8 ┃ ┃ BinOp
5:6 ┃ '3' ┃ NUMBER ✔
6:7 ┃ '*' ┃ STAR ✔
7:8 ┃ '2' ┃ NUMBER ✔
8:9 ┃ '-' ┃ MINUS ✔
9:12 ┃ ┃ BinOp
9:10 ┃ '3' ┃ NUMBER ✔
10:11 ┃ '/' ┃ SLASH ✔
11:12 ┃ '2' ┃ NUMBER ✔
The code generator simply reads in an ast and emit RISC-V assembly. Yes some of the source code might look silly to you and it was somewhat intentional. Simplicity and readability for first-time readers is my top priority. There is no IR or LLVM/MLIR in yoho.
This compiler was developed with inspiration and reference from the following projects:
Ensure you have the following installed:
- Mojo Nightly Version. Visit the Mojo Lang website and follow the instructions to download and install the nightly version of Mojo.
- RISC-V Toolchain. Visit this guide to install RISC-V toolchain.
- Clone the Repository.
- Build the Compiler:
make yoho
- Run test:
make test
- Clean Up:
make clean
Here is an example of the output RISC-V assembly code of '12+ 45-(4-3>2)':
.global main
main:
li t0, 1
li t1, 2
mul t0, t0, t1
li t1, 4
li t2, 5
mul t1, t1, t2
add t0, t0, t1
li t1, 4
li t2, 3
sub t1, t1, t2
li t2, 2
slt t1, t2, t1
sub t0, t0, t1
mv a0, t0
ret
Please refer to test.sh
for more example programs.
Inspired by chibicc's approach to maintaining a clean commit history, we adopt a similar style for handling contributions.
When a bug is found in this compiler, I trace back to the original commit that introduced the bug and rewrite the commit history as if the bug never existed. This method, while unconventional, ensures that each commit remains bug-free, which is crucial for the integrity of the project.
The repository is committed to "every commit is bug free". If you discover a bug and submit an issue, I will apply the necessary changes to the relevant previous commits by rewriting the history.
This project is licensed under the MIT License. See the LICENSE file for details.