Implementing Lox, the language described in Robert Nystrom's excellent book on programming languages.
The main components are:
- Scanner: parses the source code into tokens, as defined by the language's lexical grammar. For example, keywords such as
var
,fun
,while
, identifiers such asmyNum
, literals such as"5"
,"stringy string"
and other tokens. The lexical grammar in Lox is a regular grammar, i.e., it is simple enough that it could be implemented via a regular expression. - Parser: A recursive descent parser that parses the flat sequence of tokens emitted by the scanner into nodes in an Abstract Syntax Tree (AST). The structure of the tree is defined by the language's syntactic grammar. This grammar, known as a context-free grammar, is more powerful than the scanner's lexical grammar, and makes the semantics of the program clear via explicit precedence rules. In basic arithemtic, the precedence rules are encoded in the following mnemonic Please Excuse My Dear Aunt Sally. In Lox, there are similar precedence rules that are designed to mimic C's precedence rules. There are in fact two syntactic grammars, one for expressions and the other for statements.
- Interpreter: A tree walk interpreter that traverses the AST generated by the parser, and evaluates "interprets" each node in terms of Java runtime objects.