This repository is a companion to my 2018 talk entitled Parsing Formal Languages with Swift (a copy of which is included in the repository).
The included Xcode project contains Swift implementations of a lexer, parser, formatter, and interpreter for a very simple language.
Neither the language nor the parser are particularly useful in their own right, but the techniques demonstrated in the code can be applied to more complex languages and applications.
The toy language implemented in this project has the following syntax:
-
Numbers
Both integers and floating point numbers are supported, up to the precision of a Swift
Double
. Negative numbers are not supported, nor exponentials, hexadecimal, etc.5 0.5 57000 .1
-
String literals
String literals must be enclosed in doubles quotes (
"
). String literals can contain any character including a carriage return or linefeed. Double quotes must be escaped with a backslash (\
) as must the backslash character itself."Hello World" "This string has a linebreak in it" "This string has \"escaped quotes\"" "The string has an escaped \\ (backslash)"
-
Variables
Variables names must begin with a letter, optionally followed by one or more letters or numbers. Other symbols such as
_
or emoji are not supported.Any valid name can be used as a variable apart from the keywords
let
andprint
which are reserved.a foo Bar56
-
Expressions
An expression can contain one or more numbers, string literals or variables. Variables must be declared before they are used in an expression.
Expressions also support the infix
+
operator, which performs an addition if both operands are numeric, or a concatenation if either operand is a string. Expressions may contain multiple+
operations.Parentheses are not supported, nor are any other operators (unless you are using the operator-precedence branch, which supports the
*
operator for multiplication of numbers).foo 5 hello + "world" 1 + 2 + 3 "Jonny" + 5
-
Declarations
A declaration creates a variable and assigns a value to it. The value can be any valid expression, and can reference variables declared previously.
Variables have no intrinsic type beyond their currently stored value. You may redeclare the same variable multiple times with different values and types.
let bar = "cat" let foo = 5 + bar
-
Print
The print statement outputs a value to the console. As with declarations, the value can be any valid expression.
print "Hello World" print 5 + 6
-
Whitespace
The language is whitespace-agnostic. Any number of linebreaks or spaces can appears between any pair of tokens. Multiple statements can be placed on the same line, and space is only required between tokens in cases where its omission would result in ambiguity (e.g. between a keyword and a variable).
-
Comments
There is no support for comments.
The Parsing.xcodeproj project requires Xcode 10.1 or higher. It builds a Swift framework with the following features:
-
Lexer
This is a lexer for the simple language described in the talk. The lexer is invoked via the
tokenize()
function, which takes aString
as input and returns an array ofToken
s. All code for the lexer is included in Lexer.swift and it has no other dependencies besides Foundation.// usage import Parsing let input = "let foo = 5 \n print foo" let tokens = try! tokenize(input) print(tokens)
-
Parser
This is a parser for the language. The parser is invoked via the
parse()
function, which takes aString
as input and returns an array ofStatement
s. All code for the parser is included in Parser.swift, which depends only on Lexer.swift and has no other dependencies besides Foundation.// usage import Parsing let input = "let foo = 5 \n print foo" let statements = try! parse(input) print(statements)
-
Formatter
This is a formatter or "pretty-printer" for the language, as mentioned in the talk. The formatter is invoked via the
format()
function, which takes an array ofStatement
s as input and returns aString
containing the formatted source code. All code for the formatter is included in Formatter.swift, which depends on Parser.swift, Lexer.swift and Foundation.// usage import Parsing let input = "let foo = 5 \n print foo" let statements = parse(input) let output = try! format(statements) print(output)
-
Interpreter
This is an interpreter for the language, as mentioned in the talk. The interpreter is invoked via the
evaluate()
function, which takes an array ofStatement
s as input and returns aString
containing the output of the program. All code for the interpreter is included in Interpreter.swift, which depends on Parser.swift, Lexer.swift and Foundation.// usage import Parsing let input = "let foo = 5 \n print foo" let statements = parse(input) let output = try! evaluate(statements) print(output)
There are several branches included in the git repository:
-
This contains the basic version of the parsing project, as described in the talk.
-
This branch includes an alternative lexer implementation that uses
NSRegularExpression
. -
This branch adds support for the multiplication operator, as detailed in the enhancements section of the talk.
-
This branch improves the error handling mechanism, as suggested in the enhancements section of the talk:
-
Tokens and AST nodes now include source range information.
-
Multiple lexing errors can now be detected and handled at a later stage of the parser.
-
Error types are now more specific and include sufficient information to derive the exact source location of the error, even at the interpreter stage.
-
As a bonus, the lexer also includes a method for converting a source string index to a line/column offset.
-