/parsertongue

Golang Parser

Primary LanguageGo

Parsertongue

This name is not permanent. It's a play-on-words with parseltongue, the language of snakes in Harry Potter. I'd like to come up with a single name for the final, unified platform, but that might be awhile off, so parsertongue will suffice for now.

Goals

The first goal is to build a lexer that can efficiently lex any regular language. The goal is not to implement any particular lexing algorithm, but to implement my own, parallel approach that can handle any regular language. If the grammar of lexemes is context-free, you should reconsider rewriting your grammar to move the context-free elements to the non-lexical productions so that an efficient linear-time lexer is implementable.

The second goal of this project is to implement an Earley parser. Earley parsers are the most efficient parsers for context-free and ambiguous languages. They will parse any sentence, handling left and right recursion in the grammar in the fastest possible time complexity. The idea is to implement an Earley parser so that users can come along and drop in any EBNF grammar and have this program spit out a syntax tree for them.

Third, this project, as it grows, will ultimately hope to become a source of knowledge for individuals knew to language design and creation. As I learn move, I will codify my opinions in some web-deployable format here so that others can read and understand not only my design decisions for Parsertongue, but also best-practices for language design. We've encountered one already: the grammar of your lexemes should be regular, not context-free. I can go into more details about why and how, which I will do at a later time. But that serves as an example of the kind of thing I've discovered that I've been unable to read in a book.

In my experience reading compiler texts, I find a lot of the books explain the "how" of language implementation decently well. They cover the algorithms well and discuss their role in the larger machine of the compiler. They do not, however, talk about best practices, or alternative models. They usually present one view of the world which may or may not be compatable with the reader's. For example, the Dragon book doess not give a single example of how to lex UTF-8 languages. All of the described algorithms are table based. With UTF-8, a table approach would be next to impossible, needing millions and millions of rows. Even populating the table would be impossible. A reader seeking to implement a lexer for a language that supports UTF-8 characters would find themselves unable to do so reading the Dragon book alone. I hope to explain my solutions to these problems or abstract them away from the user such that they never need to worry about them. Hopefully, parsertongue will support UTF-8 out of the box.