

Creative Commons Zero v1.0 Universal (CC0-1.0)

Awesome Language Engineering


A curated list of useful resources for computer language engineering and theory

Whether you want to create a text processor, a parser, a language application, a DSL (Domain-Specific Language), or a full-fledged programming language with compiler and tooling, this page serves as a directory map to point you in the right direction.

Better yet, help others find their way by contributing the resources that you find useful.


Tools

As in other domains, knowing the tried-and-true tools will save you a lot of time and effort. You will also learn the emerging techniques adopted by different tools, which makes your skills more transferable.

ANTLR (ANother Tool for Language Recognition)

A powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

Describe the language's lexical and grammar rules in a declarative .g4 file (similar to the Lex/Yacc format), and the generator can create a parser for the following target languages: Java, C#, Python, JavaScript, Go, C++, Swift (see update)
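
To make "build and walk parse trees" concrete, here is a minimal hand-written Python sketch of those two phases. It does not use ANTLR or its runtime, and it is not what ANTLR generates; the tiny grammar (sums of integers) and the names `Node`, `parse_sum`, `walk`, and `evaluate` are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One parse-tree node: a rule or token name, optional text, and children."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(source):
    # Keep integers and '+'; any other character is ignored in this sketch.
    return re.findall(r"\d+|\+", source)


def parse_sum(tokens):
    """Build a tree for the hypothetical rule: sum : INT ('+' INT)* ;"""
    pos = 0

    def expect_int():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected integer at token {pos}")
        node = Node("INT", tokens[pos])
        pos += 1
        return node

    root = Node("sum")
    root.children.append(expect_int())
    while pos < len(tokens) and tokens[pos] == "+":
        root.children.append(Node("PLUS", "+"))
        pos += 1
        root.children.append(expect_int())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return root


def walk(node, visit):
    """Depth-first walk that calls visit(node) on each node, parents first."""
    visit(node)
    for child in node.children:
        walk(child, visit)


def evaluate(source):
    """Parse the source, then walk the tree and add up every INT leaf."""
    total = 0

    def visit(node):
        nonlocal total
        if node.kind == "INT":
            total += int(node.text)

    walk(parse_sum(tokenize(source)), visit)
    return total
```

Calling `evaluate("1 + 2 + 39")` parses the input into a `sum` node with five children and then walks it to add up the integers. A parser generator automates the first half from the grammar, so you mostly write the walking logic.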

Learning materials:

MPS (Meta Programming System)

With JetBrains MPS, you can define custom editors for any new language and make using these DSLs simpler. Even domain experts, who are not familiar with traditional programming, can easily work in MPS with domain-specific languages designed around their domain-specific terminology.

Learning materials:

Xtext

Xtext is a framework by Eclipse for development of programming languages and domain-specific languages. With Xtext you define your language using a powerful grammar language. As a result you get a full infrastructure, including parser, linker, typechecker, compiler as well as editing support for Eclipse, IntelliJ IDEA and your favorite web browser.

Learning materials:

Sirius

Sirius is an Eclipse project which allows you to easily create your own graphical modeling workbench by leveraging the Eclipse Modeling technologies, including EMF and GMF.

Learning materials:

  • Web: Official Guide: provides an introduction to Sirius and a series of tutorials to get started building your own graphical modeling tool

Flex and Bison

Flex and Bison are aging Unix utilities that help you write very fast parsers for almost arbitrary file formats. Lex and Yacc are the original tools; Flex and Bison are their newer, almost completely compatible versions.
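
As a rough idea of what the scanner half of such a tool does, here is a minimal Python sketch of a rule-driven lexer built from regular expressions. It is not Flex output and does not use Flex syntax; the token names and the `TOKEN_SPEC` table are assumptions made up for this example.

```python
import re

# Hypothetical token rules: (token name, regular expression).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]

# Combine all rules into one pattern with a named group per rule.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        yield kind, match.group()
```

For example, `list(tokenize("x = 3.14"))` yields an `IDENT`, an `OP`, and a `NUMBER` token. One difference worth noting: Flex prefers the longest match among its rules, whereas Python's alternation takes the first rule that matches, so rule order matters more in this sketch.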

Learning materials:

  • Web: Flex & Bison Tutorial: this webpage is supposed to be a tutorial for complete novices needing to use Flex and Bison for some real project.

  • Book: Flex & Bison: explains how to use flex and bison to solve your problems quickly. This is an update of the original Lex & Yacc book described below.

  • Book: Lex & Yacc: shows you how to use two Unix utilities, lex and yacc, in program development. These tools help programmers build compilers and interpreters, but they also have a wider range of applications.

Kaitai Struct

A parser generator for reading binary data. It provides a declarative language for describing the structure of binary data, from which it generates parsers (in multiple target languages) that read binary file formats, network stream packet formats, etc. It comes with a compiler, an IDE, a visualizer, and a library of format specs.

Describe the binary structure in a declarative .ksy file (YAML-like), and the generator can create a parser for the following target languages: C++/STL, C#, Java, JavaScript, Perl, PHP, Python, Ruby (see update)
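
For a sense of the hand-written parsing code that a declarative binary spec replaces, here is a small Python sketch using the standard `struct` module. It is neither Kaitai Struct output nor .ksy syntax; the "DEMO" format (4-byte magic, little-endian 16-bit version and count, then `count` 32-bit values) is invented for this example.

```python
import struct

HEADER_FORMAT = "<4sHH"  # magic (4 bytes), version (u2le), count (u2le)


def parse_demo(data):
    """Parse the hypothetical DEMO format into a plain dictionary."""
    magic, version, count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != b"DEMO":
        raise ValueError(f"bad magic: {magic!r}")
    # The values start right after the fixed-size header.
    offset = struct.calcsize(HEADER_FORMAT)
    values = struct.unpack_from(f"<{count}I", data, offset)
    return {"version": version, "values": list(values)}


def build_demo(version, values):
    """Serialize values into the same hypothetical format (handy for testing)."""
    header = struct.pack(HEADER_FORMAT, b"DEMO", version, len(values))
    return header + struct.pack(f"<{len(values)}I", *values)
```

A round trip such as `parse_demo(build_demo(1, [10, 20, 30]))` recovers the version and the values. With a declarative spec, the field layout would live in one description file and the equivalent of `parse_demo` would be generated for each target language.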

Sed and Awk

Sed and Awk are two text processing programs that are mainstays of the UNIX programmer's toolbox.

  • Sed is a stream editor (non-interactive) to do common text editing jobs like search/extract/replace/insert.
  • Awk is a whole programming language ideal for handling data extraction, reporting, and data-reformatting jobs.

Both are command-line programs that can be used independently or combined for many text processing purposes. They are great for recognizing and extracting information from text input. For simple language recognition tasks, they are perhaps the best tools for the job, requiring the least effort thanks to their simplicity and targeted use cases. Sed and Awk are part of most, if not all, Linux/Unix/macOS distributions. They are available to download for Windows as well.
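
To illustrate the two kinds of jobs described above, here is a minimal Python sketch of rough analogues: a per-line search and replace (the typical Sed job) and a per-field summation (a typical Awk job). These are not reimplementations of either tool; the function names are hypothetical, and Python's regular expression syntax differs from the dialects Sed and Awk use.

```python
import re


def sed_substitute(lines, pattern, replacement):
    """Rough analogue of sed 's/pattern/replacement/g' applied line by line."""
    return [re.sub(pattern, replacement, line) for line in lines]


def awk_sum_field(lines, field_index):
    """Rough analogue of awk '{ total += $N } END { print total }'.

    field_index is 1-based, like awk's $1, $2, ...
    Lines with too few fields are skipped. Unlike awk, a non-numeric
    field raises ValueError instead of counting as zero.
    """
    total = 0.0
    for line in lines:
        fields = line.split()  # split on runs of whitespace
        if len(fields) >= field_index:
            total += float(fields[field_index - 1])
    return total
```

For instance, `sed_substitute(["error: disk full"], r"error", "ERROR")` rewrites the matching text, and `awk_sum_field(["a 1.5", "b 2.5"], 2)` adds up the second column. The command-line tools express the same jobs in a single short line, which is why they remain attractive for quick text processing.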

Learning materials:

Fundamentals

Books

DSL Engineering

Designing, Implementing and Using Domain-Specific Languages

The definitive resource on domain-specific languages: based on years of real-world experience, relying on modern language workbenches and full of examples. Domain-Specific Languages are programming languages specialized for a particular application domain.

Language Implementation Patterns

Create Your Own Domain-Specific and General Programming Languages

Written by the author of ANTLR, which is also the tool used in the book, but the general concepts apply regardless of which tool you use.

Compilers: Principles, Techniques, and Tools

A classic compiler book known to professors, students, and developers worldwide as the "Dragon Book".

Writing An Interpreter In Go

Learn how to use a C-like language such as Go to create a complete programming language by applying the fundamental concepts of lexers, parsers, ASTs (Abstract Syntax Trees), Pratt parsing, and recursive descent parsing. The book also shows you how to implement a REPL (interactive language shell).
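
As a taste of the Pratt technique mentioned above, here is a minimal Python sketch of a Pratt-style expression parser that builds a small AST and evaluates it. It is not code from the book (which uses Go); the binding-power numbers, the tuple-based AST, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical binding powers: a higher number binds tighter.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
PREFIX_POWER = 30  # unary minus binds tighter than any binary operator


def tokenize(source):
    # Integers, operators, and parentheses; other characters are ignored.
    return re.findall(r"\d+|[-+*/()]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_expression(self, min_power=0):
        # Prefix position: a number, a parenthesized group, or unary minus.
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            left = self.parse_expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token == "-":
            left = ("neg", self.parse_expression(PREFIX_POWER))
        elif token.isdigit():
            left = ("num", int(token))
        else:
            raise SyntaxError(f"unexpected token {token!r}")
        # Infix position: keep absorbing operators that bind tighter
        # than the context we were called from.
        while True:
            op = self.peek()
            power = BINDING_POWER.get(op, 0)
            if power <= min_power:
                break
            self.advance()
            right = self.parse_expression(power)
            left = (op, left, right)
        return left


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.parse_expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


def evaluate(node):
    """Evaluate the tuple-based AST produced by parse()."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "neg":
        return -evaluate(node[1])
    left, right = evaluate(node[1]), evaluate(node[2])
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    return left / right
```

With these binding powers, `parse("1 + 2 * 3")` nests the multiplication under the addition, and operators of equal power group to the left because the loop stops when the next operator does not bind strictly tighter than the current context.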

Articles

General:

Paradigms:

Type Systems

License

CC0

To the extent possible under law, Nikyle Nguyen has waived all copyright and related or neighboring rights to this work.