/TDOPjs

TDOP parser in and for JavaScript

Primary LanguageJavaScript

Top Down Operator Precedence JavaScript Parser

In Douglas Crockford's article Top Down Operator Precedence {1} he implements a parser first presented by Vaughan Pratt {5}. I've been working on writing out Crockford's Parser with the hopes of extending it for fun.

Motivation

I began typing out Crockfords parser implementation (using the published version from Beautiful Code) over the Holiday Break from University between 2016-2017. I needed a simple coding project to practice learning VIM before my Programming 2 course for the Spring of 2017, where I knew we would be primarily using the instructors Linux server via SSH. So typing out Crockford's code was a great way to practive VIM, and also learn a little bit about parsing in the process. I did my best to understand the code, but it remained ellusive even though I could grok the simplicity of its design.

I've decided to return to the project and add extensive documentation as a way to learn what the parser does. My plan is to flesh out this README file with explanations from Pratt's original paper, as well as other notes from other people's implementations (See References at the bottom). I also plan to add a lot of comments to the code itself, at the risk of looking like a n00b. Basically I'm writing this for a new JavaScript coder to be able to write their own (toy) parser, incase they want to better understand how computers work.

Simplified JavaScript

Crockfords original parser was designed for parsing Simplified JavaScript, or "just the good stuff" {1}. In his own words, it includes:

  • Functions as first class objects. Functions in Simplified JavaScript are lambdas with lexical scoping.

  • Dynamic objects with prototypal inheritance. Objects are class-free. We can add a new member to any object by ordinary assignment. An object can inherit members from another object.

  • Object literals and array literals. This is a very convenient notation for creating new objects and arrays. JavaScript literals were the inspiration for the JSON data interchange format{1}.

TDOP Basics

Top Down Operator Precedence (TDOP) is a type of parser that is based on a Recursive descent model, but it differs by treating the tokens as something simmilar to objects. Recursive descent associates 'semantic actions' with grammer rules, while TDOP has the actions associated with the tokens {2}.

This feature allows for TDOP to be implemented very well by a dynamic functional-object-oriented language like JavaScript. Generic objects are first created by a factory function, and methods can then be dynamically assigned to allow for the token to handle its own parsing logic.

Background

Top Down Operator Precedence is Vaughan Pratt's generalized form of what is otherwise known as 'precedence climbing', used in recursive descent parsers {3}. If any of that sentence made sense to you, then congratulations! If not, then I'll try to explain.

Recursive descent parsers are very effective means for parsing statements, where tokens are neatly arranged in rigid pattterns. Unfortunately, they don't do so well with parsing expressions {4}. This is where Pratt's parser does its magic: it's very good at dealing with expressions because each token object is assigned a precedence, or binding power in Pratt's terminology {3}.

The binding power is a value that determines how tightly an operator binds to its operands. When operator tokens are parsed, their binding powers are inspected by an expression() function which recursively gobbles up tokens into expression trees that branch according to the binding powers.

TODO Add more explanations. For now, check out Eli Bendersky's blog post about the subject.

Use

As this project is in a very early stage, it has limited useability. It currently runs on NodeJS, but I hope to extend it for the browser too. The printTree module will take a JS file as an argument, and output the tree generated by parser.js to the command-line.

A few Simplified JavaScript test files are included; for instance, test03.js looks like:

test03.js

var beans = function cool (x, y, z) {
  return x + y * z;
};

You can see how the parser works by using printTree on the test03.js file (or any Simplified JavaScript) via

$ node printTree test03.js

The output should look like so:

Scanning..
Parsing..
Printing Tree for test03.js:

{ value: '=',
  arity: 'binary',
  first: 
   { value: 'beans',
     arity: 'name',
     reserved: false,
     nud: [Function: itself],
     led: null,
     std: null,
     lbp: 0,
     scope: 
      { def: 
         { var: { value: 'var', arity: 'name', reserved: true },
           beans: [Circular] },
        parent: undefined } },
  second: 
   { value: 'function',
     arity: 'function',
     name: 'cool',
     first: 
      [ { value: 'x',
          arity: 'name',
          reserved: false,
          nud: [Function: itself],
          led: null,
          std: null,
          lbp: 0,
          scope: 
           { def: 
              { cool: 
                 { value: 'cool',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                x: [Circular],
                y: 
                 { value: 'y',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                z: 
                 { value: 'z',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                return: 
                 { value: 'return',
                   arity: 'statement',
                   reserved: true,
                   first: 
                    { value: '+',
                      arity: 'binary',
                      first: { value: 'x', arity: 'name' },
                      second: 
                       { value: '*',
                         arity: 'binary',
                         first: { value: 'y', arity: 'name' },
                         second: { value: 'z', arity: 'name' } } } } },
             parent: 
              { def: 
                 { var: { value: 'var', arity: 'name', reserved: true },
                   beans: 
                    { value: 'beans',
                      arity: 'name',
                      reserved: false,
                      nud: [Function: itself],
                      led: null,
                      std: null,
                      lbp: 0,
                      scope: [Circular] } },
                parent: undefined } } },
        { value: 'y',
          arity: 'name',
          reserved: false,
          nud: [Function: itself],
          led: null,
          std: null,
          lbp: 0,
          scope: 
           { def: 
              { cool: 
                 { value: 'cool',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                x: 
                 { value: 'x',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                y: [Circular],
                z: 
                 { value: 'z',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                return: 
                 { value: 'return',
                   arity: 'statement',
                   reserved: true,
                   first: 
                    { value: '+',
                      arity: 'binary',
                      first: { value: 'x', arity: 'name' },
                      second: 
                       { value: '*',
                         arity: 'binary',
                         first: { value: 'y', arity: 'name' },
                         second: { value: 'z', arity: 'name' } } } } },
             parent: 
              { def: 
                 { var: { value: 'var', arity: 'name', reserved: true },
                   beans: 
                    { value: 'beans',
                      arity: 'name',
                      reserved: false,
                      nud: [Function: itself],
                      led: null,
                      std: null,
                      lbp: 0,
                      scope: [Circular] } },
                parent: undefined } } },
        { value: 'z',
          arity: 'name',
          reserved: false,
          nud: [Function: itself],
          led: null,
          std: null,
          lbp: 0,
          scope: 
           { def: 
              { cool: 
                 { value: 'cool',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                x: 
                 { value: 'x',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                y: 
                 { value: 'y',
                   arity: 'name',
                   reserved: false,
                   nud: [Function: itself],
                   led: null,
                   std: null,
                   lbp: 0,
                   scope: [Circular] },
                z: [Circular],
                return: 
                 { value: 'return',
                   arity: 'statement',
                   reserved: true,
                   first: 
                    { value: '+',
                      arity: 'binary',
                      first: { value: 'x', arity: 'name' },
                      second: 
                       { value: '*',
                         arity: 'binary',
                         first: { value: 'y', arity: 'name' },
                         second: { value: 'z', arity: 'name' } } } } },
             parent: 
              { def: 
                 { var: { value: 'var', arity: 'name', reserved: true },
                   beans: 
                    { value: 'beans',
                      arity: 'name',
                      reserved: false,
                      nud: [Function: itself],
                      led: null,
                      std: null,
                      lbp: 0,
                      scope: [Circular] } },
                parent: undefined } } } ],
     second: 
      { value: 'return',
        arity: 'statement',
        reserved: true,
        first: 
         { value: '+',
           arity: 'binary',
           first: { value: 'x', arity: 'name' },
           second: 
            { value: '*',
              arity: 'binary',
              first: { value: 'y', arity: 'name' },
              second: { value: 'z', arity: 'name' } } } } } }

Future Work

As this project is a work-in progress, there is always room for improvement. One area for improvement might be to follow Andy Chu's lead {6} and re-design the implementation to accomplish some of the following:

  • Avoid Globals
  • Use 'pure' functions opperating on the AST as data
  • Increase the modularity by allowing a 'spec' definition of nud and led functions be defined apart from the parsing mechanics

The last part is the most immediately useful, because a spec file could be used to define a DSL in a modular way. Also, language revisions could be separated out into versions based on spec files.

References

[1] http://javascript.crockford.com/tdop/tdop.html

    Douglas Crockford's original TDOP parser.

[2] http://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/

    This post by Eli Bendersky is a solid explanation of how TDOP works.

[3] http://www.oilshell.org/blog/2016/11/01.html

    Andy Chu. Pratt parsing equivalent to precedence climbing.

[4] http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/

    Another awesome blog-post about Pratt parsers from Bob Nystrom.

[5] https://tdop.github.io/

    Carl Smith kindly made an HTML version of Pratt's original paper!

[6] https://github.com/andychu/pratt-parsing-demo

    Andy Chu [see ref 3] also implemented his own Pratt parser in Python in a clean and modular way.