/bonsai

Bonsai Programming Language

Primary LanguageRubyBSD 2-Clause "Simplified" LicenseBSD-2-Clause

Bonsai Programming Language
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Bonsai is a programming language for transforming trees (such as abstract syntax trees used to implement programming languages). You define rules to match patterns in a tree and transform the matched part of the tree. These rules are applied over and over until no rules match anywhere in the tree.

Trees are represented by indented lists. A node's children are indented 2 more spaces than their parent. The children can be either unordered or ordered. Using a single colon after the node's label denotes that the children are an unordered multiset. Using two colons after the node's label denotes that the children are an ordered list.

# A node with the label Foo and two unordered children with labels Bar and Baz
Foo:
  Bar:
  Baz:

Nodes can either have children or have a value associated with them. Values can be either integers, decimals, or strings.

# A node with the label Foo and the value 7.3
Foo: 7.3

A rule is defined by showing a part of the tree that must be matched and how it should be transformed. There are 3 operators that can precede a node's label: !, +, and -. The ! operator says that if this node matches, then the rule does not match. The + operator says that this node will be created if the rule matches and does not affect whether or not the rule matches. The - operator says that this node must match for this rule to match and that the node will be removed if this rule matches.

The special label ^ denotes the root of the tree.

Implementation Overview
^^^^^^^^^^^^^^^^^^^^^^^

We start with an initial tree and evaluate the tree by applying rules until no more rules match.

There is a queue of nodes to try to apply rules to. Workers can get nodes from the queue and apply rules to the subtree rooted at that node.

When a node is added to the queue, a semaphore is incremented on the node's parent. A node can not be popped from the queue unless its semaphore is at 0.

If any part of the subtree rooted at a popped node can be (and is) transformed, that node is added back to the queue. If not, the semaphore on the popped node's parent is decremented.

When a worker tries to evaluate the subtree rooted at a node, it uses different techniques to check if rules match and to apply transformations.

The worker keeps track of whether each rule is known to match the subtree, known to not match the subtree, or if this is unknown.

Different techniques are used (in parallel) until either a valid transformation of the subtree has been found or it is confirmed that no rules match the subtree.

Techniques can be written purely in native code or can use external dependencies (databases, key value stores, queues, counters, etc).

Techniques can either halt when a worker knows what to do with a subtree or keep running (for JIT purposes).

Different implementations of the same set of rules can be compiled with different sets of techniques.

A technique takes as input:
  - the subtree
  - a mapping of which rules match the subtree, don't match the subtree, or if this is unknown

A technique can return:
  - a JITed version of itself
    - whether the JITed version should replace this technique in the future or if the original should stay as well
  - a valid transformation
  - which rules have been determined to match the subtree
  - which rules have been determined to not match the subtree
  - whether or not this technique should be tried again for the new state of knowledge about which rules match

Techniques should use knowledge about which rules match to return early if possible.

Things to investigate:
  What kind of algorithm should be used to walk around the tree?
  What nodes should start off in the queue? The root node? All the nodes?
  How should the queue be implemented?
    Queue that just re-pushes nodes with a nonzero semaphore:
      might not get a usable node when popping
      might cause brief stalls due to clusters of nodes that can't be used
        if we have a wide, deep tree, we could end up with clusters of shallow nodes that have to be skipped and pushed back to the end
    Priority Queue:
      Based on Semaphore Value (with reordering):
        guaranteed to get usable node when popping
        have to reorder ancestors when adding a node to the queue
          might cause brief stalls if many actively used nodes have common ancestors
      Based on Semaphore Value (without reordering; just reinserts unusable nodes with new semaphore value):
        might not get a usable node when popping
        should end up with pretty even distribution of unusable nodes
      Based on Depth (popping deepest first):
        guaranteed to get usable node when popping
        might cause brief stalls depending on how tree is structured
          for instance, if the AST is shallow and the memory is deep, updating memory could block processing the AST for a while
  If a worker's techniques can not tell it whether or not any rules can be applied to a subtree, how should this be handled?
    Should it crash?
    Should it report the error and continue on as if no rules matched the subtree?
    Should handling this scenario be customizable?

Example: Stable Marriage Problem
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See http://github.com/bakineggs/stable_marriage for an example of how Bonsai can be used to solve the stable marriage problem (http://en.wikipedia.org/wiki/Stable_marriage_problem).

For Writing Languages
^^^^^^^^^^^^^^^^^^^^^
Bonsai is primarily designed for writing programming languages. By representing the AST, environment, memory, etc. in a single tree, you can define the semantics of a language by writing rules to transform the tree.

Here are some rules that could be part of an interpreter of some language:

# spawn new thread (see see https://gist.github.com/369171 for similar k-framework version)
Thread:              # node must be matched
  K::                # node must be matched, the :: means that children are ordered (default is unordered)
    -Spawn:=         # node must be matched, the - means that this node will be removed, the := means all children must be matched
      *:* S          # makes S a variable that is any number of any kind of node (makes a variable because it's a matched node)
  Env: E
+Thread:             # node not matched against, the + means this node will be added
  K::
    *: S             # references the variable S (because it's inside a new node)
  Env: E

# purge a finished thread
-Thread:
  K:=                # the = means an exact match (not containing any unmatched nodes), so this matches a "K" node with no children

# wait for all threads to finish or rendezvous
Threads:=            # all child nodes must match for this rule to apply
  Thread:*           # the * means that there can be any number of these nodes
    K::
      -Rendezvous:

# wait for another thread to finish
Thread:
  K::
    -Join:
      Integer: I
!Thread:             # this node must not match
  Id: I

# integer addition
-Plus:
  Integer: A
  Integer: B
+Integer: < A + B    # the < means do a computation (with the ability to call user-defined functions) - syntax and use has changed since this example was written

# decimal addition
-Plus:
  Decimal: A
  Decimal: B
+Decimal: < A + B

# mixed addition
-Plus:
  Integer: A
  Decimal: B
+Decimal: < A + B

# remove an argument for evaluation (all args must start inside an "unevaluated" node)
K::
  +*: A
  +Restore:
    Plus:
      -Unevaluated:
        *: A
      +Context:

# reinsert evaluated integer
K::
  -Integer: I
  -Restore:
    Plus:
      -Context:
      +Integer: I

# reinsert evaluated decimal
K::
  -Decimal: A
  -Restore:
    Plus:
      -Context:
      +Decimal: A