/pattern-pangloss

Pattern lets you transform data structures in amazing ways.

Primary LanguageClojureEclipse Public License 1.0EPL-1.0

Pattern

Pattern lets you transform data structures in amazing ways.

The focus is on simplicity and expressivity. It works on immutable persistent data, greatly simplifying development and debugging of transformations.

pangloss/pattern {:git/url "https://github.com/pangloss/pattern"
                  :sha "<use the current commit hash>"}

Video

See my Nov 2022 talk for London Clojurians

How can this be used?

Here are a few examples of how it's been used already:

  • Create simple and clear macros
  • Create an infix math macro in about 30 LOC
  • Define the simplification rules of a computer algebra system (see a very similar engine in use in SICMUtils)
  • Create a Python to Clojure source-to-source converter in under 400 LOC
  • Compile Scheme to X86 assembly in about 1500 LOC

The primary tools in Pattern are the rule and rule combinator. To understand how to use those, it's important to first introduce matcher patterns and substitution.

How does it work?

Pattern is an collection of tools for pattern matching and substitution. Those tools can be extended and combined in flexible ways.

The match pattern looks similar to the data structure it's matching on. That enables a very intuitive understanding of how the pattern will apply to the data being matched.

A transformation is a match pattern with a substitution pattern. Combined, that is called a rule.

The substitution pattern uses exactly the same syntax as the match pattern. Substitution supports all of core functionality of the matcher. That makes it much simpler to understand the behavior of a rule.

In the spirit of Clojure, pattern maintains a high degree of simplicity in the interface.

Pattern Matching

(require '[pattern :refer [compile-pattern]])

The core of Pattern is the pattern matcher, which can be used to match any kind of Clojure data structure. To use the pattern matcher, call (compile-pattern my-pattern). That returns a function that you can call with the data you want to extract matches from. The return value is a map from match names to matching data.

Here is a simple example that matches the first and last elements of a vector if the central elements are 1 2 3.

(def m (compile-pattern '[?before 1 2 3 ?after])

(m [0 1 2 3 4]) ; => {:before 0 :after 4}

Even a simple pattern encodes sophisticated matching logic. For instance you could replicate the above matcher with this code. As you can see, it's more complex, less clear in intention, and easier to make a mistake:

(def m #(when (vector? %)
          (if (and (= 5 (count %))
                (= [1 2 3] (subvec % 1 4)))
            {:before (first %)
             :after (last %)})))

(m [0 1 2 3 4]) ; => {:before 0 :after 4}

The advantages of these pattern matchers become even more apparent as the complexity of pattern increases.

Unification Within a Pattern

If multiple matchers in a pattern have the same name, the values they match must unify.

Unification increases the sophistication of patterns that can be defined.

(def m (compile-pattern '[?fn-name (fn ?fn-name [??args] ??body)]))

(m ['abc '(fn abc [x] x)]) ; => {:fn-name 'abc ...}
(m ['xyz '(fn abc [x] x)]) ; => nil

Unification works across different matcher types. The pattern [?list [??list 3]], could match [[1 2] [1 2 3]] or [[:x] [:x 3]], etc.

Pattern Matchers Available

Above I showed 2 matchers: match-list and match-element matching against a fixed-length vector. Much more flexibility is available.

The pattern could be changed to [??before 1 2 3 ??after], for instance. Now it would match a vector of any length as long as it contains the sequence 1 2 3.

(def m* (compile-pattern '[??before 1 2 3 ??after]))

(m* [1 2 3])       ; => {:before [] :after []}
(m* [0 1 2 3 4 5]) ; => {:before [0] :after [4 5]}

Patterns also do not need to be flat. For instance, I can match against Clojure S-Expressions. Here I'll introduce a three more matcher types:

(def let-bindings (compile-pattern '(let [(?:* (? binding* symbol?) ?_)] ??_)))

(let-bindings '(let [datum (first data)
                     c (count datum)]
                 ...))
;; => {:binding* [datum c]}

?_ and ??_ are special cases of unnamed matchers. These matchers do not unify with each other, and the value matched by them is not returned in the match results. They are useful for ignoring parts of the input data that are not relevant to what you are trying to do. Anywhere you can provide a matcher name, using _ as the name has the same behavior.

The ?:* matcher is match-many. It matches a repeated subsequence within the sequence it is placed in. In this case, that allows it to handle the characteristic flattened Clojure binding pattern.

The (? ...) matcher is the full form of match-element that we saw earlier. In this case, it is equivalent to ?binding*, except that here I gave it a predicate function. The matcher will only succeed if the predicate returns a truthy value when called with the matched value.

That means the following does not match:

(let-bindings '(let [42 (first data)
                     c (count datum)]
                 ...))
;; => nil

There are a lot more matchers and the list is gradually expanding with ever more weird and wonderful behaviors.

Each matcher in the list has a matcher implementation function with detailed documentation in the pattern.matchers namespace.

(better docs coming eventually!)

Matcher Implementation Notes
? match-element Match a single element
?? match-segment Match 0 or more elements
??! Same as ??, but greedy
?:map match-map Match a map with specific keys
?:*map match-*map Match each key-value pair in a map
?:+map match-+map Like ?:*map, but require at least one match
?:as match-as Capture an entire sub pattern if the contained pattern matches
?:? match-optional Match 0 or 1 instance
?:1 match-one Match exactly 1 instance
?:* match-many Match any number of instances
?:+ match-at-least-one Match at least one instance
?:chain match-chain Alternate between patterns and functions on the matched data
??:chain Same as ?:chain but capture sequence data
| match-or Match alternative patterns on the same data
& match-and Match all patterns on the same data
?:= match-literal Match a literal value (usually not needed)
?:not match-not Match only if the contained pattern does not
?:when match-when If the predicate, match the pattern
?:if match-if If the predicate, match the then pattern, otherwise the else pattern
?:letrec match-letrec Create named subpatterns that can be used later
?:ref match-ref Use a named subpattern
?:fresh match-fresh Match a new copy of the given name
?:all-fresh match-all-fresh Don't unify with captures outside the block
?:restartable match-restartable If the pattern doesn't match, raise a condition that can provide a value
?:re-matches match-regex Match with a regex, binding captures
?:re-seq match-regex Multiple matches with a regex, binding captures

Adding new matchers

The ?:chain rule can often be used to create a custom matcher. This is the implementation of the ?:map one:

(defn match-*map
  "Create a ?:*map matcher than can match a key/value pair multiple times."
  [[_ k v] comp-env]
  (compile-pattern*
    (quo `(?:chain
            (? _ ~(some-fn nil? map?))
            seq
            (| nil ((?:* [~k ~v])))))
    comp-env))

(register-matcher '?:*map #'match-*map {:aliases ['?:map*]})

Substitution

The core substitution function is sub. It is actually a macro that expands to be equivalent to just writing the substitution pattern as a literal value.

(let [x 2
      y '[a b c]]
  (sub (* (+ ??y) ?x)))
;; => (* (+ a b c) 2)

Macroexpanded, you can see that the substitution overhead is effectively zero:

(let [x 2
      y '[a b c]] 
  (list '* (list* '+ y) x))

You may notice that instead of passing in a map of values to sub, it effectively looks in the current evaluation context for replacement values. This tends to be very convenient. In practice, 99% of the time I use this method.

For that last 1%, though, a more traditional data-driven substitution method is required. For that, we can use substitute, which interprets and executes the pattern at runtime:

(substitute 
  '(* (+ ??y) ?x)
  {'x 2 'y '[a b c]})
;; => (* (+ a b c) 2)

Rules

Rules are pattern matchers tied to function bodies. The captured data from the matcher becomes the function arguments.

Most frequently the function body is modifying data and thu sub macro is the perfect tool, but any value can be returned by a matching rule.

(rule mult-id
  '(* 1 ?x)
  x)
  
(rule strength-reduce
  '(* 2 ?x)
  (sub (+ ?x ?x)))

Rules are implemented in terms of standard matchers, so any of the above matchers may be used.

A rule or rule combinator can be called like a function.

(def my-rule (rule '(* 2 ?x) (sub (+ ?x ?x))))

(my-rule '(* 2 5)) # => (+ 5 5)

Rule Combinators

Rule combinators allow multiple rules to be used together. Rule combinators can themselves be combined. A rule combinator is itself just a rule. Rules and rule combinators are interchangeable.

The following combinators are documented for now in the pattern.r3.combinators namespace:

Rule combinator Note
rule-list Search the rules from top to bottom. Stop after successfully matching.
in-order Search the rules from top to bottom. Continue with the result of each matching rule.
on-subexpressions Try the given rule on every element in the datastructure in post-walk order
iterated Keep applying the rule until a fixed point is reached
simplifier Like on-subexpressions which iterates at every level
directed Guide the traversal either using the pattern or from within the rule body

Compilation Dialects and Tools

...

Acknowledgements

Pattern is based upon the core ideas described in the excellent book Software Design for Flexibilty by Chris Hanson and Gerald Jay Sussman.

The compilation tools aspect is heavily inspired by the Nanopass Framework which introduces the idea of dialects and was the inspiration for some of the more powerful rule combinators that this library introduces.

License

Copyright © 2022 Darrick Wiebe

Distributed under the Eclipse Public License version 1.0.