Pattern lets you transform data structures in amazing ways.
The focus is on simplicity and expressivity. It works on immutable persistent data, greatly simplifying development and debugging of transformations.
pangloss/pattern {:git/url "https://github.com/pangloss/pattern"
:sha "<use the current commit hash>"}
See my Nov 2022 talk for London Clojurians
Here are a few examples of how it's been used already:
- Create simple and clear macros
- Create an infix math macro in about 30 LOC
- Define the simplification rules of a computer algebra system (see a very similar engine in use in SICMUtils)
- Create a Python to Clojure source-to-source converter in under 400 LOC
- Compile Scheme to X86 assembly in about 1500 LOC
The primary tools in Pattern are the rule and rule combinator. To understand how to use those, it's important to first introduce matcher patterns and substitution.
Pattern is an collection of tools for pattern matching and substitution. Those tools can be extended and combined in flexible ways.
The match pattern looks similar to the data structure it's matching on. That enables a very intuitive understanding of how the pattern will apply to the data being matched.
A transformation is a match pattern with a substitution pattern. Combined, that is called a rule.
The substitution pattern uses exactly the same syntax as the match pattern. Substitution supports all of core functionality of the matcher. That makes it much simpler to understand the behavior of a rule.
In the spirit of Clojure, pattern maintains a high degree of simplicity in the interface.
(require '[pattern :refer [compile-pattern]])
The core of Pattern is the pattern matcher, which can be used to match any kind of Clojure data structure.
To use the pattern matcher, call (compile-pattern my-pattern)
.
That returns a function that you can call with the data you want to extract matches from.
The return value is a map from match names to matching data.
Here is a simple example that matches the first and last elements of a vector if the central elements are 1 2 3
.
(def m (compile-pattern '[?before 1 2 3 ?after])
(m [0 1 2 3 4]) ; => {:before 0 :after 4}
Even a simple pattern encodes sophisticated matching logic. For instance you could replicate the above matcher with this code. As you can see, it's more complex, less clear in intention, and easier to make a mistake:
(def m #(when (vector? %)
(if (and (= 5 (count %))
(= [1 2 3] (subvec % 1 4)))
{:before (first %)
:after (last %)})))
(m [0 1 2 3 4]) ; => {:before 0 :after 4}
The advantages of these pattern matchers become even more apparent as the complexity of pattern increases.
If multiple matchers in a pattern have the same name, the values they match must unify.
Unification increases the sophistication of patterns that can be defined.
(def m (compile-pattern '[?fn-name (fn ?fn-name [??args] ??body)]))
(m ['abc '(fn abc [x] x)]) ; => {:fn-name 'abc ...}
(m ['xyz '(fn abc [x] x)]) ; => nil
Unification works across different matcher types.
The pattern [?list [??list 3]]
, could match [[1 2] [1 2 3]]
or [[:x] [:x 3]]
, etc.
Above I showed 2 matchers: match-list
and match-element
matching against a fixed-length vector.
Much more flexibility is available.
The pattern could be changed to [??before 1 2 3 ??after]
, for instance.
Now it would match a vector of any length as long as it contains the sequence 1 2 3
.
(def m* (compile-pattern '[??before 1 2 3 ??after]))
(m* [1 2 3]) ; => {:before [] :after []}
(m* [0 1 2 3 4 5]) ; => {:before [0] :after [4 5]}
Patterns also do not need to be flat. For instance, I can match against Clojure S-Expressions. Here I'll introduce a three more matcher types:
(def let-bindings (compile-pattern '(let [(?:* (? binding* symbol?) ?_)] ??_)))
(let-bindings '(let [datum (first data)
c (count datum)]
...))
;; => {:binding* [datum c]}
?_
and ??_
are special cases of unnamed matchers.
These matchers do not unify with each other, and the value matched by them is not returned in the match results.
They are useful for ignoring parts of the input data that are not relevant to what you are trying to do.
Anywhere you can provide a matcher name, using _
as the name has the same behavior.
The ?:*
matcher is match-many
.
It matches a repeated subsequence within the sequence it is placed in.
In this case, that allows it to handle the characteristic flattened Clojure binding pattern.
The (? ...)
matcher is the full form of match-element
that we saw earlier.
In this case, it is equivalent to ?binding*
, except that here I gave it a predicate function.
The matcher will only succeed if the predicate returns a truthy value when called with the matched value.
That means the following does not match:
(let-bindings '(let [42 (first data)
c (count datum)]
...))
;; => nil
There are a lot more matchers and the list is gradually expanding with ever more weird and wonderful behaviors.
Each matcher in the list has a matcher implementation function with detailed documentation in the pattern.matchers namespace.
(better docs coming eventually!)
Matcher | Implementation | Notes |
---|---|---|
? |
match-element | Match a single element |
?? |
match-segment | Match 0 or more elements |
??! |
Same as ?? , but greedy |
|
?:map |
match-map | Match a map with specific keys |
?:*map |
match-*map | Match each key-value pair in a map |
?:+map |
match-+map | Like ?:*map , but require at least one match |
?:as |
match-as | Capture an entire sub pattern if the contained pattern matches |
?:? |
match-optional | Match 0 or 1 instance |
?:1 |
match-one | Match exactly 1 instance |
?:* |
match-many | Match any number of instances |
?:+ |
match-at-least-one | Match at least one instance |
?:chain |
match-chain | Alternate between patterns and functions on the matched data |
??:chain |
Same as ?:chain but capture sequence data |
|
| |
match-or | Match alternative patterns on the same data |
& |
match-and | Match all patterns on the same data |
?:= |
match-literal | Match a literal value (usually not needed) |
?:not |
match-not | Match only if the contained pattern does not |
?:when |
match-when | If the predicate, match the pattern |
?:if |
match-if | If the predicate, match the then pattern, otherwise the else pattern |
?:letrec |
match-letrec | Create named subpatterns that can be used later |
?:ref |
match-ref | Use a named subpattern |
?:fresh |
match-fresh | Match a new copy of the given name |
?:all-fresh |
match-all-fresh | Don't unify with captures outside the block |
?:restartable |
match-restartable | If the pattern doesn't match, raise a condition that can provide a value |
?:re-matches |
match-regex | Match with a regex, binding captures |
?:re-seq |
match-regex | Multiple matches with a regex, binding captures |
The ?:chain rule can often be used to create a custom matcher. This is the implementation of the ?:map one:
(defn match-*map
"Create a ?:*map matcher than can match a key/value pair multiple times."
[[_ k v] comp-env]
(compile-pattern*
(quo `(?:chain
(? _ ~(some-fn nil? map?))
seq
(| nil ((?:* [~k ~v])))))
comp-env))
(register-matcher '?:*map #'match-*map {:aliases ['?:map*]})
The core substitution function is sub
.
It is actually a macro that expands to be equivalent to just writing the substitution pattern as a literal value.
(let [x 2
y '[a b c]]
(sub (* (+ ??y) ?x)))
;; => (* (+ a b c) 2)
Macroexpanded, you can see that the substitution overhead is effectively zero:
(let [x 2
y '[a b c]]
(list '* (list* '+ y) x))
You may notice that instead of passing in a map of values to sub
, it effectively looks in the current evaluation context for replacement values.
This tends to be very convenient.
In practice, 99% of the time I use this method.
For that last 1%, though, a more traditional data-driven substitution method is required.
For that, we can use substitute
, which interprets and executes the pattern at runtime:
(substitute
'(* (+ ??y) ?x)
{'x 2 'y '[a b c]})
;; => (* (+ a b c) 2)
Rules are pattern matchers tied to function bodies. The captured data from the matcher becomes the function arguments.
Most frequently the function body is modifying data and thu sub
macro is the perfect tool, but any value can be returned by a matching rule.
(rule mult-id
'(* 1 ?x)
x)
(rule strength-reduce
'(* 2 ?x)
(sub (+ ?x ?x)))
Rules are implemented in terms of standard matchers, so any of the above matchers may be used.
A rule or rule combinator can be called like a function.
(def my-rule (rule '(* 2 ?x) (sub (+ ?x ?x))))
(my-rule '(* 2 5)) # => (+ 5 5)
Rule combinators allow multiple rules to be used together. Rule combinators can themselves be combined. A rule combinator is itself just a rule. Rules and rule combinators are interchangeable.
The following combinators are documented for now in the pattern.r3.combinators namespace:
Rule combinator | Note |
---|---|
rule-list | Search the rules from top to bottom. Stop after successfully matching. |
in-order | Search the rules from top to bottom. Continue with the result of each matching rule. |
on-subexpressions | Try the given rule on every element in the datastructure in post-walk order |
iterated | Keep applying the rule until a fixed point is reached |
simplifier | Like on-subexpressions which iterates at every level |
directed | Guide the traversal either using the pattern or from within the rule body |
...
Pattern is based upon the core ideas described in the excellent book Software Design for Flexibilty by Chris Hanson and Gerald Jay Sussman.
The compilation tools aspect is heavily inspired by the Nanopass Framework which introduces the idea of dialects and was the inspiration for some of the more powerful rule combinators that this library introduces.
Copyright © 2022 Darrick Wiebe
Distributed under the Eclipse Public License version 1.0.