A grep-like command line utility to search for s-expressions matching some given patterns. Patterns are wildcards that match any s-expression, atom wildcards that match any atom, literals that match literal atoms, regular expressions that match literals matching the regular expression, conjunctions of patterns, disjunctions of patterns, and flat and depth repetitions allowing to pattern match at particular depth levels in an s-expression and at particular flat positions.
The flavor of s-expressions that Smatch parses is the SMTLIB v2.6 flavor which is described in the manual (PDF).
Smatch is pronounced like smash.
Problem: Find the s-expressions that contain calls to the eq?
function whose arguments contains
true
or false
.
The pattern to match these expressions is (@depth (@* (eq? (@* @_) (@or true false) (@* @_))))
.
The syntax of the patterns is described later.
If the files to search in are a.lisp
and b.lisp
the following command lists the top-level
s-expressions that match:
smatch '(@depth (@* (eq? (@* @_) (@or true false) (@* @_))))' a.lisp b.lisp
If the files live in some folder then the recursive flag -r
can be used
smatch '(@depth (@* (eq? (@* @_) (@or true false) (@* @_))))' -r files/
If the pattern is too long and exists in a file (see smatch-syntax.lisp
) then it can be passed
with the -f
flag
smatch -f pattern.lisp -r files/
The -c
flag counts the number of matches in each matched file, and -n
shows the position of
match in each file:
> smatch '(@depth (@* hello))' -r files
files/test.lisp:4:(define-fun hello ; this is the hello function
((a bool) (b string))
(todo))
files/test.lisp:14:(print (hello #t "[debug]"))
If you're matching against a file which smatch
cannot parse then the -i
flag will report all
matches until the error and continue to the next file without panicing.
A statically linked binary for x86_64 is always provided in the releases.
If you prefer to build and install it on your own you can run:
cargo install --git https://github.com/geezee/smatch
To build your own copy of smatch run the following command.
cargo build --release
It will first install and build smatch's two dependencies: regex and clap. This is a one-time operation unless you have chosen to update the locked version of either.
The resulting binary is in target/release/smatch
.
For more examples look in test.sh
.
The smatch pattern syntax can be described in itself!
Have a look at smatch-syntax.lisp
to see how smatch patterns look within themselves.
This file is useful in tests.
The last two tests that ./test.sh
runs are:
- It checks that
smatch-syntax.lisp
pattern matches withsmatch-syntax.lisp
- It gathers all the patterns in
./test.sh
and checks thatsmatch-syntax.lisp
matches all of them
The wildcard @_
matches against any S-expr
printf "hello \n world" | smatch @_ # 2 matches
printf "(hello world)" | smatch @_ # 1 match
printf "()" | smatch @_ # 1 match
The atom wildcard @atom
matches only against atoms
printf "hello \n world \n" | smatch @atom # 2 matches
printf "()\n" | smatch @atom # 0 matches
printf "(hello world)" | smatch @atom # 0 matches
A literal atom matches only against the same literal
printf "hello \n world \n" | smatch hello # 1 match
printf "world\n" | smatch hello # 0 match
printf "(hello world)" | smatch hello # 0 match
An identifier can be matched against with a regular expression but making that regular expression a
string and surrounding it with @re
printf "declare-fun \n set-info \n declare-sort" | smatch '(@re "^declare-.*")' # 2 matchs
A list matches a list, if no repeats are used then the list pattern matches a list term if and only if their length match and the members match pairwise.
printf "(declare-fun hello (U U) U) \
(set-info :status sat) \
(declare-sort A 0)" \
| smatch '((@re "^declare-.*") @atom @_ @atom)' # 1 match
The @and
pattern builder takes at least one pattern and matches a term if and only if the term
matches all the provided patterns
printf "hello \n" | smatch "(@and @atom hello)" # 1 match
printf "world \n" | smatch "(@and @atom hello)" # 0 match
The @or
pattern builder takes at least one pattern and matches a term if and only if the term
matches at least one of the provided patterns
printf "hello \n" | smatch "(@or @atom hello)" # 1 match
printf "world \n" | smatch "(@or @atom hello)" # 1 match
printf "() \n" | smatch "(@or @atom hello)" # 0 match
Repeats look like this:
(@* <pattern>) # 0 or more matches of pattern
(@? <pattern>) # 0 or one matches of pattern
(@+ <pattern>) # one or more matches of pattern
(@less <n> <pattern>) # less than n (inclusive) matches of pattern
(@more <n> <pattern>) # more than n (inclusive) matches of pattern
(@between <n> <m> <pattern>) # between n (inclusive) and m (inclusive) matches of pattern
Here's some examples:
# all these match
printf "()" | smatch '(@* @atom)'
printf "hello" | smatch '(@* @atom)'
printf "(hello)" | smatch '(@* @atom)'
printf "(hello world)" | smatch '(@* @atom)'
printf "(hello world !)" | smatch '(@* @atom)'
# this one matches too
printf "(declare-fun foo ((a nat) (b nat)) bool)" \
| smatch '(declare-fun @atom ( (@* (@atom @atom)) ) @atom)'
All patterns start matching at the first level of a given term.
It's possible to search at any depth using the repeats notation by surrounding it with a @depth
.
Here's examples that match:
printf "hello" | smatch '(@depth (@* hello))'
printf "(assert (= (f hello) world))" | smatch '(@depth (@* hello))'
printf "(assert (= (f hello) world))" | smatch '(@depth (@between 2 4 hello))'
Patterns can be given names and reused later.
Pattern names are of the shape @<name>
with the obvious exception that atom
and _
being illegal names.
A simple example is the following pattern:
(@let (@t true) (@f false)
(@or @t @f))
which is equivalent to (@or true false)
.
Patterns can be nested and the usual lexical scoping rules apply, so:
(@let (@t true)
(@or @t (@let (@t false)
@t)))
is still equivalent to (@or true false)
Patterns can also be recursive which allows for cool patterns to be defined. Just beware infinite recursion and the order of your variables! Patterns are evaluated left to right and are greedy (they try to match as much as possible first before backtracking).
For example, the following pattern:
(@let (@bexpr (@or @atom (not @bexpr) (or (@+ @bexpr)) (and (@+ @bexpr))))
@bexpr)
matches the following
(or x (and (not y) z (or x z)) t (not r))
but does not match the following expressions:
(or x (and (not y) z (or x z)) t (not r) (=> x y))
(or x (and (not y) z (or x z)) t (not r) (and))
(assert (or x (and (not y) z (or x z)) t (not r)))
The tool is meant to do one thing and one thing alone:
fundamentally there's a pattern matching algorithm and everything around it is boilerplate which
should be kept at a minimum.
Thus smatch is offered in a single file smatch.rs
.
The tests are also provided in a single bash script that also serves as a list of examples to better understand what patterns are designed to match.
Ideally the tests would be run with cargo test
.
Sadly this requires giving names to the 120+ tests and examples in test.sh
which is a silly thing
to spend time on.