scran is a parser combinator library heavily influenced by nom.
Unlike nom, scran isn't primarily intended for binary parsing, because Erlang already has bit syntax expressions. However, it does include a small number of combinators for length-encoded or null-terminated byte sequences (typically "strings") and for big- and little-endian integers with common bit sizes.
Internally, scran uses the maybe keyword, which is supported by OTP 25+.
Each parser returns a function that takes Unicode input and returns either a tuple of the unmatched and matched input, in that order, or nomatch when the input does not match.
1> (scran_character_complete:tag("Hello"))("Hello, World!").
{", World!","Hello"}
Where "Hello" has been matched by tag("Hello") and ", World!" is the remaining input.
2> (scran_character_complete:tag("Hello"))("hello, world!").
nomatch
The tag("Hello") parser is case sensitive, so the result is nomatch for "hello, world!".
3> (scran_character_complete:tag_no_case("Hello"))("hello, world!").
{", world!","hello"}
4> (scran_character_complete:tag_no_case("Hello"))("hellO, wOrlD!").
{", wOrlD!","hellO"}
tag_no_case can be used for case insensitive matching.
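The return contract in the examples above (a {Remaining, Matched} tuple on success, nomatch otherwise) can be illustrated without scran at all. The following is a minimal sketch in plain Erlang that can be pasted into the shell. Tag is a hypothetical stand-in written for this illustration and is not scran's implementation.

```erlang
%% Hypothetical stand-in for a tag-style parser (not scran's own code).
%% Tag(Expected) returns a fun that yields {Remaining, Matched} or nomatch.
Tag = fun(Expected) ->
          fun(Input) ->
              case lists:prefix(Expected, Input) of
                  true ->
                      %% Drop the matched prefix to leave the remaining input.
                      {lists:nthtail(length(Expected), Input), Expected};
                  false ->
                      nomatch
              end
          end
      end.

%% Pattern matches double as assertions in the shell.
{", World!", "Hello"} = (Tag("Hello"))("Hello, World!").
nomatch = (Tag("Hello"))("hello, world!").
```

Returning a fun from Tag keeps construction (choosing what to match) separate from application (supplying the input), which is what allows parsers to be passed around and combined.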
The scran_character_complete module has various parsers used to match different character classes or combinations.
The alpha0 parser will match zero or more alphabetic characters in the range of [a-zA-Z].
12> (scran_character_complete:alpha0())("abc").
{[], "abc"}
13> (scran_character_complete:alpha0())("").
{[], []}
14> (scran_character_complete:alpha0())("abc123").
{"123", "abc"}
15> (scran_character_complete:alpha0())("123abc").
{"123abc", []}
The alpha1 parser will match one or more alphabetic characters in the range of [a-zA-Z].
8> (scran_character_complete:alpha1())("abc").
{[], "abc"}
9> (scran_character_complete:alpha1())("").
nomatch
10> (scran_character_complete:alpha1())("abc123").
{"123","abc"}
11> (scran_character_complete:alpha1())("123abc").
nomatch
The alphanumeric0 parser will match zero or more alphanumeric characters in the range of [a-zA-Z0-9].
16> (scran_character_complete:alphanumeric0())("abc").
{[], "abc"}
17> (scran_character_complete:alphanumeric0())("").
{[], []}
18> (scran_character_complete:alphanumeric0())("abc123").
{[], "abc123"}
19> (scran_character_complete:alphanumeric0())("123abc").
{[], "123abc"}
20> (scran_character_complete:alphanumeric0())("123abc!%^").
{"!%^","123abc"}
21> (scran_character_complete:alphanumeric0())("!%^abc123").
{"!%^abc123", []}
The alphanumeric1 parser will match one or more alphanumeric characters in the range of [a-zA-Z0-9].
24> (scran_character_complete:alphanumeric1())("abc").
{[], "abc"}
25> (scran_character_complete:alphanumeric1())("").
nomatch
26> (scran_character_complete:alphanumeric1())("abc123").
{[], "abc123"}
27> (scran_character_complete:alphanumeric1())("123abc").
{[], "123abc"}
28> (scran_character_complete:alphanumeric1())("123abc!%^").
{"!%^", "123abc"}
29> (scran_character_complete:alphanumeric1())("!%^").
nomatch
30> (scran_character_complete:alphanumeric1())("!%^abc123").
nomatch
The digit0 parser will match zero or more numeric characters in the range of [0-9].
The digit1 parser will match one or more numeric characters in the range of [0-9].
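By analogy with the alpha1 examples, a short shell sketch for digit1 might look as follows. It assumes scran is compiled and on the code path. The first result mirrors what digit1 yields inside the alt examples in this README; the second is an assumption based on the "one or more" wording rather than a recorded session.

```erlang
%% Assumes scran is on the code path; pattern matches double as assertions.
{"abc", "123456"} = (scran_character_complete:digit1())("123456abc").

%% Assumption: like alpha1, digit1 needs at least one leading match.
nomatch = (scran_character_complete:digit1())("abc123").
```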
The multispace0 parser will match zero or more whitespace characters in the range of [\s\t\n\r].
The multispace1 parser will match one or more whitespace characters in the range of [\s\t\n\r].
The none_of parser will match if the next character of the input is none of the supplied characters.
The one_of parser will match if the next character of the input is one of the supplied characters.
The re parser will match if the input satisfies the supplied case sensitive regular expression.
The re_no_case parser will match if the input satisfies the supplied case insensitive regular expression.
The tag parser will match if the input matches the supplied case sensitive string.
The tag_no_case parser will match if the input matches the supplied case insensitive string.
The take parser will match if it can take the specified number of characters from the input.
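To make the one_of and take descriptions concrete, here is a minimal sketch in plain Erlang. OneOf and Take are hypothetical stand-ins written for this illustration, not scran's code. In particular, returning the matched character as a one-element string is an assumption chosen to keep the {Remaining, Matched} shape; scran's actual return value may differ.

```erlang
%% Hypothetical stand-ins, not scran's own code.

%% OneOf(Chars): match when the next character is a member of Chars.
OneOf = fun(Chars) ->
            fun([C | Rest]) ->
                    case lists:member(C, Chars) of
                        true -> {Rest, [C]};
                        false -> nomatch
                    end;
               ([]) ->
                    nomatch
            end
        end.

%% Take(N): match when at least N characters are available.
Take = fun(N) ->
           fun(Input) when length(Input) >= N ->
                   {Taken, Rest} = lists:split(N, Input),
                   {Rest, Taken};
              (_) ->
                   nomatch
           end
       end.

%% Pattern matches double as assertions in the shell.
{"bc", "a"} = (OneOf("abc"))("abc").
nomatch = (OneOf("abc"))("xyz").
{"lo", "hel"} = (Take(3))("hello").
nomatch = (Take(3))("hi").
```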
The scran_branch module is used to specify different branches of parsing behaviour.
With scran_branch:alt/1 you can specify alternative parsers to use. Each parser is tried in turn until one of them matches; nomatch is returned only if none of them match.
1> (scran_branch:alt([scran_character_complete:alpha1(),
scran_character_complete:digit1()]))("abc123").
{"123","abc"}
2> (scran_branch:alt([scran_character_complete:alpha1(),
scran_character_complete:digit1()]))("123456").
{[],"123456"}
3> (scran_branch:alt([scran_character_complete:alpha1(),
scran_character_complete:digit1()]))("123456abc").
{"abc","123456"}
4> (scran_branch:alt([scran_character_complete:alpha1(),
scran_character_complete:digit1()]))("!@£$").
nomatch
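The ordered-choice behaviour shown above can be sketched in plain Erlang. Alt, Letters and Digits below are hypothetical stand-ins written for this illustration and are not scran's implementation; they only assume the {Remaining, Matched} or nomatch contract used throughout this README.

```erlang
%% Hypothetical stand-in for an alt-style combinator (not scran's own code).
%% Applies each parser in order and keeps the first result that is not
%% nomatch; later parsers are not applied once a result has been found.
Alt = fun(Parsers) ->
          fun(Input) ->
              lists:foldl(
                fun(Parser, nomatch) -> Parser(Input);
                   (_Parser, Result) -> Result
                end,
                nomatch,
                Parsers)
          end
      end.

%% Two toy parsers used only for this illustration.
Letters = fun(Input) ->
              IsAlpha = fun(C) ->
                            (C >= $a andalso C =< $z) orelse
                            (C >= $A andalso C =< $Z)
                        end,
              case lists:splitwith(IsAlpha, Input) of
                  {[], _} -> nomatch;
                  {Matched, Rest} -> {Rest, Matched}
              end
          end.
Digits = fun(Input) ->
             IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
             case lists:splitwith(IsDigit, Input) of
                 {[], _} -> nomatch;
                 {Matched, Rest} -> {Rest, Matched}
             end
         end.

%% Pattern matches double as assertions in the shell.
{"123", "abc"} = (Alt([Letters, Digits]))("abc123").
{"abc", "123456"} = (Alt([Letters, Digits]))("123456abc").
nomatch = (Alt([Letters, Digits]))("!@$").
```

Because the first matching parser wins, the order of the list matters whenever more than one alternative could match the same input.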
The scran_combinator module is used to specify parsers that are combined into ones that can exhibit complex behaviours.
This parser succeeds if all the input has been consumed by its child parser.
This parser maps a function on the result of a parser.
This parser ignores the result of a parser.
This parser applies a parser over the result of another one.
An optional parser that returns none if the option is not taken.
This parser returns the provided value if the child parser succeeds.
This parser calls the supplied parser if the condition is met.
This parser tries to apply its parser without consuming the input.
This parser returns its input if it is at the end of input data.
This parser succeeds if the child parser returns an error.
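Three of the behaviours described above (the optional parser, the non-consuming parser, and the parser that requires all input to be consumed) can be sketched in plain Erlang. Opt, Peek, AllConsuming and Ab are hypothetical names written for this illustration; the names and exact return shapes are assumptions and should not be read as scran's API.

```erlang
%% Hypothetical stand-ins; names and return shapes are assumptions,
%% not scran's API.

%% Optional: never fails; yields none as the matched value when the
%% child parser does not match.
Opt = fun(Parser) ->
          fun(Input) ->
              case Parser(Input) of
                  nomatch -> {Input, none};
                  Result -> Result
              end
          end
      end.

%% Peek: reports the child's match but leaves the input unconsumed.
Peek = fun(Parser) ->
           fun(Input) ->
               case Parser(Input) of
                   {_Rest, Matched} -> {Input, Matched};
                   nomatch -> nomatch
               end
           end
       end.

%% All-consuming: succeeds only if the child leaves no remaining input.
AllConsuming = fun(Parser) ->
                   fun(Input) ->
                       case Parser(Input) of
                           {[], _} = Result -> Result;
                           _ -> nomatch
                       end
                   end
               end.

%% Toy child parser: matches the literal prefix "ab".
Ab = fun("ab" ++ Rest) -> {Rest, "ab"};
        (_) -> nomatch
     end.

%% Pattern matches double as assertions in the shell.
{"c", "ab"} = (Opt(Ab))("abc").
{"xyz", none} = (Opt(Ab))("xyz").
{"abc", "ab"} = (Peek(Ab))("abc").
{[], "ab"} = (AllConsuming(Ab))("ab").
nomatch = (AllConsuming(Ab))("abc").
```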
The test cases are currently the best place to look at simple examples of the combinators. There is also a more complex example that is used to parse part of the PostgreSQL grammar.
Coverage report is available here.
edoc is available here.