Creating an elm/parser as an answer to a Slack question.
On Wednesday, 26 December 2018, ajgreenb asked the following question in the #general channel of the Elm Slack.
hello! i have a `List String` in which all entries look like one of `2293487`, `10.128.16.255`, `192.168.1.2/32`. that is, it's either a string of just digits; four sets of 1-3 digits each separated by a `.`; or the same followed by a `/` and 1-2 digits. i want to map each list item according to whether it is an id or an ip address (the second two formats would ideally be treated identically.)

initially i thought to use the `Regex` package, but the `Regex` package recommended looking at `elm/parser`. i can't seem to make that do what i want, though. i'm trying to be able to do something like

```
toA : String -> A
toA s =
    case <something> s of
        <matchID> id -> ID id
        <matchIPAddress> ipAddr -> IPAddress ipAddr
```

and then i could `List.map toA [ "2293487", "10.128.16.255", "192.168.1.2/32" ]`. does what i'm trying to do make sense? and does anyone have a suggestion for a good way to do that?
This repository contains Elm code that shows how `elm/parser` can be used to solve ajgreenb's problem. Furthermore, this README describes the rationale behind some of the decisions.
The starting point is a skeletal Elm project created by running `elm init` and `elm-test init`.
This walkthrough is meant to provide an example of `elm/parser`, but it is expected that you are at least familiar with its documentation.
Adding the `elm/parser` dependency is a good starting point.
elm install elm/parser
The next part is to model the data we want to end up with. The question provides a suggestion, but it does not expose the internal structure of the data. In order to provide some insight into how to parse rich data structures, we model our data as follows.
module Data exposing (Data(..))


type Data
    = Identifier String
    | IpAddress IpAddressData


type alias IpAddressData =
    { networkID1 : Int
    , networkID2 : Int
    , hostID1 : Int
    , hostID2 : Int
    , subnetMask : Maybe Int
    }
Next we create a `parse` function and set up a test to check our intended API. Because parsing can fail, we need to return a `Result`.
parse : String -> Result String Data
parse input =
    Err "not yet implemented"
From the examples provided in the question we can extract the following tests. Note that the tests will fail at the moment.
module ParserTest exposing (suite)

import Data exposing (Data(..))
import Expect exposing (Expectation)
import Test exposing (..)


suite : Test
suite =
    describe "Data"
        [ describe "parse"
            [ test "parse identifier" <|
                \_ ->
                    let
                        input =
                            "2293487"

                        actual =
                            Data.parse input

                        expected =
                            Ok <| Identifier input
                    in
                    actual
                        |> Expect.equal expected
            , test "parse IP address" <|
                \_ ->
                    let
                        input =
                            "10.128.16.255"

                        actual =
                            Data.parse input

                        expected =
                            Ok <|
                                IpAddress
                                    { networkID1 = 10
                                    , networkID2 = 128
                                    , hostID1 = 16
                                    , hostID2 = 255
                                    , subnetMask = Nothing
                                    }
                    in
                    actual
                        |> Expect.equal expected
            , test "parse IP address with a subnet mask" <|
                \_ ->
                    let
                        input =
                            "10.128.16.255/32"

                        actual =
                            Data.parse input

                        expected =
                            Ok <|
                                IpAddress
                                    { networkID1 = 10
                                    , networkID2 = 128
                                    , hostID1 = 16
                                    , hostID2 = 255
                                    , subnetMask = Just 32
                                    }
                    in
                    actual
                        |> Expect.equal expected
            ]
        ]
When working with `elm/parser` it is convenient to use all of its functionality freely. Therefore we expose all the bindings in the `Parser` namespace.
import Parser exposing (..)
The beauty of `elm/parser` is that it allows you to focus on a single part at a time and combine the parts later on. For now we will focus on a parser for the `Identifier`.
We start with the signature of the `identifier` function. It is a `Parser Data` that will parse an `Identifier`.
identifier : Parser Data
To implement the parser we take a look at the `getChompedString` function. It takes a parser and returns the `String` that this parser consumed. It works with the family of `chomp...` functions such as `chompIf`, `chompWhile`, et cetera.
We are going to use the `chompWhile : (Char -> Bool) -> Parser ()` function. From the documentation:
Chomp zero or more characters if they pass the test. This is commonly useful for chomping whitespace or variable names.
We want to chomp digits, for which we can use the `Char.isDigit` function.
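To make the "zero or more" part of that description concrete, here is a minimal sketch of `chompWhile` combined with `getChompedString`. The module name `ChompDigits` and the names `digits` and `examples` are made up for this illustration and are not part of the repository.

```elm
module ChompDigits exposing (digits, examples)

import Parser exposing (Parser, chompWhile, getChompedString, run)


-- Chomp zero or more digits and return exactly the text that was chomped.
digits : Parser String
digits =
    chompWhile Char.isDigit
        |> getChompedString


-- Each pair holds an input and the result of running `digits` on it.
examples : List ( String, Result (List Parser.DeadEnd) String )
examples =
    List.map (\input -> ( input, run digits input ))
        [ "2293487" -- Ok "2293487"
        , "123abc" -- Ok "123": chomping stops at the first non-digit
        , "abc" -- Ok "": zero digits is still a success
        ]
```

Note the last case: because zero characters is acceptable to `chompWhile`, a parser built this way also succeeds on input without any digits.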
`getChompedString` returns a `String` and we want `Data`, so we use the `Parser.map` function to transform the parsed `String` into an `Identifier`.
identifier : Parser Data
identifier =
    chompWhile Char.isDigit
        |> getChompedString
        |> map Identifier
We can now use the `identifier` parser to make some of our tests pass. `elm/parser` provides a `run` function that accepts a `Parser a` and some input to parse, and returns a `Result (List DeadEnd) a`.
A `DeadEnd` is a description of why a parser got stuck. There is a `deadEndsToString` function, but unfortunately it is not implemented in a useful way. We are going to use it anyway, and come back to it later.
parse : String -> Result String Data
parse input =
    let
        parser =
            identifier
    in
    input
        |> run parser
        |> Result.mapError deadEndsToString
We bound the `identifier` parser to `parser` in a `let` block so that we can easily change the parser later on.
This code now passes one of our tests.
With the `identifier` under our belt, we continue with the `IpAddress`. The signature is similar.
ipAddress : Parser Data
Now we will work in a top-down fashion. We will liberally dream up functions and define them later, until we find primitives that fit the bill.
So below we define the `ipAddress` parser in terms of parsers we wish we had.
ipAddress : Parser Data
ipAddress =
    succeed IpAddressData
        |= network
        |. dot
        |= network
        |. dot
        |= host
        |. dot
        |= host
        |= optionalSubnetMask
        |> map IpAddress
The newly introduced parsers are `network`, `dot`, `host`, and `optionalSubnetMask`. Here are their signatures.
network : Parser Int
dot : Parser ()
host : Parser Int
optionalSubnetMask : Parser (Maybe Int)
Let's focus on `network` for the moment. The network part is basically a sequence of one, two or three digits. We already know how to parse a sequence of digits. Once we have parsed the digits, we succeed or fail depending on how many digits we parsed.
network : Parser Int
network =
    let
        toInt input =
            input
                |> String.toInt
                |> Maybe.withDefault -1
    in
    chompWhile Char.isDigit
        |> getChompedString
        |> andThen (lengthWithIn 1 3)
        |> map toInt
Here you see the use of `andThen`. Its signature `(a -> Parser b) -> Parser a -> Parser b` allows you to choose the next parser depending on the result of another parser. We use it to succeed or fail depending on the length of the parsed digits.
lengthWithIn : Int -> Int -> String -> Parser String
lengthWithIn minimum maximum input =
    let
        n =
            String.length input
    in
    if minimum <= n && n <= maximum then
        succeed input

    else
        problem <|
            "expecting input to be between "
                ++ String.fromInt minimum
                ++ " and "
                ++ String.fromInt maximum
Here `succeed` returns the input and `problem` signals the failure.
`host` is exactly like `network`. One could create a single function and alias `network` and `host` to it.
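The aliasing idea can be sketched as follows. The names `Octet`, `octet` and `toOctet` are hypothetical and chosen for this illustration. The sketch also deviates from the code above in one respect: it reports a `problem` instead of falling back to `-1` when the conversion to `Int` fails.

```elm
module Octet exposing (host, network, octet)

import Parser exposing (Parser, andThen, chompWhile, getChompedString, problem, succeed)


-- Hypothetical shared parser: one to three digits, converted to an Int.
octet : Parser Int
octet =
    chompWhile Char.isDigit
        |> getChompedString
        |> andThen toOctet


-- Succeed with the number when the length fits, fail otherwise.
toOctet : String -> Parser Int
toOctet digits =
    let
        n =
            String.length digits
    in
    if 1 <= n && n <= 3 then
        case String.toInt digits of
            Just value ->
                succeed value

            Nothing ->
                problem "expecting digits"

    else
        problem "expecting one to three digits"


-- Both names are plain aliases for the shared parser.
network : Parser Int
network =
    octet


host : Parser Int
host =
    octet
```

With this arrangement a later change, such as a range check, only needs to be made in one place.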
The `dot` parser parses `'.'`. The `elm/parser` package exposes `symbol` for this situation.
dot : Parser ()
dot =
    symbol "."
Arguably `optionalSubnetMask` is the most interesting. We first assume the subnet mask is always present and create a `subnetMask` parser for that case.
subnetMask : Parser Int
subnetMask =
    let
        toInt input =
            input
                |> String.dropLeft 1
                |> String.toInt
                |> Maybe.withDefault -1
    in
    (succeed ()
        |. chompIf (\c -> c == '/')
        |. chompWhile Char.isDigit
    )
        |> getChompedString
        |> map toInt
Nothing new is presented here. We first chomp a character if it is `'/'` and then chomp a sequence of digits. Because we also chomped the `'/'`, we need to drop that character before we convert the string to an integer.
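One consequence of `Maybe.withDefault -1` is that a lone `/` without digits produces `-1` rather than a failure, because `String.toInt ""` is `Nothing`. If that is not desired, a stricter variant could look like the sketch below. The names `StrictSubnetMask`, `strictSubnetMask` and `digitsToInt` are assumptions made for this example and do not appear in the repository.

```elm
module StrictSubnetMask exposing (strictSubnetMask)

import Parser exposing ((|.), (|=), Parser, andThen, chompWhile, getChompedString, problem, succeed, symbol)


-- Hypothetical stricter variant: a '/' must be followed by at least one digit.
strictSubnetMask : Parser Int
strictSubnetMask =
    succeed identity
        |. symbol "/"
        |= (chompWhile Char.isDigit
                |> getChompedString
                |> andThen digitsToInt
           )


-- Turn the chomped digits into an Int, or fail when there were none.
digitsToInt : String -> Parser Int
digitsToInt digits =
    case String.toInt digits of
        Just n ->
            succeed n

        Nothing ->
            problem "expecting at least one digit after '/'"
```

Since the `/` is parsed separately with `symbol`, only the digits end up in the chomped string and no `String.dropLeft` is needed.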
Now we are going to write a higher-order function that accepts a `Parser a` and returns a `Parser (Maybe a)`. This allows us to make any parser optional.
optionally : Parser a -> Parser (Maybe a)
optionally parser =
    oneOf
        [ parser |> map Just
        , succeed Nothing
        ]
`oneOf` keeps trying parsers until one of them starts chomping characters.
So we take our `parser` and create a new parser that wraps the result of `parser` in a `Just`. If that fails without chomping any characters, we accept `Nothing`.
`optionalSubnetMask` can now be implemented.
optionalSubnetMask : Parser (Maybe Int)
optionalSubnetMask =
    optionally subnetMask
`oneOf` can now also be used to implement our `parse` function. Instead of `parser = identifier` in the `let` block, we should use
        parser =
            oneOf
                [ backtrackable ipAddress
                , identifier
                ]
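The most notable new concept is `backtrackable`. It is needed because `oneOf` commits to a parser as soon as that parser has chomped characters, and does not try the remaining alternatives if that parser fails afterwards. Both `ipAddress` and `identifier` start with a sequence of digits, so `ipAddress` chomps characters even when the input turns out to be an identifier. With `backtrackable` we allow `oneOf` to pick the alternate path.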
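The effect of `backtrackable` can be seen in isolation with a smaller pair of parsers. The sketch below is not part of the repository; `BacktrackExample`, `digitsThenDot`, `digits`, `withoutBacktrack` and `withBacktrack` are names invented for this illustration.

```elm
module BacktrackExample exposing (withBacktrack, withoutBacktrack)

import Parser exposing ((|.), Parser, backtrackable, chompWhile, getChompedString, oneOf, symbol)


-- Digits that must be followed by a dot, mimicking the start of an IP address.
digitsThenDot : Parser String
digitsThenDot =
    (chompWhile Char.isDigit |> getChompedString)
        |. symbol "."


-- Digits only, mimicking the identifier.
digits : Parser String
digits =
    chompWhile Char.isDigit
        |> getChompedString


-- On "123" the first parser chomps the digits and then fails on the missing
-- dot. Because characters were chomped, `oneOf` never tries `digits`.
withoutBacktrack : Parser String
withoutBacktrack =
    oneOf
        [ digitsThenDot
        , digits
        ]


-- With `backtrackable` the failure no longer counts as committed, so `oneOf`
-- moves on and `digits` gets its chance.
withBacktrack : Parser String
withBacktrack =
    oneOf
        [ backtrackable digitsThenDot
        , digits
        ]
```

Running `withoutBacktrack` on `"123"` gives an `Err`, while `withBacktrack` gives `Ok "123"`. On `"123."` both succeed, because the first alternative matches.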
With this definition the tests pass. The entire parser is given below.
module Data exposing (Data(..), parse)

import Parser exposing (..)


type Data
    = Identifier String
    | IpAddress IpAddressData


type alias IpAddressData =
    { networkID1 : Int
    , networkID2 : Int
    , hostID1 : Int
    , hostID2 : Int
    , subnetMask : Maybe Int
    }


parse : String -> Result String Data
parse input =
    let
        parser =
            oneOf
                [ backtrackable ipAddress
                , identifier
                ]
    in
    input
        |> run parser
        |> Result.mapError deadEndsToString


identifier : Parser Data
identifier =
    chompWhile Char.isDigit
        |> getChompedString
        |> map Identifier


ipAddress : Parser Data
ipAddress =
    succeed IpAddressData
        |= network
        |. dot
        |= network
        |. dot
        |= host
        |. dot
        |= host
        |= optionalSubnetMask
        |> map IpAddress


network : Parser Int
network =
    let
        toInt input =
            input
                |> String.toInt
                |> Maybe.withDefault -1
    in
    chompWhile Char.isDigit
        |> getChompedString
        |> andThen (lengthWithIn 1 3)
        |> map toInt


lengthWithIn : Int -> Int -> String -> Parser String
lengthWithIn minimum maximum input =
    let
        n =
            String.length input
    in
    if minimum <= n && n <= maximum then
        succeed input

    else
        problem <|
            "expecting input to be between "
                ++ String.fromInt minimum
                ++ " and "
                ++ String.fromInt maximum


host : Parser Int
host =
    network


dot : Parser ()
dot =
    symbol "."


subnetMask : Parser Int
subnetMask =
    let
        toInt input =
            input
                |> String.dropLeft 1
                |> String.toInt
                |> Maybe.withDefault -1
    in
    (succeed ()
        |. chompIf (\c -> c == '/')
        |. chompWhile Char.isDigit
    )
        |> getChompedString
        |> map toInt


optionally : Parser a -> Parser (Maybe a)
optionally parser =
    oneOf
        [ parser |> map Just
        , succeed Nothing
        ]


optionalSubnetMask : Parser (Maybe Int)
optionalSubnetMask =
    optionally subnetMask
At the moment our `parse` function will happily parse only part of the input. For example, if one tries to parse `"10a.2bc.3#!.19"` the result is `Ok (Identifier "10")`, as can be seen in the following REPL session.
> import Data exposing (parse)
> parse "10a.2bc.3#!.19"
Ok (Identifier "10") : Result String Data.Data
We can remedy that by using the `end` parser.
In order not to disturb the other tests we expose a separate `parseComplete` function.
parseComplete : String -> Result String Data
parseComplete input =
    let
        incompleteParser =
            oneOf
                [ backtrackable ipAddress
                , identifier
                ]

        parser =
            succeed identity
                |= incompleteParser
                |. end
    in
    input
        |> run parser
        |> Result.mapError deadEndsToString
which will pass the following test
            , test "parsing \"10a.2bc.3#!.19\" with `parseComplete` should fail" <|
                \_ ->
                    let
                        input =
                            "10a.2bc.3#!.19"

                        actual =
                            Data.parseComplete input

                        expected =
                            Err "TODO deadEndsToString"
                    in
                    actual
                        |> Expect.equal expected
We made some choices that could have been made differently. Below we summarize them.
- Instead of using `getChompedString` we could have used `int`.
- Instead of using `getChompedString` and the general `Parser.map` we could have used `mapChompedString`.
- We haven't done error reporting.