Could we have a regex combinator in Text.Parsing.StringParser.String?
Closed this issue · 12 comments
regex :: String -> Parser String
such that running `regex "a+"` on "aaaaab" gives `Right "aaaaa"`, leaving "b" as the remaining input.
I have a prototype implementation. It's a little inelegant but seems to do the trick. Would you accept a PR along these lines, and if so, how would you prefer it to be packaged?
Maybe paste the code here then, and we can review before you go to the trouble of creating a PR?
```purescript
module ParserExtra (regex) where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String (drop, length)
import Data.String.Regex as Regex
import Data.String.Regex.Flags (noFlags)
import Data.String.Utils (startsWith)
import Prelude ((<>), (+), ($), show)
import Text.Parsing.StringParser (Parser(..), ParseError(..), fail)
import Data.Array (take)

regex :: String -> Parser String
regex pat =
  let
    pattern =
      if startsWith "^" pat then
        pat
      else
        "^" <> pat
    er = Regex.regex pattern noFlags
  in
    case er of
      Left _ ->
        fail $ "Illegal regex " <> show pat
      Right r ->
        Parser \{ str, pos } ->
          let
            remainder = drop pos str
          in
            -- reduce the possible array of matches to 0 or 1 elements to aid Array pattern matching
            case take 1 $ fromMaybe [] $ Regex.match r remainder of
              [ Just matched ] ->
                Right { result: matched, suffix: { str, pos: pos + length matched } }
              _ ->
                let
                  msg = "Regex pattern " <> show pat <> " did not match"
                in
                  Left { pos, error: ParseError msg }
```
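A quick usage sketch, assuming the `ParserExtra` module above compiles against `Text.Parsing.StringParser` and its `runParser` function. The module name `ParserExtraExample` and the values `fromStart` and `notFromStart` are hypothetical. The first checks the example from the opening post. The second shows that, because `regex` prepends `^`, the parser fails when the pattern only matches later in the input.

```purescript
module ParserExtraExample (fromStart, notFromStart) where

import Prelude

import Data.Either (Either(..))
import ParserExtra (regex)
import Text.Parsing.StringParser (runParser)

-- | "a+" is turned into "^a+", so it consumes the leading run of 'a's.
fromStart :: Boolean
fromStart =
  case runParser (regex "a+") "aaaaab" of
    Right matched -> matched == "aaaaa"
    Left _ -> false

-- | "b" is turned into "^b"; the input starts with 'a', so parsing fails
-- | even though a 'b' occurs later in the input.
notFromStart :: Boolean
notFromStart =
  case runParser (regex "b") "aaaaab" of
    Left _ -> true
    Right _ -> false
```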
A few notes:
- Maybe use a `where` clause instead of `let`?
- Take a `Regex` as an argument instead of a `String`. Then you don't need to handle the error case, and it can be precompiled.
- Instead of `take 1`, you could use `uncons`.
Otherwise, looks great!
That was quick! Many thanks for the advice. I'm happy with the first and last suggestions, and I can see the point of precompiling, but I'm a little uneasy that it could be used inappropriately if we do this. A user could provide a legitimate pattern that is not anchored to the very first character of the target text. Might this lead to confusion?
Well, we could look at the match, and make sure it matched at position zero, or fail, perhaps.
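One way to make that check is to ask the regex where its first match starts. The following is a minimal sketch. It assumes `Data.String.Regex.search`, which returns the index of the first match, and the same `match` result type as the code in this thread, `Maybe (Array (Maybe String))`. The module name `MatchAtStart`, the helper `matchAtStart` and the example pattern `digits` are hypothetical.

```purescript
module MatchAtStart (matchAtStart, digits) where

import Prelude

import Data.Array (uncons)
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String.Regex (Regex, match, search)
import Data.String.Regex.Flags (noFlags)
import Data.String.Regex.Unsafe (unsafeRegex)

-- | Return the text matched by `r`, but only if the first match begins at
-- | index 0 of `input`; otherwise return Nothing.
-- | Assumes `match` returns `Maybe (Array (Maybe String))`, as in the
-- | Data.String.Regex version used by the code in this thread.
matchAtStart :: Regex -> String -> Maybe String
matchAtStart r input =
  case search r input of
    -- `search` reports the index at which the first match starts
    Just 0 ->
      -- element 0 of the match array is the whole matched text
      case uncons (fromMaybe [] (match r input)) of
        Just { head: Just matched, tail: _ } -> Just matched
        _ -> Nothing
    -- no match at all, or the first match starts later in the input
    _ -> Nothing

-- | Hypothetical example pattern, deliberately not anchored with '^'.
-- | `unsafeRegex` is acceptable here because the literal is a valid pattern.
digits :: Regex
digits = unsafeRegex "[0-9]+" noFlags
```

Comparing the index is stricter than testing whether the remaining input starts with the matched text. For the pattern `a$` on `"aba"`, the first match is the final `a`, yet the input also begins with `a`. The cost is that the regex runs twice, so anchoring the pattern with `^` up front remains the cheaper option.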
Ah, that's a good idea; I didn't think of that. I'll experiment a little and post another attempt here once I've played with it. Thanks for taking the time to look at it.
OK, I have the next iteration. I don't think it's possible to change the remaining `let` to a `where`, but perhaps I'm wrong:
```purescript
module ParserExtra1 (regex) where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String (drop, length)
import Data.String.Regex as Regex
import Data.String.Utils (startsWith)
import Prelude ((+), ($))
import Text.Parsing.StringParser (Parser(..), ParseError(..))
import Data.Array (uncons)

-- | Match the regular expression
regex :: Regex.Regex -> Parser String
regex r =
  Parser \{ str, pos } ->
    let
      remainder = drop pos str
    in
      -- take the first element of the match array (the whole match) via uncons
      case uncons $ fromMaybe [] $ Regex.match r remainder of
        Just { head: Just matched, tail: _ } ->
          -- accept the match only if the remaining input starts with the matched text
          if startsWith matched remainder then
            Right { result: matched, suffix: { str, pos: pos + length matched } }
          else
            Left { pos, error: ParseError $ "no match - consider prefacing the pattern with '^'" }
        _ ->
          Left { pos, error: ParseError $ "no match" }
```

However, I think I'd be happier if I also included this as a convenience:
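```purescript
-- Needs these imports in addition to those of ParserExtra1:
--   import Data.String.Regex.Flags (noFlags)
--   import Prelude ((<>))
--   import Text.Parsing.StringParser (fail)

-- | Build the regular expression from the pattern and match it
regex' :: String -> Parser String
regex' pat =
  case er of
    Left _ ->
      fail $ "Illegal regex " <> pat
    Right r ->
      regex r
  where
  pattern =
    if startsWith "^" pat then
      pat
    else
      "^" <> pat
  er = Regex.regex pattern noFlags
```

Looks good, could you please open a PR? Thanks!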
OK, I'll probably get time to do this in 2 or 3 days. Many thanks for the review. Just one more thing: where do you want me to put it? In String, or perhaps in a new module, Regex?
Let's go with the string module for now. Thanks!
closed via e5699a9