purescript-contrib/purescript-string-parsers

Could we have a regex combinator in Text.Parsing.StringParser.String?

regex :: String -> Parser String

such that parsing "aaaaab" using (regex "a+") gives Right "aaaaa"

I have a prototype implementation. It's a little inelegant but seems to do the trick. Would you accept a PR along these lines, and if so, how would you prefer it to be packaged?

paf31 commented

Maybe paste the code here then, and we can review before you go to the trouble of creating a PR?

module ParserExtra (regex) where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String (drop, length)
import Data.String.Regex as Regex
import Data.String.Regex.Flags (noFlags)
import Data.String.Utils (startsWith)
import Prelude ((<>), (+), ($), show)
import Text.Parsing.StringParser (Parser(..), ParseError(..), fail)
import Data.Array (take)

regex :: String -> Parser String
regex pat =
  let
    pattern =
        if startsWith "^" pat then
            pat
        else
            "^" <> pat
    er = Regex.regex pattern noFlags
  in
    case er of
      Left _ ->
        fail $ "Illegal regex " <> show pat
      Right r ->
        Parser \{ str, pos } ->
          let
            remainder = drop pos str
          in
            -- reduce the possible array of matches to 0 or 1 elements to aid Array pattern matching
            case take 1 $ fromMaybe [] $ Regex.match r remainder of
              [ Just matched ] ->
                  Right { result: matched, suffix: { str, pos: pos + length matched } }
              _ ->
                let
                  msg = "Regex pattern " <> show pat <> " did not match"
                in
                   Left { pos, error: ParseError msg }


paf31 commented

A few notes:

  • Maybe use a where clause instead of let?
  • Take a Regex as an argument instead of a String, then you don't need to handle the error case, and it can be precompiled.
  • Instead of take 1, you could use uncons.

Otherwise, looks great!
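
For reference, the second note might look like the sketch below. It assumes `unsafeRegex` from `Data.String.Regex.Unsafe` (part of purescript-strings); the module name `PrecompiledRegex` and the binding `aPlus` are hypothetical and only there for illustration.

```purescript
module PrecompiledRegex (aPlus) where

import Data.String.Regex (Regex)
import Data.String.Regex.Flags (noFlags)
import Data.String.Regex.Unsafe (unsafeRegex)

-- | Hypothetical example: a pattern compiled once as a top-level value.
-- | `unsafeRegex` crashes at runtime if the pattern is invalid, so this is
-- | only reasonable for patterns known to be valid when the code is written.
aPlus :: Regex
aPlus = unsafeRegex "^a+" noFlags
```

A combinator of type `Regex -> Parser String` could then be handed `aPlus` directly, with no `Left` branch needed for a malformed pattern.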

That was quick! Many thanks for the advice. I'm happy with the first and last suggestions, and I can see the point of precompiling, but I'm a little uneasy about the combinator being used inappropriately if we do this. A user could provide a legitimate pattern that is not constrained to match at the very first character of the target text. Might this lead to confusion?

paf31 commented

Well, we could look at the match, and make sure it matched at position zero, or fail, perhaps.
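
A minimal sketch of that check, assuming `search` from `Data.String.Regex`, which returns the index of the first match if there is one. The module name `MatchAtStart` and the function `matchesAtStart` are hypothetical.

```purescript
module MatchAtStart (matchesAtStart) where

import Prelude ((==))
import Data.Maybe (Maybe(..))
import Data.String.Regex (Regex, search)

-- | True only when the first match of the regex begins at index 0 of the
-- | input. A match that starts later, or no match at all, gives false.
matchesAtStart :: Regex -> String -> Boolean
matchesAtStart r s = search r s == Just 0
```

Inside a parser this would be applied to the unconsumed remainder of the input rather than to the whole string.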

Ah - that's a good idea - I didn't think of that. I'll experiment a little and post another attempt here once I've played with it. Thanks for taking the time to look at it.

OK - I have the next iteration. I don't think it's possible to change the remaining let to where but perhaps I'm wrong:

module ParserExtra1 (regex) where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String (drop, length)
import Data.String.Regex as Regex
import Data.String.Utils (startsWith)
import Prelude ((+), ($))
import Text.Parsing.StringParser (Parser(..), ParseError(..))
import Data.Array (uncons)

-- | Match the regular expression
regex :: Regex.Regex -> Parser String
regex r =
  Parser \{ str, pos } ->
    let
      remainder = drop pos str
    in
      -- split off the first element of the match array, if there is one
      case uncons $ fromMaybe [] $ Regex.match r remainder of
        Just { head: Just matched, tail: _ }  ->
          -- only accept matches at position 0
          if startsWith matched remainder then
            Right { result: matched, suffix: { str, pos: pos + length matched } }
          else
            Left { pos, error: ParseError $ "no match - consider prefacing the pattern with '^'" }
        _ ->
            Left { pos, error: ParseError $ "no match" }

However, I think I'd be happier if I also included this as a convenience:

-- | Build the regular expression from the pattern and match it
regex' :: String -> Parser String
regex' pat =
    case er of
      Left _ ->
        fail $ "Illegal regex " <> pat
      Right r ->
        regex r
    where
      pattern =
        if startsWith "^" pat then
          pat
        else
          "^" <> pat
      er = Regex.regex pattern noFlags

paf31 commented

Looks good, could you please open a PR? Thanks!

OK - I'll probably get time to do this in 2 or 3 days. Many thanks for the review. Just one more thing - where do you want me to put it? In String or perhaps in a new module: Regex?

paf31 commented

Let's go with the string module for now. Thanks!

closed via e5699a9