[proposed labels: question, feature request] best practices for stateful matching of simple patterns
genovese opened this issue · 4 comments
Megaparsec is terrific: powerful, flexible, a joy to use. I've been making heavy use of it in several projects. Thanks!
There are two needs that keep coming up, however, and I'm wondering whether I'm missing some best practices that would obviate them. First, I keep wanting to use something like takeWhile1P but with various conditions based on the tokens matched. The fast, stateful scanner along these lines requested in issue #314 would fit the bill. Second, I would like a (backtracking) primitive that matches a specified regular expression, even a simple POSIX style without any PCRE fanciness.
I recognize that one can use combinators to mimic the typical regex operators, but when matching higher-level syntactic constructs whose components vary in form, this tends to introduce more complexity than I would like. For instance, if I'm matching symbols that can start with one set of characters and continue with additional characters in a larger set, I end up with something like this (removing context and other structure):
isSymbolLeadingChar :: Char -> Bool
isSymbolLeadingChar c = isAlphaNum c || T.elem c symbolLeadingChars
isSymbolLaterChar :: Char -> Bool
isSymbolLaterChar c = isAlphaNum c || T.elem c symbolLaterChars
mySymbol :: Parser Text
mySymbol = liftM2 (<>) symbol1 symbol2
  where
    symbol1 = takeWhile1P (Just "Symbol") isSymbolLeadingChar
    symbol2 = takeWhileP (Just "Symbol continued") isSymbolLaterChar
This works fine, but it seems like a lot of boilerplate for a simple idea. And even with just a few categories like this, things end up more diffuse and messy. (A few provisos. In some cases, I can grab a more general construct, then classify and wrap it accordingly or fail. But when matching particular constructs in context -- such as having a list of the particular kind of symbol above -- it's easier to have a specific parser. I also realize that I can use something like Alex with a custom token type to handle lexing, but there are times when I'd rather keep it all in the family, so to speak.) A stateful scanner primitive would help a bit here, but a simple regex matcher would be even more convenient in this case. (I'd love to see both those additions.)
My question is whether there is a better approach within the intended megaparsec idioms to capture simple patterns like this.
I hope this is all clear. Thanks for your help.
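For reference, the stateful-scanner idea described in this post can be approximated with primitives Megaparsec already exports (getInput and takeP). The sketch below is only an illustration under stated assumptions: scanText, symbolStep, and the two character sets are hypothetical names and values, not part of Megaparsec and not the scanner proposed in #314, and it is written for strict Text input without regard to performance.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (empty)
import Data.Char (isAlphaNum)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void Text

-- Hypothetical stand-in for a stateful scanner (not a Megaparsec primitive):
-- feed the remaining input through `step` until it returns Nothing, then
-- consume exactly the accepted prefix with `takeP`. May return empty text.
scanText :: st -> (st -> Char -> Maybe st) -> Parser Text
scanText st0 step = do
  input <- getInput
  takeP Nothing (go 0 st0 (T.unpack input))
  where
    go n st (c : cs) | Just st' <- step st c = go (n + 1) st' cs
    go n _ _ = n

-- Assumed character sets, for illustration only.
isLead, isLater :: Char -> Bool
isLead c = isAlphaNum c || c == '_'
isLater c = isLead c || c `elem` ("-!?" :: String)

-- The state is True only while looking at the first character.
symbolStep :: Bool -> Char -> Maybe Bool
symbolStep first c
  | first, isLead c = Just False
  | not first, isLater c = Just False
  | otherwise = Nothing

-- A symbol is a non-empty scan; an empty scan fails without consuming input.
mySymbol :: Parser Text
mySymbol = label "Symbol" $ do
  sym <- scanText True symbolStep
  if T.null sym then empty else pure sym
```

The transition function carries whatever state the match needs (here a single Bool), so adding another category of symbol means changing symbolStep rather than composing more parsers.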
I agree that scanP would be helpful here, so I'd count this issue as a supporting case for #314. AFAIA you are not missing anything, except that in your example I think you intended to write:
mySymbol :: Parser Text
mySymbol = liftM2 (<>) symbol1 symbol2
  where
    symbol1 = Text.singleton <$> (satisfy isSymbolLeadingChar <?> "Symbol")
    symbol2 = takeWhileP (Just "Symbol continued") isSymbolLaterChar
Since I imagine the predicate isSymbolLeadingChar applies only to the first character, not to the first N characters.
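If the "one leading character, then a run of later characters" shape recurs, one way to cut the boilerplate is a small wrapper around match, which returns the input slice a parser consumed, so the pieces do not need to be glued back together. This is a minimal sketch: leadThenRest is a hypothetical helper, not something Megaparsec provides, and the character sets are assumed for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Char (isAlphaNum)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec

type Parser = Parsec Void Text

-- Hypothetical helper (not part of Megaparsec): one token satisfying
-- `isLead`, then zero or more satisfying `isRest`, returned as one slice.
leadThenRest :: String -> (Char -> Bool) -> (Char -> Bool) -> Parser Text
leadThenRest name isLead isRest =
  fst <$> match (satisfy isLead *> takeWhileP Nothing isRest) <?> name

-- Assumed character sets, for illustration only.
isSymbolLeadingChar, isSymbolLaterChar :: Char -> Bool
isSymbolLeadingChar c = isAlphaNum c || c `elem` ("_$" :: String)
isSymbolLaterChar c = isAlphaNum c || c `elem` ("_$-!?" :: String)

mySymbol :: Parser Text
mySymbol = leadThenRest "Symbol" isSymbolLeadingChar isSymbolLaterChar
```

Each new category of symbol then costs one line, for example a parser built from two other predicates with its own label. With the sets assumed above, parseMaybe mySymbol "foo-bar!" gives Just "foo-bar!", while an input starting with '-' is rejected.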
Thanks.
On the example: since the second set is a superset of the first, I take as many characters as possible from the first set while I'm at it, which is why I wrote it that way.
Thoughts on the regex matcher?
Sorry, I am not aware of anything that brings regexp support to Megaparsec. Perhaps you could look into lexing with alex or similar.
Understood. That part was a feature request. Thanks though, all good