`optional` doesn't propagate fail when used with `<|>`
Closed this issue · 38 comments
Describe the bug
So, I'm not clear why, but when you execute under Down here are even simpler steps-to-reproduceoptionMaybe a code that uses two optionals, then even if the code consumes input before failing, optionMaybe thinks it didn't. The interesting feature of this bug is that if you remove at least one optional, it will fail as expected.
Spent a few hours digging, first to a parsing bug and then to the Parsing bug (pun intended), came down with the minimal steps below 😊
To Reproduce
UPD: simpler steps-to-reproduce are in this comment
Run the following code:
module Main where
import Prelude
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optionMaybe, optional)
import Parsing.String (char)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
-- Parsing trailing whitespace provides a number of benefits compared to parsing
-- leading whitespace. For details see chapter 3 of "Design Patterns for Parser
-- Combinators (Functional Pearl)" paper.
lexeme :: ∀ a. TextParser a -> TextParser a
lexeme p = p <* optional whiteSpace
parseTypes :: TextParser String
parseTypes = do
_ <- lexeme $ lexeme $ (char 'f' *> char 'o' *> char 'o')
fail "test failure"
parseImpl :: TextParser String
parseImpl = do
optionMaybe parseTypes >>= case _ of
Nothing -> pure ""
Just x -> pure x
main :: Effect Unit
main = logShow $ runParser "foo" parseImplExpected behavior
Code should return (Left "test failure") because the paragraph being executed under optionMaybe has consumed the input (thrice even) before failing.
Actual behavior
It returns (Right "")
That's a serious problem… I just tried to remove as many excess lexemes from my project as possible, but somehow I still get no failure where there should be one. Don't even see two consequential lexeme's in my actual code TBH, but then again I don't know why this bug works the way it does, so maybe it happens in other circumstances as well, like perhaps if you execute two optionals non-consequentially under optionMaybe section… 🤷♂️
Is this an issue with left recursion maybe? It could definitely be a bug also, but that's something to look at / consider if you've not encountered material about it before.
I don't see recursion in the steps-to-reproduce, unless I'm missing something…
optionMaybe silences errors. Any error becomes a Nothing, and you are mapping that to "". In parsing there's currently no difference between fail errors and other parse error. Some parser combinator libraries have a distinction between the two. This seems like it's behaving as expected, given that?
EDIT: nm, I take that back, it doesn't use try.
EDIT: nm, I take that back, it doesn't use
try.
Just to clarify: even if it would, that would be a bug because it's documented to only return Nothing if parser didn't consume the input (which it does in the "steps-to-reproduce")
The interesting feature of this bug is that if you remove at least one
optional, it will fail as expected.
That doesn't seem to be true for me, dropping a lexeme from parseTypes:
parseTypes :: TextParser String
parseTypes = do
_ <- lexeme (char 'f' *> char 'o' *> char 'o')
fail "test failure"Also produces Right "". Or is that not what you meant?
@garyb oh, interesting, let me cross off that line… The situation that quote was referring to was tested when instead of sequence of chars I had a single string "foo". But right before posting I replaced string "foo" to the chars sequence in order to emphasize that input was consumed. I didn't think it may… I guess, kind of simplify the problem? So that's not a bad news I presume.
…although, now that I test, even with string "foo" removing the additional lexeme works (I mean, "works" in a way that the bug is there). Not sure what's happening… Anyway, good point!
While re-checking the code, I just managed to reduce the problem even further! I guess I'll post it here, because at this point re-editing the issue may introduce confusion…?
So, I just found that optional isn't even needed! The problem somehow related to whitespace!
Here's the smaller steps to reproduce, no optional here:
module Main where
import Prelude
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optionMaybe)
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = do
optionMaybe parseTypes >>= case _ of
Nothing -> pure ""
Just x -> pure x
main :: Effect Unit
main = logShow $ runParser "foo" parseImplSo, an even smaller reproducer is replacing whole paragraph matching on optionMaybe with just parseTypes <|> pure "". So:
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImplI guess this points out the problem is certainly in whiteSpace and not due to some odd interaction with other functions.
CC: @jamesdbrock as the takeWhile author.
I don't have enough understanding of Parsing internals to see how to implement it in code, but I think I figured out the problem source as well as the solution.
whiteSpace is declared as takeWhile isSpace. takeWhile is declared as follows:
takeWhile :: forall m. (CodePoint -> Boolean) -> ParserT String m String
takeWhile predicate =
consumeWith \s ->
let
value = String.takeWhile predicate s
in
Right
{ consumed: value
, remainder: SCU.drop (SCU.length value) s
, value
}Now, in Parsing the question of whether input was consumed is determined by consumed flag. The takeWhile completely ignores consumed of prior parsers and overrides it with the value of String.takeWhile.
What should be done here instead is that code should test previous consumed, and if it's set then propagate that further.
To test this idea, I took the previous comment code, and added a space as … runParser "foo " …. This made failure propagate.
Nice find! That sounds like the culprit.
Thank you! Any takers to send a PR? This sounds trivial to implement, I just don't know where and how to pattern-match on the older consume.
It's the last argument to ParseState (the underscore).
purescript-parsing/src/Parsing/String.purs
Line 285 in c38e6ea
Then update:
purescript-parsing/src/Parsing/String.purs
Line 290 in c38e6ea
to something like:
-- prevConsumed is the consumed slot from ParseState
runFn2 done (ParseState remainder (updatePosString pos consumed remainder) (prevConsumed || not (String.null consumed))) value@natefaubion just to clarify: are you saying, the bug should be fixed in consumeWith and not in takeWhile, right?
Well… apparently, there's more than one bug. consumeWith fix makes failure propagate in the steps that I reduced as part of latter experimentation, however the bug from the 0-th post is not fixed.
I'll split this to a separate issue for the purpose of PR.
So, on that case I sent a PR.
Regarding the case here, here's the minimal steps that are buggy even with the fix (it boils down to having optional)
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optional, (<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImplOkay, folks, I think I more or less figured this one too, but something's doesn't add up, probably just because I don't know the code.
In above code both string "foo" and whiteSpace are implemented via consumeWith. By inserting console.log()s into generated js I found that consumeWith as you'd expect is called twice, and both times it has old consumed flag (the one @natefaubion pointed out) set to false, which isn't expected. Note that string "foo" consumes input, and hence the whiteSpace should get called with the flag being true, but it receives false instead.
After some digging I figured the bug seems to be in Alt implementation:
instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
( mkFn5 \state1@(ParseState input pos _) more lift throw done ->
more \_ ->
runFn5 k1 (ParseState input pos false) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed then
runFn2 throw state2 err
else
runFn5 k2 state1 more lift throw done
)
done
)Per my understanding, it should be changed to propagate previous consumed as well:
--- a/src/Parsing.purs
+++ b/src/Parsing.purs
@@ -350,9 +350,9 @@ instance MonadError ParseError (ParserT s m) where
-- | section __2.3 Backtracking__.
instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
- ( mkFn5 \state1@(ParseState input pos _) more lift throw done ->
+ ( mkFn5 \state1@(ParseState input pos oldConsumed) more lift throw done ->
more \_ ->
- runFn5 k1 (ParseState input pos false) more lift
+ runFn5 k1 (ParseState input pos oldConsumed) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed thenIt fixes the bug discussed, however it also breaks tests, so I'm wondering if perhaps I am misunderstanding something?
An even smaller code-to-reproduce:
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
type TextParser = Parser String
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unit
main :: Effect Unit
main = logShow $ runParser "foo" parseTypesOkay guys, I have re-studied everything, and I am certain the change I described one comment above is correct.
So now the question is: why after this change running tests results in:
TESTS Indentation
file:///home/constantine/Projects/purescript-parsing/output/Test.Assert/foreign.js:4
if (!success) throw new Error(message);
^
Error: error: (ParseError "Predicate unsatisfied" (Position { column: 2, index: 1, line: 1 }))
Predicate unsatisfied at position index:1 (line:1, column:2)
▼
k
at file:///home/constantine/Projects/purescript-parsing/output/Test.Assert/foreign.js:4:27
at Module.__do (file:///home/constantine/Projects/purescript-parsing/output/Test.IndentationTests/index.js:158:511)
at __do (file:///home/constantine/Projects/purescript-parsing/output/Test.Main/index.js:539:27)
at file:///home/constantine/Projects/purescript-parsing/.spago/run.js:3:1
at ModuleJob.run (node:internal/modules/esm/module_job:271:25)
at async onImport.tracePromise.__proto__ (node:internal/modules/esm/loader:547:26)
at async asyncRunEntryPointWithESMLoader (node:internal/modules/run_main:98:5)
? From this output I don't even understand what test is failing. It refers to foreign JS for some reason, and "Predicate unsatisfied" message isn't even in tests but in core library 🤷♂️
I think the alt implementation is OK. There's a difference here between consumeWith and alt. consumeWith is always sequential, while alt is parallel. That is, consumeWith propagates state forward, while alt forks it. Keep in mind the callback in alt is happening on the throw path, not the done path. The done path is untouched, so state propagates forward as normal. Alt must know whether the left branch consumed anything itself or not, and it can only tell that if it resets the flag to false. Otherwise you could get a branch that does not consume anything but still reports as having been consumed!
Any combinator that makes sequential progress (such as consumeWith) needs to take the previous consumed flag into account. That is, it's not a local decision of "this particular step consumed something". The consumed state means "this particular branch consumed something".
Well, I don't have anything to say here. Your comment is done from high-level knowledge of the library, whereas mine is from low-level investigation of the generated JS and sprinkling console.log()s in it. The two are clearly contradictory.
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unitThis isn't a correct example, or at least your expectation is incorrect. The bracketing of this is:
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption") <|> pure unit)consumed only affect the current branch. That is, it's purpose is only to track whether the current executing branch (up to the nearest alt) has consumed anything. A fail "foo" <|> pure unit will always return pure unit because fail doesn't consume anything. This is correct and expected behavior.
If you change the bracketing to:
parseTypes :: TextParser Unit
parseTypes = (do
_ <- string "foo"
void (fail "fail after consumption")) <|> pure unitThen this will fail as expected.
Part of the reason why this is so confusing is the semantics of consumed is entangled with a State effect, making it non-local. It should really be a Writer over the Disj Boolean Monoid.
If you change the bracketing to:
parseTypes :: TextParser Unit
parseTypes = (do
_ <- string "foo"
void (fail "fail after consumption")) <|> pure unit
Then this will fail as expected.
Thank you! But you probably meant (void (fail "fail after consumption")) <|> pure unit.
I tried that first and it did fail… but then I realized I had the changes to instance Alt inside Parsing.purs that were discussed. After reverting them I get no failure, see:
λ cat src/Main.purs
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
type TextParser = Parser String
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption")) <|> pure unit
main :: Effect Unit
main = logShow $ runParser "foo" parseTypes
λ spago run
[info] Build succeeded.
[warn] None of your project files import modules from some projects that are in the direct dependencies of your project.
These dependencies are unused. To fix this warning, remove the following packages from the list of dependencies in your config:
- maybe
- unicode
(Right unit)No, I don't mean that. Let me reformat:
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unitThis is working correctly. The <|> is bracketed around the line with fail only.
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption") <|> pure unit)Algebraically, fail "foo" <|> x can always be replaced by x only, making this is equivalent to:
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
pure unitIf you change the bracketing to:
parseTypes :: TextParser Unit
parseTypes =
( do
_ <- string "foo"
void (fail "fail after consumption")
) <|> pure unitThen you will get the error you expect. That's because the alt is on the "outside", it sees that it consumed "string" before hitting an error. It then propagates that error. This is working as expected.
Oooh, I see, so, even if I bracket it as:
[…]
(void (fail "fail after consumption")) <|> pure unitit will still be executed as if it was
[…]
(void (fail "fail after consumption") <|> pure unit)?
Welp… this is confusing, but apparently that's how even Parsec works:
module Main where
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)
type TextParser a = Parsec String () a
parseTypes :: TextParser ()
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption")) <|> return ()
main :: IO ()
main = print $ parse parseTypes "" "foo"Similarly, returns Right () instead of failure. Okay, thank you for clarification, I am closing this then.
Okay, I closed too early. We found the problem in the last reduced steps-to-reproduce, which is basically a tricky thing that one shouldn't use <|> anywhere but the very top of a do-expression.
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optional, (<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImplHas no such problem. Here whole code that consumes and fails has no branching within do-expression. The nearest branching resides outside it. And yet it succeeds.
To be clear, I still think there is the issue in consumeWith.
To be clear, I still think there is the issue in consumeWith.
consumeWith problem I split to a separate issue, and I am testing with the fix for that problem. I don't think there's anything left to do in consumeWith 😊
In order to get your example to fail (ie, meet your expectation), I needed the fix to consumeWith, and also remove the optional from whiteSpace.
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl(Left (ParseError "test failure" (Position { column: 4, index: 3, line: 1 })))
The optional is unnecessary. If we remove it, it works correctly (WAT). If we narrow it down and inline a little just to remove indirection, we can see this, which behaves unexpectedly:
parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile isSpace <|> pure ""However, the following two alternatives behave expectedly:
parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile isSpaceparseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile1 isSpace <|> pure ""So there's some confusing semantics around primitive parsers which can both:
- Fail
- Succeed while not consuming input
That is, as long you consume input or fail, everything works as expected. One fix for this is to define takeWhile in terms of takeWhile1 x <|> pure "". That is, primitive parses should all meet this condition, and a parser that needs to both fail and success while not consuming input would need to be defined in terms of <|>. I think this is "in the spirit" of original Parsec, where you are semantically only parsing a list of Chars. That's the only semantics that really makes sense at that point.
The alternative (yuk yuk) fix would be to indeed fix the Alternative instance, but not in the way you highlighted. It would be to correct it to be in line with the semantics I noted above around writer, and we would need to consider the prevConsumed in the done continuation:
instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
( mkFn5 \state1@(ParseState input pos prevConsumed) more lift throw done ->
more \_ ->
runFn5 k1 (ParseState input pos false) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed then
runFn2 throw state2 err
else
runFn5 k2 state1 more lift throw done
)
( mkFn2 \(ParseState input' pos' consumed) res ->
runFn2 done (ParseState input' pos' (prevConsumed || consumed)) res
)
)All of the three alternatives above are interchangeable then. I think if we wanted to go the route of fixing the alternative instance, I think we should just change the semantics internally to Writer. Then it becomes baked into the framework, and the consumed flag becomes a local decision rather than the awkward "whether the current branch has consumed anything", which is a source of endless confusion and subtlety. The downside is this is a breaking change to the internal representation. I don't know if we consider that part of the API though?
We could also "for now" fix consumeWith to require a NonEmptyString for consumed, otherwise it must fail. This would uphold the "consume or fail" invariant.
The
optionalis unnecessary. If we remove it, it works correctly (WAT). If we narrow it down and inline a little just to remove indirection, we can see this, which behaves unexpectedly:parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile isSpace <|> pure ""
I got lost here. The reason the example you suggest works unexpectedly IIUC is the one discussed before I closed the issue. It boils down to the fact that in Parsing (and Parsec for that matter) the <|> should never be used anywhere but the very top of a do-expression, otherwise it's easy to misuse the code. It's basically like goto in C: can be used for a bunch of things, but shouldn't.
I understand you moved the parseTypes <|> pure "" to that function… But I don't really understand: shouldn't per previous discussion the fact that this comparison resides outside make parseTypes execute first, and then <|> pure "" second…? Which is no longer the case when you inline it.
One fix for this is to define
takeWhilein terms oftakeWhile1 x <|> pure "".
FTR, I tested this fix by running:
myWhiteSpace :: TextParser String
myWhiteSpace = takeWhile1 isSpace <|> pure ""
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional myWhiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImplBut it still returns Right "". Yes, I know I returned optional back in place, but it's because it is part of the bug. In a real parser it's hard to make sure you haven't got somewhere an occasional nesting optional $ optional someFoo, which should work same as optional someFoo anyway.
It boils down to the fact that in Parsing (and Parsec for that matter) the <|> should never be used anywhere but the very top of a do-expression, otherwise it's easy to misuse the code. It's basically like goto in C: can be used for a bunch of things, but shouldn't.
I don't really understand what you are getting at with "should never be used anywhere but the very top of a do-expression". I don't think this should be the takeaway at all. The reason the previous inaccurate example was incorrect was because I think you either had an incorrect assumption of how it's supposed to work, or you had an incorrect assumption about how the language was implicitly bracketing your code (ie, a question of precedence). <|> has well defined semantics, and your expectation just did not align with those semantics, possibly due to bugs in the implementation itself.
FTR, I tested this fix by running:
Then maybe changing the done continuation in the Alt instance is the only complete solution.
Thank you! ❤️
I see, fixes have been accumulating since 2022, it is maybe worth making a new release, so that PureScript users would get the library from Registry with less bugs?
