pointfreeco/swift-parsing

Parse consumes from input even if one parser inside fails

morstin opened this issue · 1 comments

Dear all,

First, thank you very much for upgrading the parsing library to the new way of parsing. Though it took me some time to migrate from the old way to the new one, I think it is a very clear way of describing a parsing problem.

I have the following problem. My understanding is that if a parser fails, it should not consume anything from the input, so that the untouched input can be presented to another parser using OneOf.

My case:

let getStartDate = Parse( { "@start(" + (String($0) ?? "") } ) {
    "@start(".utf8
    PrefixThrough(")".utf8)
}

The problem:

  • if the 2nd parser of "getStartDate" fails
  • the first one still consumes "@start("
  • My assumption was that "getStartDate" should not consume anything if it fails overall

Any comments or hints?

Best regards,
Morstin

PS: Below is the playground code that I ran against the library.

import Parsing

/// Sample input to be parsed.
var inputCorrect =
"""
@start(2021-09-30)this not and here also not
of clourse not parsed
"""[...].utf8

var inputFalse =
"""
@start(2021-09-30xthis not and here also not
of clourse not parsed
"""[...].utf8

let getStartDate = Parse( { "@start(" + (String($0) ?? "") } ) {
    "@start(".utf8
    PrefixThrough(")".utf8)
}

let resultCorrect = try getStartDate.parse(&inputCorrect)
String(resultCorrect)
String(inputCorrect)

let resultFalse = try? getStartDate.parse(&inputFalse)
String(resultFalse ?? "Parser failed")
String(inputFalse)
/* remaining input
""2021-09-30xthis not and here also not\nof clourse not parsed""
 Is this not wrong? It means the parser failed but still consumed the first part, "@start(".
 */

Hi @morstin! This behavior is actually documented here. The parsers in the library minimally backtrack: only at points where you compose them together using OneOf, Optionally, and a few others.
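
To make the idea concrete, here is a minimal, library-free sketch of what "backtracking only at composition points" means. Every name in it (`ParseFailure`, `consume`, `prefixThrough`, `startDate`, `rest`, `oneOf`) is hypothetical and written only for illustration. It is not how swift-parsing implements `Parse` or `OneOf`, just a small model of the behavior described above, using plain `Substring` input instead of a UTF-8 view.

```swift
// Library-free model of minimal backtracking. All names are hypothetical
// and are NOT part of swift-parsing.
struct ParseFailure: Error {}

/// Consumes `prefix` from the front of `input`, or throws without touching it.
func consume(_ prefix: String, from input: inout Substring) throws {
    guard input.starts(with: prefix) else { throw ParseFailure() }
    input.removeFirst(prefix.count)
}

/// Consumes everything up to and including `delimiter` and returns it.
func prefixThrough(_ delimiter: Character, from input: inout Substring) throws -> Substring {
    guard let index = input.firstIndex(of: delimiter) else { throw ParseFailure() }
    let consumed = input[...index]
    input = input[input.index(after: index)...]
    return consumed
}

/// Two steps run in sequence with no cleanup: if the second step throws,
/// whatever the first step consumed stays consumed.
func startDate(_ input: inout Substring) throws -> Substring {
    try consume("@start(", from: &input)
    return try prefixThrough(")", from: &input)
}

/// Fallback alternative that consumes and returns all remaining input.
func rest(_ input: inout Substring) throws -> Substring {
    let all = input
    input = input[input.endIndex...]
    return all
}

/// The one place that backtracks: take a snapshot of the input and
/// restore it before trying the next alternative.
func oneOf<Output>(
    _ input: inout Substring,
    _ alternatives: [(inout Substring) throws -> Output]
) throws -> Output {
    let snapshot = input
    for alternative in alternatives {
        do { return try alternative(&input) } catch { input = snapshot }
    }
    throw ParseFailure()
}

// Calling the sequence directly: it fails, but "@start(" is already gone.
var failing: Substring = "@start(2021-09-30x and more"
let direct = try? startDate(&failing)

// Calling it through `oneOf`: the failure is rolled back, so `rest`
// is handed the original, untouched input.
var retried: Substring = "@start(2021-09-30x and more"
let recovered = try? oneOf(&retried, [startDate, rest])
```

In this model the sequence stays simple because it never has to undo its own work; only the choice combinator pays for a snapshot. That is the trade-off the reply calls minimal backtracking.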

There are two main ways to invoke a parser:

// 1. The `inout` version that consumes input.
try parser.parse(&input)

// 2. The non-`inout` version that doesn't care about what input was consumed.
try parser.parse(input)

The vast majority of the time you should call 2., the non-inout version, where you don't need to worry about what was consumed. As long as you build your parsers out of other parsers, like OneOf, backtracking is handled automatically for you.

The rare exception where you may want to call 1. is when you are writing your own, manual Parser conformance, and its parse requirement (which is handed an inout Input) should be consuming from input. Creating new parser conformances is rare because you can usually get by with the parsers that ship in the library.

This minimal backtracking is a "feature": it allows most parser conformances to be simpler because they do not have to worry about cleaning up after themselves, and it minimizes extra work done in those parsers.

Since this isn't a bug with the library, I'm going to convert this to a discussion, but let us know if you have any other questions!