alecthomas/participle

Struct capture wrongly applying previous captures in a failed branch

petee-d opened this issue · 2 comments

Hey, while working on parser code generation and implementing lookahead error recovery, I noticed a bug. Consider this example:

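```go
// Assumed to be an in-package test: MustBuild and UseLookahead are unqualified.
package participle

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

type BugStructCapturesInBadBranch struct {
	Bad bool                `parser:"(@'!'"`
	A   BugFirstAlternative `parser:" @@) |"`
	B   int                 `parser:"('!' '#' @Int)"`
}

type BugFirstAlternative struct {
	Value string `parser:"'#' @Ident"`
}

func TestBug_GroupCapturesInBadBranch(t *testing.T) {
	var out BugStructCapturesInBadBranch
	// "!#4" should fail the first alternative at the Ident and match the second.
	require.NoError(t, MustBuild(&BugStructCapturesInBadBranch{}, UseLookahead(2)).ParseString("", "!#4", &out))
	// Only B should be set; Bad must not leak from the abandoned first alternative.
	assert.Equal(t, BugStructCapturesInBadBranch{B: 4}, out)
}
```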

I tried to keep it as minimal as reasonably possible. It's quite an obscure bug that's unlikely to bother anyone, but I thought I'd report it anyway.

strct.Parse calls ctx.Apply even if s.expr.Parse returned an error. The apparent purpose is to provide a partial AST when the entire parse fails, but it has an unwanted side effect: any captures the branch has added to parseContext.apply so far get applied, even though the error may later be caught and recovered from by a disjunction or a ?/*/+ group. I think this can only happen with a lookahead of at least 2, since it takes one token to produce the unwanted capture and a second token to make the strct return an error instead of a nil (no-match) result.
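A stripped-down model shows why applying on error leaks. This is a toy sketch that assumes only what the report states: captures are queued (here as closures in a hypothetical `queue`, standing in for `parseContext.apply`) and take effect only when executed. `node`, `parseBranch`, `tryBranch` and the `applyOnError` switch are illustrative names, not participle's API.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical AST struct; a capture writes into its Bad field.
type node struct{ Bad bool }

// queue stands in for parseContext.apply: captures are queued as closures
// and only take effect when executed.
type queue []func()

// parseBranch queues a capture for n.Bad and then fails, like the nested
// struct in the issue. With applyOnError it executes the queue before
// returning the error, mimicking the reported strct.Parse behaviour.
func parseBranch(q *queue, n *node, applyOnError bool) error {
	*q = append(*q, func() { n.Bad = true })
	err := errors.New("expected Ident")
	if applyOnError {
		for _, f := range *q {
			f()
		}
		*q = nil
	}
	return err
}

// tryBranch runs a branch on a scratch queue and throws the queue away on
// failure, roughly how a disjunction or ?/*/+ group recovers.
func tryBranch(n *node, applyOnError bool) (queue, bool) {
	var scratch queue
	if err := parseBranch(&scratch, n, applyOnError); err != nil {
		return nil, false // discarding scratch cannot undo captures that already ran
	}
	return scratch, true
}

func main() {
	for _, applyOnError := range []bool{true, false} {
		var n node
		_, ok := tryBranch(&n, applyOnError)
		fmt.Printf("applyOnError=%v branchOK=%v Bad=%v\n", applyOnError, ok, n.Bad)
	}
}
```

It prints `applyOnError=true branchOK=false Bad=true` followed by `applyOnError=false branchOK=false Bad=false`: in both cases the branch is abandoned, but only the apply-on-error variant leaves `Bad` set.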

In the example above, the input is constructed to match the second alternative of the disjunction, but the first tokens initially lead the parser into the first alternative and into the BugFirstAlternative struct. When matching Ident for the Value field fails, the sequence returns an error, but ctx.apply already contains a capture for BugStructCapturesInBadBranch.Bad. strct.Parse applies that capture anyway, even though the disjunction then recovers from the error and matches the second alternative.
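The walk-through above can be replayed with a hand-written backtracking parser for the same grammar on the tokens `!`, `#`, `4`. This is a sketch, not participle's code: `toyParser`, `parseFirst`, `parseRoot` and `applyOnError` are hypothetical, and the lookahead rule (a failed branch is only abandoned if it advanced at most `lookahead` tokens) is a simplification of how the disjunction decides whether it may backtrack.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode"
)

// First mirrors BugFirstAlternative; Root mirrors BugStructCapturesInBadBranch.
type First struct{ Value string }
type Root struct {
	Bad bool
	A   First
	B   int
}

// toyParser queues captures in pending; they only reach the output on flush.
type toyParser struct {
	toks    []string
	pos     int
	pending []func()
}

func (s *toyParser) expect(tok string) error {
	if s.pos < len(s.toks) && s.toks[s.pos] == tok {
		s.pos++
		return nil
	}
	return fmt.Errorf("expected %q at token %d", tok, s.pos)
}

// flush executes every queued capture, playing the role of ctx.Apply.
func (s *toyParser) flush() {
	for _, f := range s.pending {
		f()
	}
	s.pending = nil
}

// parseFirst handles `'#' @Ident`. Like strct.Parse as described in the issue,
// it flushes all pending captures on return; with applyOnError, even on failure.
func (s *toyParser) parseFirst(out *First, applyOnError bool) (err error) {
	defer func() {
		if err == nil || applyOnError {
			s.flush()
		}
	}()
	if err = s.expect("#"); err != nil {
		return err
	}
	if s.pos < len(s.toks) && unicode.IsLetter(rune(s.toks[s.pos][0])) {
		v := s.toks[s.pos]
		s.pos++
		s.pending = append(s.pending, func() { out.Value = v })
		return nil
	}
	return errors.New("expected Ident")
}

// parseRoot handles `(@'!' @@) | ('!' '#' @Int)`, each alternative on a fresh branch.
func parseRoot(toks []string, lookahead int, applyOnError bool) (Root, error) {
	var out Root
	s := &toyParser{toks: toks}
	err := s.expect("!")
	if err == nil {
		s.pending = append(s.pending, func() { out.Bad = true })
		err = s.parseFirst(&out.A, applyOnError)
	}
	if err == nil {
		return out, nil
	}
	if s.pos > lookahead { // simplified: went too far to backtrack
		return out, err
	}
	// Alternative 2: unflushed captures from alternative 1 are simply dropped.
	s = &toyParser{toks: toks}
	if err := s.expect("!"); err != nil {
		return out, err
	}
	if err := s.expect("#"); err != nil {
		return out, err
	}
	if s.pos >= len(toks) {
		return out, errors.New("expected Int")
	}
	n, err := strconv.Atoi(toks[s.pos])
	if err != nil {
		return out, err
	}
	s.pos++
	s.pending = append(s.pending, func() { out.B = n })
	s.flush()
	return out, nil
}

func main() {
	toks := []string{"!", "#", "4"}
	buggy, _ := parseRoot(toks, 2, true)
	fixed, _ := parseRoot(toks, 2, false)
	fmt.Printf("%+v\n", buggy) // {Bad:true A:{Value:} B:4}
	fmt.Printf("%+v\n", fixed) // {Bad:false A:{Value:} B:4}
}
```

With `applyOnError` set, `Bad: true` survives next to `B: 4`, which is the leak described above; without it only `B` is set, matching what the test expects. With a lookahead of 1 the first alternative's failure becomes unrecoverable, consistent with the observation that the bug needs a lookahead of at least 2.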

I don't think it's super important that this gets fixed, but my code-generated parser's behavior will differ here, because I'm taking a different approach to recovering from failed branches: restoring a backup of the parsed struct when a branch fails, instead of delaying the application of captures.

@alecthomas, pinging you about this. The generated parser's behavior will differ from the reflective parser's here, and I'd like to confirm that the reported behavior is indeed incorrect (it definitely seems so).

Definitely a bug. I'm pretty surprised this hasn't come up before, TBH.