Feature Request: More lenient parsing

Question

Feature Request: More lenient parsing

Alfus opened this issue a year ago · 6 comments

To provide good completion suggestions, an ast is needed to know if the cursor is in an option or not. However common completion cases do not parse in protocompile, for example:

An empty option block: int32 foo [<cursor>]; (done)
An option without an "= ": int32 foo [bar<cursor>]; (done)
An option with a trailing ",": int32 foo [deprecated = true, <cursor>]; (done)
An option without a trailing semicolon: int32 foo [bar<cursor>] (done)
An option name with a trailing ".": int32 foo [(bar.<cursor>)]; or int32 foo [foo.<cursor>] (done)
A type reference with a trailing ".": foo.<cursor>

Answer 1 · 2023-11-13T20:54:38.000Z

Import cases:

empty import statement: import "<cursor>"; (done)
import statement without a trailing ';': import "<cursor>" (done)

Answer 2 · 2023-12-02T01:56:04.000Z

Hey @Alfus 👋
I have an implementation of this in my fork of protocompile, but I went about it differently, and I'm curious what your thoughts are.

I decided on changing the parser grammar to require semicolons to terminate most declarations, then inserting them from the lexer wherever they are technically required by the grammar. I found that having the grammar be as unambiguous as possible made it much less likely I'd run into shift/reduce problems, especially when combining this with other unrelated grammar changes like permitting trailing commas, extension names with mismatched parentheses ([foo = bar, (<cursor>]) etc.

IIRC there were also a couple places where I was unable to make the grammar unambiguous without doing something like treating '\n' as a token, which gets super weird.

There are some drawbacks to this method, mostly that handling syntax errors when they aren't actually syntax errors is trickier. But you might find this strategy easier overall.

Technically the "best" solution is rolling your own parser, but that's obviously a lot of work.

What are your thoughts?

Answer 3 · 2023-12-08T02:13:15.000Z

I considered several techniques like this (for example injecting a special cursor token), though anything that modifies the input will also make source positions inaccurate.

Answer 4 · 2023-12-08T17:59:41.000Z

Only remaining issue from the original list is making a field with only a type work:

message Foo {
  my.type.<cursor>
}

Answer 5 · 2023-12-08T20:28:40.000Z

Here is how I implemented that one:

messageFieldDecl : fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, nil, $6)
	}
	| fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, $6, $7)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, nil, $5)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, $5, $6)
	}
// new code below
	| fieldCardinality notGroupElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, $4, nil, nil, $5)
	}
	| fieldCardinality notGroupElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, nil, nil, nil, $4)
	}
	| fieldCardinality notGroupElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, nil, nil, nil, nil, $3)
	}
	| msgElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, $3, nil, nil, $4)
	}
	| msgElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, nil, nil, nil, $3)
	}
	| msgElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, nil, nil, nil, nil, $2)
	}

(NewIncompleteFieldNode here returns the same *ast.FieldNode, but handles missing nodes differently)

Deciding what to do with invalid fields is tricky, I decided to skip them in the parser so they don't end up in descriptors. This means I can't ctrl-click or hover on them etc, but I can still handle them in the formatter, generate semantic tokens for them, use them in completion logic, etc.

Regarding source positions, I was initially concerned about that too, but so far I have not encountered any issues.

Answer 6 · 2024-02-29T15:57:16.000Z

New use case:

Missing field id: int32 foo;