hashicorp/hcl

hclsyntax: Recovery of `FunctionCallExpr` with trailing open `[`

radeksimko opened this issue · 1 comments

Context

Two relatively common cases which come up when editing HCL in an editor are the following:

  • attr = func_call(var.foo.)
  • attr = func_call(var.foo[)

Where the expectation on the IDE is to provide completion of [object] attributes or [map] keys. This is currently only possible to do for the dotted attribute syntax.

Why

There is a difference in how the hclsyntax parser treats trailing [ and trailing ..

Trailing dot (recovery works as expected)

package main

import (
	"log"

	"github.com/hashicorp/hcl/v2"
	"github.com/hashicorp/hcl/v2/hclsyntax"
)

func main() {
	cfg := `attr = bar(var.foo.)`
	f, diags := hclsyntax.ParseConfig([]byte(cfg), "", hcl.InitialPos)
	if len(diags) > 0 {
		log.Printf("diagnostics: %s", diags)
	}
	attrs, _ := f.Body.JustAttributes()
	log.Printf("expression range: %#v", attrs["attr"].Expr.Range())
}
2009/11/10 23:00:00 diagnostics: :1,20-21: Invalid attribute name; An attribute name is required after a dot.
2009/11/10 23:00:00 expression range: hcl.Range{Filename:"", Start:hcl.Pos{Line:1, Column:8, Byte:7}, End:hcl.Pos{Line:1, Column:21, Byte:20}}

The parser produces relatively helpful diagnostic, recognising the assumption that there's an incomplete traversal.

Trailing bracket

// ...
	cfg := `attr = bar(var.foo[)`
// ...
2009/11/10 23:00:00 diagnostics: :1,20-21: Invalid expression; Expected the start of an expression, but found an invalid expression token.
2009/11/10 23:00:00 expression range: hcl.Range{Filename:"", Start:hcl.Pos{Line:1, Column:8, Byte:7}, End:hcl.Pos{Line:0, Column:0, Byte:0}}

Here the parser produces less helpful diagnostic, referring to the whole expression being wrong. Relatedly it also leaves us with an empty FunctionCallExpr CloseParenRange.End, making it impossible or very hard to do further recovery downstream.


I also noticed the ScopeTraversalExpr in that snippet ends up eating the closing paranthesis, which may be the root cause of this:

// ...
	cfg := `attr = bar(var.foo[)`
// ...
	attrs, _ := f.Body.JustAttributes()
	funcExpr := attrs["attr"].Expr.(*hclsyntax.FunctionCallExpr)
	log.Printf("expression range: %#v", funcExpr.Args[0].Range())
2009/11/10 23:00:00 expression range: hcl.Range{Filename:"", Start:hcl.Pos{Line:1, Column:12, Byte:11}, End:hcl.Pos{Line:1, Column:21, Byte:20}}

And similar problem exists for other variations with [, e.g.

output "test" {
  value = var.foo[
}

where parser ends up eating the closing } and produces equally unhelpful diagnostic:

Invalid expression: Expected the start of an expression, but found an invalid expression token.

Thanks for digging into this, @radeksimko.

The attribute case is relatively easy to handle because an identifier is the only possible valid token to appear after an attribute access dot (aside from some special oddities like the legacy .0 index syntax and legacy splat operator), so the implementation of that operator can just handle the situation entirely itself.

The index case is trickier because any kind of expression is valid to appear in that position, and so the code handling the index operator must just delegate to the full expression parser to interpret what appears after the open bracket.

Therefore I think we would probably need a more "surgical" heuristic here to avoid the index expression parser accidentally constraining which expressions can be accepted in that position.

One possible heuristic that would deal with the two cases you identified in your write-up would be for the parser to peek ahead to see if the next token is some kind of closing bracket -- closing paren, closing brace, etc. It seems unlikely to me that a closing bracket symbol will ever be valid as the start of an expression and so that should hopefully be narrow enough to not constrain future additions to the language.

That heuristic would not handle more complicated situations where the detected error is not immediately after the opening bracket, since anything after that point must be caught by the general expression parser rather than the index expression parser in particular. But hopefully that's still good enough to handle the common cases you've identified.

That does still leave a question about how exactly the parser ought to try to recover when it encounters that situation. I assume that we do still need to see an index expression in the resulting parse tree, which means that the index expression parser needs to return to its caller in a way that causes the partial result to get recorded but also leaves the underlying token stream in a reasonable state for ongoing parsing. In the ideal case it would behave as if the open bracket had been followed by a valid expression of some kind and then a closing ], so that parsing can continue as if the expression had been valid, but we'll have to see if that's actually feasible or if it causes confusion downstream that might raise additional confusing errors that are not really errors.

I think this will take some experimentation to find a suitable compromise.