hashicorp/hcl

json: Reflect escaping characters in `Traversal`.`SourceRange()`

radeksimko opened this issue · 2 comments

Context

As indicated by the inline comment and discovered as part of adding support for reference related code intelligence via language server in hashicorp/hcl-lang#185 a traversal inside a JSON config which includes string key, e.g.

{"attr": "${test[\"key\"]}"}

has a reported range which is off by 2 for each/any string key. i.e. with every map/object key this becomes more inaccurate.

hcl.Range{
	Filename: "test.hcl.json",
	Start:    hcl.Pos{Line:1, Column:13, Byte:12},
	End:      hcl.Pos{Line:1, Column:24, Byte:23},
}

The expected range would be

hcl.Range{
	Filename: "test.hcl.json",
	Start:    hcl.Pos{Line:1, Column:13, Byte:12},
	End:      hcl.Pos{Line:1, Column:26, Byte:25},
}

See https://go.dev/play/p/pnDgnpEeLXA

Proposal

Report accurate range for traversals with string keys inside JSON.

Hi @radeksimko,

This is the way it is right now honestly because I didn't know how to make it behave better: the source ranges inside the JSON string are being generated by hclsyntax while parsing the JSON string's already unescaped content as a template, so it can't see the escape characters in order to take them into account.

I think the trickiest part here is that the source location information originates in the tokenisation step in hclsyntax, so the bytes that are being consumed by that are just raw bytes with no source information attached. Seems like solving this would require even the original source characters to have source location information, so that \" could be counted as a single character " by the scanner while remembering that it actually occupies two characters in the source code.

I expected something like that would add significant overhead to all parsing tasks, and so elected to just let this edge case be a little wrong so that we would not hurt the other more common situations (like not using JSON at all) where tokenizing directly to slices of a byte array is sufficient and fast.

If you have ideas on how we could solve this without either lots of logic duplication or adding overhead to the non-JSON tokenizing process then I'd love to discuss them! But I personally exhausted all of my ideas and so just accepted this compromise out of pragmatism.

If you have ideas on how we could solve this without either lots of logic duplication or adding overhead to the non-JSON tokenizing process

I am guessing that looping over the parsed traversal after the linked LOC (inside *json.expression.Variables()) and mutating the range of any TraverseIndex before appending the traversal would be considered a significant overhead?

My working theory is that we can safely assume that any successfully parsed hcl.TraverseIndex (in JSON) with Key of type cty.String (i.e. no interpolation) implies that the string key was escaped, and that there's only one way to escape " in JSON (\) and therefore we could always add 2 characters to the End.Column and End.Byte (since multi-line strings in JSON aren't a thing AFAIK). This makes me believe that we don't need to care too much about whether we're dealing with escaped or unescaped content, as long as we know we're dealing with JSON.

It still may not make the implementation easy, but I thought it's worth making that distinction.

I suppose a more elegant solution would involve some decoupling of hclsyntax.ParseTemplate() into an internal method which behaves differently depending on whether it's dealing with JSON or HCL, and json.parseTemplate() would pass an extra argument or struct field or something?