alda-lang/alda

client/parser/scanner.go: invalid rune quoting

3052 opened this issue ยท 9 comments

3052 commented

runes should be quoted with %q, not '%c'

func (s *scanner) unexpectedCharError(
c rune, context string, line int, column int) *model.AldaSourceError {
if context != "" {
context = " " + context
}
var charStr string
switch {
case c == 9:
charStr = "tab"
case c == 10:
charStr = "newline"
case c == 13:
charStr = "carriage return"
case unicode.IsControl(c):
charStr = fmt.Sprintf("control character (%d)", c)
default:
charStr = fmt.Sprintf("'%c'", c)

%q	a single-quoted character literal safely escaped with Go syntax.

https://godocs.io/fmt#hdr-Printing

Hi, thanks for the discussion!

I don't fully understand the issue. I played around with this a bit in Go Playground, and as far as I can tell, the code we have now where we're printing '%c' (with added single quotes) produces identical results to what we would get if we switched to %q (without quotes). For example:

package main

import "fmt"

func main() {
	fmt.Printf("%%c is %c, %%q is %q\n", '๐Ÿง', '๐Ÿง')
}

Produces the following output:

%c is ๐Ÿง, %q is '๐Ÿง'

Is there an edge case that I'm missing?

3052 commented

yeah the single quote itself, '

I'm inclined to leave this the way it is. The explicit single quotes in the Alda error message are unrelated to Go's choice to put single quotes around a rune when you print it via %q. Hypothetically, if we wanted to change the single quotes in the error message in the future, the way we have it now would make it easier to understand how to do that.

3052 commented

suit yourself, but in some cases you are going to end up with ''', which is obviously invalid

That's a good example to consider!

I tried that in Go Playground just now, and I got this output:

%c is ', %q is '\''

Given the choice between presenting end users with ''' or '\'', I think I would actually prefer '''. Yes, it's weird-looking, but I think the backslash could be potentially more confusing.

Another option worth considering here is to remove the single-quotes entirely and just display %c. For example, in the case of a single quote, an error message might look like:

Unexpected ' at the top level

whereas currently, it would be:

Unexpected ''' at the top level
3052 commented

%c is bad, because any control, white space, or otherwise non printing characters are not going to produce helpful output.

Agreed, although we do have explicit checks for those kinds of characters before we fall back to the %c representation.

I appreciate the suggestion to use %q instead. I don't see an immediate need to switch, but I will keep it in mind! Thanks again for the discussion.

3052 commented

as mentioned, the current checks are not robust, so other characters are going to fall through:

package main

import "fmt"

func main() {
   fmt.Printf("'%c'\n", 0x200E) 
   fmt.Printf("%q\n", 0x200E) 
}

result:

'โ€Ž'
'\u200e'

I would again argue that %q is more helpful because of this reason, and because as mentioned, '%c' produces invalid Go code if the input is '.