39aldo39/klfc

Support U+1234 notation for Unicode symbols?

kindaro opened this issue · 5 comments

It is customary to denote Unicode characters by U+[character code], but it seems klfc does not support this notation:

% klfc --from-json x --xkb y
klfc: parse fail in x: Error in $.keys[0].letters[0]: ‘U+2002’ is not a valid letter

Note that the character in this example is a kind of space, so I would rather not insert it verbatim: it would hardly be clear to the reader which kind of space it is.

I also propose adding a flag that writes JSON files in the U+... format. Most fonts support only a narrow range of characters, so in many cases the more unusual characters would not be displayed in any meaningful way.
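To make the request concrete, here is a minimal sketch of both directions: reading U+XXXX notation and writing a character back out in that notation. The names parse_letter and format_letter are hypothetical and only illustrate the idea; they are not part of klfc.

```python
import re

# Hypothetical helpers for illustration; klfc's own parser is not shown in this thread.
_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_letter(text: str) -> str:
    """Turn 'U+2002' into the character it names; return any other input unchanged."""
    match = _CODE_POINT.fullmatch(text)
    if match is None:
        return text
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"{text} is outside the Unicode range")
    return chr(value)


def format_letter(char: str) -> str:
    """Write a single character in U+XXXX notation, using at least four hex digits."""
    return f"U+{ord(char):04X}"
```

With these, parse_letter("U+2002") gives the EN SPACE character and format_letter applied to that character gives back "U+2002".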

KLFC uses a normal JSON file, so you can use the syntax "\u2002" already.
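As a quick check of that claim, any standard JSON parser decodes the escape before the application sees it. The fragment below is a reduced example with only a "letters" array, not a complete klfc layout.

```python
import json

# The raw string keeps the backslash, so the JSON parser sees the escape \u2002.
layout = json.loads(r'{"letters": ["\u2002", "a"]}')

first = layout["letters"][0]
code_point = f"U+{ord(first):04X}"
print(code_point)  # prints U+2002
```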

I find it to be somewhat «wrong» to let the format of serialization define the ways in which I may or may not define a symbol, for the following reasons:

  • Formats come and go. Today it is JSON, tomorrow YAML, then Dhall.
  • Escape sequences are transient. For instance, if I convert a JSON file with both "—" and "\u2014" to YAML and back, I will either get two dashes or two escape sequences. The distinction will be lost.
  • When I wish to write a Unicode number of a symbol instead of the symbol itself, it is because I have an intention. A program should recognize and intrinsically acknowledge that intention, rather than making it «accidentally supported».
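The second point can be seen without leaving JSON. The sketch below uses Python's json module for a plain JSON round trip rather than a conversion through YAML, which is an assumption made to keep the example dependency-free; the effect is the same.

```python
import json

# One em dash written verbatim, one written as a JSON escape sequence.
source = '["' + chr(0x2014) + '", "\\u2014"]'
decoded = json.loads(source)

# After parsing, both entries are the same one-character string.
same_after_parsing = decoded[0] == decoded[1]

# Serializing again yields either two characters or two escapes, never one of each.
as_characters = json.dumps(decoded, ensure_ascii=False)
as_escapes = json.dumps(decoded, ensure_ascii=True)
```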

I am sure this change is technically feasible. If someone were to make it, would you merge?

I don't think it is necessarily wrong to let the format decide it, but I understand that the intention may be lost, as most parsers throw away that information. However, it is also not very elegant to basically make your own escape sequences. For example, ligatures can currently be written as lig:U+2002, which outputs the literal string "U+2002". If other notations were also allowed, this would become ambiguous. I don't know a nice solution for that.

I have now added explicit support for Unicode characters!

I do observe that this feature works.

If in the future we want to make sure ligatures can include strings that resemble the notation for Unicode code points, we can allow the specification of a key to include not only strings, but also objects like {"type": "ligature", "contents": "U+2002"} which would output the literal string "U+2002", and even {"type": "ligature", "contents": ["U+2245", " is for isomorphose"]} which would output the literal string "≅ is for isomorphose".
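A rough sketch of how such tagged objects might be resolved follows. It assumes, going only by the two examples above, that a bare string in "contents" is kept literal while elements of a list are decoded when they match U+XXXX. The function names are hypothetical and this is not how klfc works today.

```python
import re

_CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def decode_if_code_point(text):
    """Decode 'U+2245' to its character; leave any other text as it is."""
    match = _CODE_POINT.fullmatch(text)
    return chr(int(match.group(1), 16)) if match else text


def resolve(spec):
    """Resolve a key specification (hypothetical format) to its output string."""
    if isinstance(spec, str):
        # A plain string is an ordinary letter, so U+XXXX is decoded.
        return decode_if_code_point(spec)
    if isinstance(spec, dict) and spec.get("type") == "ligature":
        contents = spec["contents"]
        if isinstance(contents, str):
            # Assumption: a bare string is always literal, even if it looks like U+XXXX.
            return contents
        # Assumption: in a list, each part is decoded if it matches U+XXXX.
        return "".join(decode_if_code_point(part) for part in contents)
    raise ValueError(f"unsupported key specification: {spec!r}")
```

Under these assumptions, the first object from the comment resolves to the six-character string "U+2002", and the second resolves to "≅ is for isomorphose".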