Wrong chevrons when using X Keyboard Extension format.
kindaro opened this issue · 2 comments
There are 2 pairs of chevrons in Unicode: U+27E8 U+27E9 and U+2329 U+232A. The latter are deprecated and have wrong width. (Scroll down to the end of the section.)
When I put the good chevrons in my configuration, wrong chevrons are actually bound by the generated X Keyboard Extension files.
My source looks like this:
...
{ "pos": "8", "letters": [ "-", "8", "−", "\u27E8" ] },
{ "pos": "9", "letters": [ "/", "9", "÷", "⟩" ] },
...
The generated symbols file looks like this:
...
key <AE08> { [ minus, 8, U2212, leftanglebracket ] };
key <AE09> { [ slash, 9, division, rightanglebracket ] };
...
Why leftanglebracket and rightanglebracket denotations are expanded to the obsolete pair of chevrons is a question in itself, but I have no idea where to post that issue. A solution that works for now is to denote the desired symbols by their Unicode numbers, like this:
...
key <AE08> { [ minus, 8, U2212, U27E8 ] };
key <AE09> { [ slash, 9, division, U27E9 ] };
...
Every time I regenerate the files, I must patch them again, so this is not a long-term solution.
How should we approach this problem?
- I propose that we find the upstream of the symbolic ...anglebracket denotations and ask them to put forward an update. Do you by chance have a suggestion who that might be? The xkbcommon people?
- In the meantime, a temporary fix could be put in place: we may emit the Unicode numbers for the chevrons instead of the symbolic denotations.
- Possibly we could give the user the power to decide whether to prefer symbolic or numeric denotations? I am not sure what that would ideally look like, but as a first approximation, a switch to emit numeric denotations exclusively may be good. Actually, is there any reason to emit symbolic denotations, besides human readability?
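The temporary fix and the switch proposed above could look roughly like the sketch below. It is only an illustration: `SYMBOLIC_NAMES`, `to_keysym`, `symbols_line` and the `numeric_only` flag are hypothetical names, not part of the generator, and the table holds only names that appear in the generated file quoted earlier.

```python
# Hypothetical sketch; none of these names come from the actual generator.

# Symbolic keysym names, limited to ones seen in the generated symbols file.
SYMBOLIC_NAMES = {
    "-": "minus",
    "/": "slash",
    "8": "8",
    "9": "9",
    "\u00F7": "division",
}


def to_keysym(char, numeric_only=False):
    """Return an XKB keysym for a single character.

    Falls back to the numeric U#### form when no symbolic name is known,
    or always uses it when numeric_only is set.
    """
    if not numeric_only and char in SYMBOLIC_NAMES:
        return SYMBOLIC_NAMES[char]
    return f"U{ord(char):04X}"


def symbols_line(key, letters, numeric_only=False):
    """Format one key definition in the style of the symbols file."""
    syms = ", ".join(to_keysym(c, numeric_only) for c in letters)
    return f"key <{key}> {{ [ {syms} ] }};"


print(symbols_line("AE08", ["-", "8", "\u2212", "\u27E8"]))
# key <AE08> { [ minus, 8, U2212, U27E8 ] };
```

Because the chevrons are simply absent from the symbolic table, they fall through to the numeric form, which is what the manual patch does by hand.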
I feel that when \u#### is used in the source, the XKB files should always use U#### notation. That way, the user can preserve a code point exactly as it was intended.
The issue that 〈 〉 are expanded to wrong/outdated(?) brackets may not be so simple but it'd indeed be a matter for the XKB people. I had to read up on the matter:
• Different fields use different brackets.
• The U+2329/232A code points actually decompose to U+3008/3009!
• These are the Asian CJK punctuation brackets, so they're commonly used.
• Physics happily uses U+3008/3009 in bra-ket notation it seems?
• Thus I've used U+2329/232A in my math dead key table. I'll change that.
• Maths and physics should rightly use U+27E8/27E9 as you said?!
• On the other hand, there are other mathematical signs that are often substituted.
• In sum, I'm unsure as to whether there is a simple right answer here.
But by preserving Unicode values, users can get the one they're after.
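The decomposition mentioned in the list can be checked with Python's standard `unicodedata` module. This is a small sketch for inspection only; `nfc_of` is a helper name made up for this example.

```python
import unicodedata


def nfc_of(char):
    """Return the NFC-normalized form of a single character."""
    return unicodedata.normalize("NFC", char)


for ch in ("\u2329", "\u232A", "\u27E8", "\u27E9", "\u3008", "\u3009"):
    target = nfc_of(ch)
    status = "unchanged" if target == ch else f"becomes U+{ord(target):04X}"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}")
```

U+2329/232A are rewritten to U+3008/3009 by normalization, while U+27E8/27E9 pass through untouched, so any tool that normalizes text will silently replace the older pair.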
As far as I can see, there are far more than two sets of brackets in Unicode, all in all:
https://en.wiktionary.org/wiki/%E2%9F%A8_%E2%9F%A9
https://en.wiktionary.org/wiki/%E3%80%88_%E3%80%89
It turns out that I was wrong in outputting leftanglebracket. In the documentation of keysymdef.h it says that for some characters (like this one) the corresponding Unicode point is vague and shouldn't be used, but I did. I updated the parsing, so it should be fixed in the next version.
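For illustration, here is a rough sketch of how such a parser might skip the vague entries. It assumes the convention, as I understand it from the header's own comments, that an unclear correspondence is marked by parentheses around the Unicode position and name. The two sample lines are modelled on the header's format and should be checked against a real copy of keysymdef.h; `LINE`, `parse_line` and `reliable_names` are made-up names, not the actual parsing code.

```python
import re

# Matches "#define XK_name 0xHEX /* U+XXXX NAME */" and the parenthesized
# variant "/*(U+XXXX NAME)*/" that is assumed to flag a vague mapping.
LINE = re.compile(
    r"^#define XK_(\w+)\s+0x([0-9a-fA-F]+)\s*"
    r"/\*(\(?)\s*U\+([0-9A-F]{4,6}) (.*?)\)?\s*\*/\s*$"
)

# Illustrative sample lines, not copied verbatim from the header.
SAMPLE = [
    "#define XK_division 0x00f7  /* U+00F7 DIVISION SIGN */",
    "#define XK_leftanglebracket 0x0abc  /*(U+27E8 MATHEMATICAL LEFT ANGLE BRACKET)*/",
]


def parse_line(line):
    """Return (keysym name, character, is_vague) or None if no match."""
    m = LINE.match(line)
    if m is None:
        return None
    name, _code, paren, codepoint, _uname = m.groups()
    return name, chr(int(codepoint, 16)), paren == "("


def reliable_names(lines):
    """Map characters to symbolic names, skipping vague entries."""
    table = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed and not parsed[2]:
            table[parsed[1]] = parsed[0]
    return table


print(reliable_names(SAMPLE))
```

With the vague entries left out of the table, a character such as U+27E8 has no symbolic name and would be emitted in the numeric form instead.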
I do not think it is needed to always explicitly output U+XXXX, as it is harder to read. In this case it was also not needed to differentiate, as the output was wrong.