xkbcommon/libxkbcommon

Question about XkbToControl results

hyoputer opened this issue · 5 comments

Hi, I have a question about the control modifier in XkbToControl.

When xkb_state_key_get_utf32 is called and the control key is pressed, the return value would be modified by the function above. However, I couldn't find any documentation about the rules. Actually, I did come across information about the control modifiers, but I couldn't understand why the result is 0 when ctrl+2 is pressed. Could you please explain it to me?

So we are talking about these lines:

libxkbcommon/src/state.c

Lines 897 to 914 in c1b6c79

/* Verbatim from libX11:src/xkb/XKBBind.c */
static char
XkbToControl(char ch)
{
    char c = ch;
    if ((c >= '@' && c < '\177') || c == ' ')
        c &= 0x1F;
    else if (c == '2')
        c = '\000';
    else if (c >= '3' && c <= '7')
        c -= ('3' - '\033');
    else if (c == '8')
        c = '\177';
    else if (c == '/')
        c = '_' & 0x1F;
    return c;
}

This code does implement the control transformation described in the XKB document you linked, which is a standard way to input control characters. See also the caret notation.

My understanding is that this implementation also goes well beyond that:

  • The range ` … ~ (which contains the lower-case letters) produces the same control characters as the ASCII range @ … _ used by the caret notation, i.e. the caret notation is extended to lower-case letters.
  • 2, 6 and / correspond respectively to @, ^ and ? when the former keys are pressed on a US keyboard with Shift. As for lower-case letters, this is just a handy way to avoid having to press Shift.
  • 3, 4, 5, 7 and 8 look like a kind of universal way to input the non-letter characters needed in the caret notation, as non-US keyboards may lack these characters or have them in difficult positions.

However, I cannot find a source for this. The original code has no doc either.

Thank you for the explanation! This helped me a lot!
After reviewing the Wikipedia page you provided, I understand that Control + 2 is equivalent to Control + @. Does this mean the function "XkbToControl" should produce the same results in those cases?

Yes, this is my understanding. Documentation is scarce! It took me a while.

I do not know if this feature is of much use these days, apart from some subset such as ctrl+c, ctrl+d, etc.

I think it would be good to document it. @whot @bluetech explanations welcome!

Some findings that seem to validate my understanding:

  • https://www.vt100.net/shuford/terminal/dec_keyboards_news.txt

    On DEC VT-series terminals CTRL-3 gives escape. No, there is NO ANSI
    STANDARD for these key sequences! They are all proprietary. (The
    VT200-series, though, does form an informal standard of sorts.)

    CTRL-2 gives NUL, CTRL-4 gives FS, CTRL-5 gives GS, CTRL-6 gives RS,
    CTRL-7 gives RS, and CTRL-8 gives US. These are all on DEC terminals.

    The values quoted for 7 and 8 do not match XkbToControl, which maps them to US and DEL, but the rest seems to point to the origin of these mappings.

  • https://misc.openbsd.narkive.com/NvSWf6ax/which-key-shortcuts-are-safe-to-bind-and-some-q-s-about-history-and-os-diffs-re-ctrl-4-means

    In particular:

    And I'll venture a guess why DEC added those combinations: In order
    to type ^[ ^\ ^] to produce the ESC, FS, GS characters, you need
    keys for [ \ ]. If you look at non-English keyboard layouts, you'll
    see that the corresponding keys have been re-purposed for other
    characters. In the old days of national ASCII variants, even the
    characters [ \ ] didn't exist in many national encodings. Later,
    when extended 8-bit character sets were introduced, [ \ ] were only
    made available in a secondary mapping reachable with an extra
    modifier key (AltGr or such). And that's the situation right into
    the present.

    By contrast, combinations like ^3, ^4, ^5 were readily available on keyboards.

  • VT220: Table 3-5 Keys Used to Generate 7-Bit Control Characters

I consider this question answered. Let's continue the discussion in the PR for the details.