XML Unicode provides some convenience methods for inserting Unicode
characters. When it started, the focus was on characters that were
traditionally inserted with named character entities, things like é.
In practice, and in the age of UTF-8, the “insert Unicode character” function, especially the Helm-enabled version, is much more broadly useful.
You’re most likely going to want to bind some or all of these functions to keys.
Insert a Unicode character by character name. If a prefix is given,
the character will be inserted regardless of whether or not it has a
displayable glyph; otherwise, a numeric character reference is
inserted if the codepoint is not displayable according to
xmlunicode-character-displayable.
This function is somewhat obsoleted by new methods in Emacs 24 and beyond for inserting Unicode. But the numeric character reference support still adds some value, IMHO.
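The displayable-or-reference behavior described above can be sketched roughly as follows. This is not the package's implementation: `my-insert-char-or-reference` is a hypothetical name, and Emacs's built-in `char-displayable-p` stands in for xmlunicode-character-displayable, whose actual test may differ.

```elisp
(defun my-insert-char-or-reference (codepoint &optional force)
  "Insert CODEPOINT literally, or as a numeric character reference.
With non-nil FORCE, always insert the character itself.
Hypothetical helper; `char-displayable-p' is a stand-in (assumption)
for the package's own displayability check."
  (if (or force (char-displayable-p codepoint))
      (insert-char codepoint)
    ;; Fall back to a hexadecimal numeric character reference.
    (insert (format "&#x%X;" codepoint))))
```

Calling `(my-insert-char-or-reference #xE9)` should insert the character itself wherever the current display can show it, and a hexadecimal reference otherwise.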
Insert a Unicode character by ISO 8879 entity name. If a prefix is
given, the character will be inserted regardless of whether or not it
has a displayable glyph; otherwise, a numeric character reference is
inserted if the codepoint is not displayable according to
xmlunicode-character-displayable.
Pops up a menu of special characters. Configure
xmlunicode-character-menu-alist to change the list.
You can bind a key to this function. You can also create a menu bar pulldown menu:
(define-key nxml-mode-map [menu-bar unichar]
  (cons "UniChar" xmlunicode-character-menu-map))
This function provides access to all the ISO Latin 1 accented
characters. It reads two more keystrokes and composes the appropriate
character that way. Configure xmlunicode-character-shortcut-alist to
change the mappings.
For example, if this function is bound to C-t e, I can type C-t e e '
to insert “é”. Or C-t e $ y to insert “¥”.
This function, which I bind to the (double) quotation mark key in several modes, inserts the appropriate double quote. Called after a space, newline, or “>”, it inserts a left double quote. Called after a double quote, it cycles through the three possible quote styles: left, straight, or right. Called anywhere else, it inserts a right double quote.
In nxml-mode, inside a start tag, it always inserts just a vanilla double quote.
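The rules above can be illustrated with a minimal sketch. All names here are hypothetical, the cycling order (left, then straight, then right, then back to left) is one reading of the description, and the nxml-mode start-tag case is left out, so treat this as an approximation rather than the package's code.

```elisp
(defconst my-left-double-quote #x201C "LEFT DOUBLE QUOTATION MARK.")
(defconst my-right-double-quote #x201D "RIGHT DOUBLE QUOTATION MARK.")

(defun my-smart-double-quote ()
  "Insert a context-appropriate double quote (hypothetical sketch)."
  (interactive)
  (let ((prev (char-before)))
    (cond
     ;; After a space, newline, or ">", insert a left quote.
     ;; Treating the start of the buffer the same way is an assumption.
     ((or (null prev) (memq prev '(?\s ?\n ?>)))
      (insert-char my-left-double-quote))
     ;; After a double quote, cycle: left -> straight -> right -> left.
     ((eq prev my-left-double-quote)
      (delete-char -1)
      (insert-char ?\"))
     ((eq prev ?\")
      (delete-char -1)
      (insert-char my-right-double-quote))
     ((eq prev my-right-double-quote)
      (delete-char -1)
      (insert-char my-left-double-quote))
     ;; Anywhere else, insert a right quote.
     (t (insert-char my-right-double-quote)))))
```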
I bind this to the (single) quotation mark key in several modes. It does just what you think it does.
I bind this to - in several modes. It cycles through dash, mdash, and
ndash characters. If there are already two consecutive - preceding point,
it just inserts another -.
I bind this to . in several modes. It replaces three consecutive
periods with an ellipsis, ….
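As a rough illustration of that replacement (not the package's own code; `my-smart-period` is a hypothetical name), the command can check for two periods before point when a third is typed:

```elisp
(defun my-smart-period ()
  "Insert a period, or turn a third consecutive period into an ellipsis.
Hypothetical sketch of the behavior described above."
  (interactive)
  (if (looking-back "\\.\\." (max (point-min) (- (point) 2)))
      (progn
        ;; Two periods already precede point: replace them with U+2026.
        (delete-char -2)
        (insert-char #x2026))
    (insert ".")))
```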
I bind this to ; in nxml-mode. It has the following effect: if the
characters that precede the semicolon are an ampersand followed by an ISO
8879 entity name, the corresponding character is inserted.
For example, if I type &ntilde and then ;, a Unicode ñ is inserted.
It happens that I still remember a lot of the ISO entity names.
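The entity expansion can be sketched along these lines. The two-entry table and both names are hypothetical stand-ins for the package's full ISO 8879 list, so this only shows the shape of the lookup.

```elisp
(defvar my-entity-alist
  '(("ntilde" . #xF1)
    ("eacute" . #xE9))
  "Hypothetical, tiny stand-in for the ISO 8879 entity table.")

(defun my-smart-semicolon ()
  "Expand a preceding \"&name\" into its character, else insert \";\".
Hypothetical sketch of the behavior described above."
  (interactive)
  (let ((entity (and (looking-back "&\\([A-Za-z0-9]+\\)"
                                   (line-beginning-position))
                     (assoc (match-string 1) my-entity-alist))))
    (if entity
        (progn
          ;; Remove the ampersand and name, then insert the character.
          (delete-region (match-beginning 0) (point))
          (insert-char (cdr entity)))
      (insert ";"))))
```

Unknown names fall through to a plain semicolon, so ordinary entity references that are not in the table are left alone.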
You can’t bind this one to a key; just run it in your *scratch*
buffer. It inserts all the Unicode characters. This allows you to see
which ones will actually display correctly.
By default, it prints all the characters in the BMP.
Y #x000000 ^@ NULL
Y #x000001 ^A START OF HEADING
Y #x000002 ^B START OF TEXT
…
The leading “Y” indicates that the character is believed to be displayable. There are optional arguments to change the range and suppress non-displayable characters.
Note: this function takes a while to run.
There is a Helm-integrated version of xmlunicode-character-insert called
xmlunicode-character-insert-helm. To use this version, you must load the
xmlunicode-helm.el library.
A Helm version of xmlunicode-character-insert. It supports searching
for the characters by Unicode name or ISO entity name as well as by
code point.
I bind this to “C-t u”.
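One possible way to set up that binding in an init file is sketched below. It assumes xmlunicode-helm.el is on your `load-path` and that a global binding is wanted; `my-unicode-prefix-map` is a hypothetical name.

```elisp
;; Assumption: xmlunicode-helm.el is somewhere on `load-path'.
(load "xmlunicode-helm")

;; C-t runs `transpose-chars' by default, so it has to become a prefix
;; key before "C-t u" can be bound.  The map name is hypothetical.
(define-prefix-command 'my-unicode-prefix-map)
(global-set-key (kbd "C-t") 'my-unicode-prefix-map)
(define-key my-unicode-prefix-map (kbd "u")
  #'xmlunicode-character-insert-helm)
```

Note that this gives up the default C-t binding; pick a different prefix if you rely on `transpose-chars`.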
The unicode-to-el.py script can be used to combine your own version
of “UnicodeData-X.Y.ZdR.txt” and ISONameList.txt into
xmlunicode-character-list.el.
You probably want to start with the most recent version of the Unicode character database.