XML Unicode

XML Unicode provides some convenience methods for inserting Unicode characters. When it started, the focus was on characters that were traditionally inserted with named character entities, things like é.

In practice, and in the age of UTF-8, the “insert unicode character” function, especially the Helm-enabled version, is much more broadly useful.

You’re most likely going to want to bind some or all of them to keys.

Functions

xmlunicode-character-insert

Insert a Unicode character by character name. If a prefix is given, the character will be inserted regardless of whether or not it has a displayable glyph; otherwise, a numeric character reference is inserted if the codepoint is not displayable according to xmlunicode-character-displayable.

This function is somewhat obsoleted by new methods in Emacs 24 and beyond for inserting Unicode. But the numeric character reference support still adds some value, IMHO.

xmlunicode-iso8879-character-insert

Insert a Unicode character by ISO 8879 entity name. If a prefix is given, the character will be inserted regardless of whether or not it has a displayable glyph; otherwise, a numeric character reference is inserted if the codepoint is not displayable according to xmlunicode-character-displayable.

xmlunicode-character-menu-insert

Pops up a menu of special characters. Configure xmlunicode-character-menu-alist to change the list.

You can bind a key to this function. You can also create a menu bar pulldown menu:

(define-key nxml-mode-map [menu-bar unichar]
  (cons "UniChar" xmlunicode-character-menu-map))

xmlunicode-character-shortcut-insert

This function provides access to all the ISO Latin 1 accented characters. It reads two more keystrokes and composes the approprate character that way. Configure xmlunicode-character-shortcut-alist to change the mappings.

for example, if this function is bound to C-t e, I can type C-t e e ’ to insert “é”. Or C-t e $ y to insert “¥”.

xmlunicode-smart-double-quote

This function, which I bind to the (double) quotation mark key in several modes, inserts the appropriate double quote. Called after a space, newline, or “>”, it inserts a left double quote. Called after a double quote, it cycles through the three possible quote styles: left, straight, or right. Called anywhere else, it inserts a right double quote.

In nxml-mode, inside a start tag, it always inserts just a vanilla double quote.

xmlunicode-smart-single-quote

I bind this to the (single) quotation mark key in several moves. It does just what you think it does.

xmlunicode-smart-hyphen

I bind this to - in several modes. It cycles through dash, mdash, and ndash characters. If there are already two consecutive - preceding point, it just inserts another -.

xmlunicode-smart-period

I bind this to . in several modes. It replaces three consecutive periods with an ellipsis, ….

xmlunicode-smart-semicolon

I bind this to ; in nxml-mode. It has the following effect: if the characters that precede the semicolon are an ampersand followed by an ISO 8879 entity name, the corresponding character is inserted.

For example, if I type &ntilde and then ;, a Unicode ñ is inserted.

It happens that I still remember a lot of the ISO entity names.

xmlunicode-show-character-list

You can’t bind this one to a key, just run it in your *scratch* buffer. It inserts all the Unicode characters. This allows you to see which ones will actually display correctly.

By default, it prints all the characters in the BMP.

Y #x000000 ^@ NULL
Y #x000001 ^A START OF HEADING
Y #x000002 ^B START OF TEXT
…

The leading “Y” indicates that the character is believed to be displayable. There are optional arguments to change the range and suppress non-displayable characters.

Note: this function takes a while to run.

Helm functions

There is a helm-integrated version of xmlunicode-character-insert, it is called xmlunicode-character-insert-helm. To use this version, you must load the xmlunicode-helm.el library.

xmlunicode-character-insert-helm

A Helm version of xmlunicode-character-insert. It supports searching for the characters by Unicode name or ISO entity name as well as by code point.

I bind this to “C-t u”

Building your own character list

The unicode-to-el.py script can be used to combine your own version of “UnicodeData-X.Y.ZdR.txt” and ISONameList.txt into xmlunicode-character-list.el.

You probably want to start with the most recent version of the Unicode character database.

ndw/xmlunicode