Use character classes as word boundaries

In current nvi, ‘w’ is not very useful in Japanese text, because it treats a long sentence written without spaces as a single word.

In the old nvi-m17n, ‘w’ moved along katakana/hiragana/kanji boundaries (so, for instance, “本日は晴天なり” would be treated as the words “本日”, “は”, “晴天”, “なり”). This is also how Xterm handles word selection; see Xterm’s charclass.c. I’ve been told by a Japanese nvi user that this is a behavior he misses from nvi-m17n.

It would be useful to break along character class boundaries like this.
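To make the proposal concrete, here is a minimal sketch of a ‘w’-style motion that stops at character class boundaries. It is not code from nvi2, nvi-m17n, or Xterm: the names `cclass`, `cclass_of`, and `next_word_start` are hypothetical, the text is assumed to be already decoded into an array of Unicode codepoints, and the classification uses only the basic Hiragana, Katakana, and CJK Unified Ideographs blocks, which is a rough approximation of what a real table such as Xterm's would cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character classes; illustrative only, not from nvi2. */
enum cclass { CC_SPACE, CC_HIRAGANA, CC_KATAKANA, CC_KANJI, CC_OTHER };

/*
 * Classify a codepoint by Unicode block. Only the basic blocks are
 * covered here (assumption); extensions, half-width forms, and
 * punctuation all fall into CC_OTHER.
 */
static enum cclass
cclass_of(uint32_t cp)
{
	if (cp == ' ' || cp == '\t' || cp == '\n' || cp == 0x3000)
		return CC_SPACE;	/* includes U+3000 IDEOGRAPHIC SPACE */
	if (cp >= 0x3041 && cp <= 0x309F)
		return CC_HIRAGANA;
	if (cp >= 0x30A0 && cp <= 0x30FF)
		return CC_KATAKANA;
	if (cp >= 0x4E00 && cp <= 0x9FFF)
		return CC_KANJI;
	return CC_OTHER;
}

/*
 * Return the index of the next word start after pos: skip the rest of
 * the current run of same-class characters, then skip any whitespace.
 * Returns len when there is no further word.
 */
static size_t
next_word_start(const uint32_t *s, size_t len, size_t pos)
{
	enum cclass start;

	if (pos >= len)
		return len;
	start = cclass_of(s[pos]);
	while (pos < len && cclass_of(s[pos]) == start)
		pos++;
	while (pos < len && cclass_of(s[pos]) == CC_SPACE)
		pos++;
	return pos;
}
```

For “本日は晴天なり” stored as the seven codepoints U+672C U+65E5 U+306F U+6674 U+5929 U+306A U+308A, calling `next_word_start` repeatedly from index 0 yields 2, 3, 5, and then 7 (end of text), which matches the “本日”, “は”, “晴天”, “なり” split described above.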

Yes. And by relying on Unicode information, we can also eliminate the differences in locale behavior between platforms. Currently we have some code to get the Unicode codepoint (nvi2/common/key.c, line 285 in 1d22313):

int uc = -1;

but it is only used to display Unicode escape sequences.