lichray/nvi2

Use character classes as word boundaries

bentley opened this issue · 1 comment

In current nvi, ‘w’ is not that useful in Japanese text, because it considers a long sentence written without spaces as a single word.

In the old nvi-m17n, ‘w’ moved along katakana/hiragana/kanji boundaries (so, for instance, “本日は晴天なり” would be treated as words “本日” “は” “晴天” “なり”). This is also how Xterm handles word selection; see Xterm’s charclass.c. I’ve been told by a Japanese nvi user that this is behavior he misses from nvi-m17n.

It would be useful to break along character class boundaries like this.
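
To make the request concrete, here is a rough sketch of what class-based word motion could look like. It is not taken from nvi, nvi-m17n, or xterm's charclass.c: the names `cclass`, `classify`, and `next_word` are hypothetical, and the ranges are just the Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks, which is a coarse approximation of real character classes. It assumes the text is already available as an array of Unicode code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical character classes; not nvi's actual types. */
enum cclass { CC_SPACE, CC_HIRAGANA, CC_KATAKANA, CC_KANJI, CC_OTHER };

/* Classify a code point by Unicode block (coarse approximation). */
static enum cclass
classify(uint32_t cp)
{
	if (cp == ' ' || cp == '\t' || cp == '\n' || cp == 0x3000)
		return CC_SPACE;	/* 0x3000 is the ideographic space */
	if (cp >= 0x3041 && cp <= 0x309F)
		return CC_HIRAGANA;
	if (cp >= 0x30A0 && cp <= 0x30FF)
		return CC_KATAKANA;
	if (cp >= 0x4E00 && cp <= 0x9FFF)
		return CC_KANJI;
	return CC_OTHER;
}

/*
 * Return the index of the start of the next word after pos,
 * or len if there is none. A word is a run of one class.
 */
static size_t
next_word(const uint32_t *s, size_t len, size_t pos)
{
	enum cclass start;

	assert(s != NULL);
	if (pos >= len)
		return len;

	/* Skip the rest of the current run. */
	start = classify(s[pos]);
	while (pos < len && classify(s[pos]) == start)
		pos++;

	/* Skip any whitespace before the next word. */
	while (pos < len && classify(s[pos]) == CC_SPACE)
		pos++;
	return pos;
}
```

With “本日は晴天なり” as input, repeated calls starting at index 0 would stop at the starts of “は”, “晴天”, and “なり”, and then at the end of the text, which matches the splitting described above.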

Yes. And by relying on Unicode information, we can also eliminate platform differences in locale handling. Currently we have some code to get the Unicode code point (`int uc = -1;`), but it is only used to display Unicode escape sequences.