lichray/nvi2

Truncation

bentley opened this issue · 8 comments

The truncation behavior for characters invalid in the current encoding is... unfriendly. This is a regression from 1.79, where non-ASCII bytes were visually escaped and didn't cause weird behavior.

  • Files with a bad character should not be truncated. Currently on write, the file is truncated at the first line with a bad character. This is really unpleasant, especially since there's no way to undo it!
  • Lines with a bad character should not interfere with searching, scrolling, etc. They should be displayed (with proper escapes) and not shown blank on the screen.
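To make the "displayed with proper escapes" request concrete, here is a minimal sketch of byte-level escaping. The `\xNN` format and the name `escape_line` are assumptions for illustration, not code taken from nvi 1.79 or nvi2.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render n bytes of `line` into `buf`.
 * Printable ASCII is copied as-is; every other byte becomes \xNN.
 * `buf` must hold at least 4 * n + 1 bytes.
 * Returns the number of characters written, excluding the final NUL.
 */
static size_t escape_line(const unsigned char *line, size_t n, char *buf)
{
    size_t w = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char b = line[i];

        if (b >= 0x20 && b < 0x7F)
            buf[w++] = (char)b;
        else
            w += (size_t)sprintf(buf + w, "\\x%02x", (unsigned)b);
    }
    buf[w] = '\0';
    return w;
}
```

This sketch escapes every byte outside printable ASCII, including tabs and valid multibyte characters; a real editor would escape only the bytes that fail to decode in the current encoding.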

I know.

This definitely should be improved, but displaying escapes is not possible. It's highly problematic to preserve a malformed wide char.

Vim uses some replacement mechanism; it might be worth looking at.

Another thing that might help is an option to switch back to byte-oriented mode when editing. I tried to implement one, but I hit a bug that neither I nor the 1.8x author had any idea how to debug. I should try it again.

I tried Traditional Vi today and was surprised to see it has the exact behavior I want. Maybe worth looking at in comparison to Vim... http://ex-vi.sourceforge.net/

I guess it's because Traditional Vi is UTF-8 only (it does not use iconv(3), right?). When the encoding is known, you can embed malformed bytes in the decoded character, since Unicode code points use far fewer values than an int32 can represent. But nvi2 uses the locale to decode to wchar_t, so the actual representation varies. I'll take a look at its code when I get time anyway.
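The embedding idea can be sketched as follows, assuming a UTF-8-only decoder that stores characters in int32_t. The names `RAW_BYTE_BASE`, `decode_one`, `encode_one`, and `roundtrip` are hypothetical, and this is not the code of Traditional Vi or nvi2. Bytes that do not form a valid sequence are mapped to values just above the last Unicode code point (0x10FFFF), so writing the buffer back reproduces them exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range for undecodable bytes, just above U+10FFFF. */
#define RAW_BYTE_BASE 0x110000

/* Decode one unit from s[0..n); returns the number of bytes consumed. */
static size_t decode_one(const unsigned char *s, size_t n, int32_t *out)
{
    size_t len = 0;
    int32_t cp = 0, min = 0;

    assert(n > 0);
    if (s[0] < 0x80) { *out = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }

    int ok = (len != 0 && len <= n);
    for (size_t i = 1; ok && i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            ok = 0;                       /* not a continuation byte */
        else
            cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
        ok = 0;
    if (!ok) { *out = RAW_BYTE_BASE + s[0]; return 1; }  /* keep raw byte */
    *out = cp;
    return len;
}

/* Encode one unit; returns the number of bytes written (1 to 4). */
static size_t encode_one(int32_t v, unsigned char *out)
{
    if (v >= RAW_BYTE_BASE) { out[0] = (unsigned char)(v - RAW_BYTE_BASE); return 1; }
    if (v < 0x80) { out[0] = (unsigned char)v; return 1; }
    if (v < 0x800) {
        out[0] = (unsigned char)(0xC0 | (v >> 6));
        out[1] = (unsigned char)(0x80 | (v & 0x3F));
        return 2;
    }
    if (v < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (v >> 12));
        out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (v & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (v >> 18));
    out[1] = (unsigned char)(0x80 | ((v >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (v & 0x3F));
    return 4;
}

/* Decode then re-encode n bytes; `out` needs room for n bytes. */
static size_t roundtrip(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t r = 0, w = 0;

    while (r < n) {
        int32_t v;
        r += decode_one(in + r, n - r, &v);
        w += encode_one(v, out + w);
    }
    return w;
}
```

This only works because the decoder knows the encoding is UTF-8 and the character type is wide enough to hold the extra range. With locale-dependent wchar_t, neither the spare range nor the width is guaranteed, which is the difficulty described above.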

This email is pretty interesting:

http://lists.suckless.org/dev/1312/18786.html

The first issue should have been solved by https://github.com/lichray/nvi2/tree/bypass-conv-on-write ; please double check.

Yes, this prevents the truncation. Thanks, and sorry for taking so long to check.

This approach has some issues, but I forgot what they are... I hope I'll have some time to look at it over the year-end.

Added:

Missing some db1-related error handling. Other parts may also need the change.

One thought from @delphij: ask the user to reload the file in 8-bit mode when such an error occurs.