/cvs2unicode

An 8-bit to Unicode converter with automatic codepage detection for CVS

Primary language: Java. License: GNU General Public License v3.0 (GPL-3.0).

Please follow this link for project documentation.

Features

  • Support for the :local: CVS protocol
  • Automatic detection of text/binary contents based on CVS substitution modes. Only text files are converted.
  • Support for all standard Cyrillic charsets (KOI8-R, windows-1251, ISO-8859-5, IBM866). Files are read line-by-line, so a single versioned file can have differently encoded lines (which is usually the case if the file was re-encoded between revisions).
  • Support for double-encoded files (e.g. KOI-windows-KOI (KWK), Unicode-KOI-Unicode (UKU)). The idea is borrowed from Andrzej Novosiolov, who originally added this feature to the FAR Manager.
  • Automatic codepage detection if an external dictionary (aspell/ispell/myspell/hunspell) is available.
  • Integration with Hunspell via HunspellBridJ to spell-check different forms of the same word. Automatic codepage detection works best with Hunspell (Russian and/or Ukrainian dictionaries should be installed so that Hunspell can find them).
  • Support for interactive charset selection where the existing dictionary is not enough (e.g. for misspelled words). New words are automatically added to the user dictionary.
  • Vim integration. If you're unsure which charset to pick, you can jump to exactly the same line in Vim and examine the context.
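
The line-by-line, dictionary-driven detection described above can be illustrated with a minimal sketch. This is not the project's actual code: the class LineCharsetSketch, its methods, and the two-word in-memory dictionary are hypothetical stand-ins (the real tool consults aspell/ispell/myspell/hunspell). Each line is decoded with every candidate charset, and the decoding that yields the most dictionary words wins. The undoDoubleEncoding helper shows one plausible way to reverse a double-encoded string; the exact semantics of KWK/UKU in cvs2unicode are an assumption here.

```java
import java.nio.charset.Charset;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

public class LineCharsetSketch {
    // Candidate charsets taken from the feature list above.
    static final List<Charset> CANDIDATES = List.of(
            Charset.forName("KOI8-R"),
            Charset.forName("windows-1251"),
            Charset.forName("ISO-8859-5"),
            Charset.forName("IBM866"));

    /** Counts the words of a decoded line that appear in the dictionary. */
    static int score(String decoded, Set<String> dictionary) {
        int hits = 0;
        // Split on anything that is not a Unicode letter.
        for (String word : decoded.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (dictionary.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Returns the candidate whose decoding yields the most dictionary hits,
     * or empty if no candidate produces any known word (the case where an
     * interactive prompt would be needed).
     */
    static Optional<Charset> detect(byte[] line, Set<String> dictionary) {
        Charset best = null;
        int bestScore = 0;
        for (Charset candidate : CANDIDATES) {
            int s = score(new String(line, candidate), dictionary);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }

    /**
     * Assumed model of double encoding: bytes written in `actual` were once
     * decoded as `wronglyAssumed`. Re-encoding with the wrong charset and
     * decoding with the right one recovers the text.
     */
    static String undoDoubleEncoding(String garbled, Charset wronglyAssumed, Charset actual) {
        return new String(garbled.getBytes(wronglyAssumed), actual);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: the Russian words "privet" and "mir",
        // written as Unicode escapes to stay independent of source encoding.
        Set<String> dictionary = Set.of(
                "\u043f\u0440\u0438\u0432\u0435\u0442", "\u043c\u0438\u0440");
        String text = "\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440";
        byte[] line = text.getBytes(Charset.forName("windows-1251"));
        System.out.println(detect(line, dictionary).map(Charset::name).orElse("ambiguous"));
    }
}
```

Because detect works on a single line's bytes, it can be called once per line of a revision, which is what allows differently encoded lines within one file. An empty result corresponds to the case where the dictionary is not enough and a human has to choose.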

Running

mvn exec:java

Screenshots

  • Main window
  • Interactive disambiguation