
Heuristics for fixing up typical OCR errors in text

Primary language: Python

Text Cleanup

Attempts to fix common errors in OCR-scanned text.
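A common OCR error is a word split across a line break with a trailing hyphen. The following is a minimal, purely illustrative Python sketch of that kind of heuristic. It is not taken from this project's code, and the function name `join_hyphenated_words` is hypothetical.

```python
import re

# Hypothetical illustration: a word split at a line break with a trailing
# hyphen, e.g. "recog-" followed by "nition" on the next line.
_HYPHEN_BREAK = re.compile(r"(\w+)-\n(\w+)")


def join_hyphenated_words(text: str) -> str:
    """Rejoin words that OCR split across a line break with a hyphen."""
    # Drop the hyphen and the newline, keeping both halves of the word.
    return _HYPHEN_BREAK.sub(r"\1\2", text)


print(join_hyphenated_words("The recog-\nnition of text"))
```

A rule this simple also merges a genuine compound such as "well-known" when it happens to fall at a line break, so a practical cleanup pass would need something extra, such as a dictionary lookup, before joining.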

Known Issues:

  • Assumes UTF-8 input, even for XML files whose declaration specifies a different encoding.
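The following sketch shows how input could be decoded by honoring the encoding declared in an XML prolog rather than always assuming UTF-8. It is a minimal, hypothetical example: `decode_input` is not part of this project. It also assumes the raw bytes are available, and it only detects ASCII-compatible encodings. UTF-16 and UTF-32 input is not handled.

```python
import codecs
import re

# Hypothetical helper, not part of this project: matches the encoding
# pseudo-attribute of an XML declaration at the start of the input, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def decode_input(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw bytes, honoring an XML encoding declaration if present."""
    # A UTF-8 byte-order mark takes precedence over any declaration.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")

    encoding = default
    match = _XML_DECL_ENCODING.match(raw)  # the declaration must come first
    if match:
        declared = match.group(1).decode("ascii")
        try:
            codecs.lookup(declared)  # ignore names Python does not recognize
            encoding = declared
        except LookupError:
            pass
    return raw.decode(encoding)


latin1_xml = '<?xml version="1.0" encoding="ISO-8859-1"?><p>café</p>'.encode("latin-1")
print(decode_input(latin1_xml))
```

Decoding `latin1_xml` as UTF-8 would raise `UnicodeDecodeError` on the `é` byte. Reading the declaration first avoids that error.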