The objective of this research is to fine-tune the .traineddata
model of Google's open-source OCR engine Tesseract so that
out-of-print and archived Bengali literature can be converted to plain text.
Please check the Releases tab for the latest status updates. So far, the approaches tried have not produced significant performance improvements, and further experimentation is needed to improve recognition accuracy.
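The status note does not name a metric. OCR accuracy is commonly reported as character error rate (CER): the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The sketch below assumes CER is the measure of interest; the function names `levenshtein`, `cer` and `corpus_cer` are hypothetical and not part of this project. It counts Unicode code points after NFC normalization, so a wrong Bengali vowel sign counts as one error.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (substitutions + deletions + insertions) over code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def _nfc(text: str) -> str:
    # Normalize so equivalent Bengali encodings compare equal.
    return unicodedata.normalize("NFC", text)


def cer(ref: str, hyp: str) -> float:
    """Character error rate of one line: edits / reference length."""
    ref, hyp = _nfc(ref), _nfc(hyp)
    if not ref:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level CER: total edits divided by total reference length."""
    edits = 0
    length = 0
    for ref, hyp in pairs:
        ref, hyp = _nfc(ref), _nfc(hyp)
        edits += levenshtein(ref, hyp)
        length += len(ref)
    if length == 0:
        raise ValueError("CER is undefined for empty references")
    return edits / length
```

Computing corpus CER as total edits over total reference length, rather than averaging per-line CER, keeps short lines from dominating the score. Comparing this number for the base model and a fine-tuned model on the same held-out lines is one way to tell whether a change is an improvement.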
The training data is obtained by converting and cleaning line-level images taken from this archived literature.
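A minimal sketch of the cleaning step, assuming the line images are fed to the tesstrain training scripts, which pair each line image (for example `line_0001.png` or `.tif`) with a one-line transcription in `line_0001.gt.txt`. The helper names and the choice of NFC normalization plus whitespace collapsing are assumptions for illustration, not the project's documented pipeline.

```python
import unicodedata
from pathlib import Path

# Image extensions treated as line images (assumed; adjust to the data set).
IMAGE_SUFFIXES = {".png", ".tif", ".tiff"}


def clean_line(text: str) -> str:
    """NFC-normalize a transcription and collapse whitespace to single spaces."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def prepare_ground_truth(gt_dir: Path) -> tuple[list[Path], list[Path]]:
    """Clean every <name>.gt.txt in place and report which images are usable.

    Returns (usable_images, rejected_images). An image is rejected when its
    transcription file is missing or is empty after cleaning.
    """
    usable: list[Path] = []
    rejected: list[Path] = []
    for image in sorted(gt_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips .gt.txt files and anything else
        gt_file = image.with_suffix(".gt.txt")  # line_0001.png -> line_0001.gt.txt
        if not gt_file.is_file():
            rejected.append(image)
            continue
        text = clean_line(gt_file.read_text(encoding="utf-8"))
        if not text:
            rejected.append(image)
            continue
        gt_file.write_text(text + "\n", encoding="utf-8")
        usable.append(image)
    return usable, rejected
```

Rejected images can be reviewed by hand or left out of training; a line image without a matching transcription cannot be used for supervised fine-tuning.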