Acquil/deep-read

OCR: Remove noise from OCR text

Extraction of data from slides using Tesseract.
To do:

  1. Remove noise from the slide images for better OCR extraction.
  2. Preprocess the text extracted by OCR (cleaning, spell checking, etc.).
  3. Form proper sentences that are suitable for the summarization phase.
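
A rough sketch of items 2 and 3, assuming the OCR output is already available as a plain string. This is not the project's implementation: the function names, the tiny `VOCAB` list and the `0.8` similarity cutoff are all hypothetical, and the spell check is a simple closest-match lookup from Python's standard `difflib` rather than a real spell checker. Image denoising (item 1) is left out because it needs an imaging library.

```python
import re
from difflib import get_close_matches
from typing import List

# Hypothetical vocabulary for illustration; a real run would load a full word list.
VOCAB = ["neural", "network", "learns", "features", "from", "data", "the", "a"]


def clean_line(line: str) -> str:
    """Replace characters unlikely to be slide text with spaces, then collapse whitespace."""
    line = re.sub(r"[^A-Za-z0-9.,;:!?'()\- ]", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def correct_word(word: str, vocab=VOCAB, cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry if it is similar enough, else the word unchanged."""
    if word.isdigit() or word.lower() in vocab:
        return word
    match = get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
    if not match:
        return word
    # Keep a leading capital if the OCR token had one.
    return match[0].capitalize() if word[0].isupper() else match[0]


def to_sentences(raw_text: str, vocab=VOCAB) -> List[str]:
    """Clean OCR text, correct tokens, and split the result into sentences."""
    lines = [clean_line(line) for line in raw_text.splitlines()]
    text = " ".join(line for line in lines if line)  # drop lines that were pure noise
    # Correct each alphanumeric token; punctuation between tokens is left alone.
    text = re.sub(r"[A-Za-z0-9]+", lambda m: correct_word(m.group(), vocab), text)
    sentences = []
    for part in re.split(r"(?<=[.!?])\s+", text):
        if not part:
            continue
        part = part[0].upper() + part[1:]
        if part[-1] not in ".!?":
            part += "."  # close a trailing fragment so the summarizer sees a full sentence
        sentences.append(part)
    return sentences


if __name__ == "__main__":
    raw = "The neura1 netw0rk ~~ learns\nfeatur3s   from data. it is\n|| \ntrained on slides"
    for sentence in to_sentences(raw):
        print(sentence)
```

Words with no close match in the vocabulary are passed through unchanged, so an incomplete word list degrades to "no correction" rather than to wrong substitutions.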