A command-line Java application for adding OCR text to image-only PDFs.
Very much a work in progress: it currently renders each word individually at a fixed text size.
Options:
-i, --inputFile - the input PDF document
-o, --outputFile - the file to output the new PDF document
-t, --trainingData - the path to the Tesseract training data directory
-l, --language - the language to use for OCR (optional, defaults to 'eng')
--visibleText - make the text visible (by default it is selectable but invisible)
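
To illustrate how the options above combine, here is a minimal sketch that parses them with only the Java standard library. It is not the project's code: the tool itself relies on jopt-simple, and the class name `OcrOptionsSketch`, its fields, and the choice to treat -i, -o and -t as required are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone parser for the documented options (the real tool uses jopt-simple). */
final class OcrOptionsSketch {
    final String inputFile;
    final String outputFile;
    final String trainingData;
    final String language;
    final boolean visibleText;

    private OcrOptionsSketch(String inputFile, String outputFile, String trainingData,
                             String language, boolean visibleText) {
        this.inputFile = inputFile;
        this.outputFile = outputFile;
        this.trainingData = trainingData;
        this.language = language;
        this.visibleText = visibleText;
    }

    /** Parses the arguments; throws IllegalArgumentException on unknown or incomplete input. */
    static OcrOptionsSketch parse(String[] args) {
        Map<String, String> values = new HashMap<>();
        boolean visible = false;
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("--visibleText")) {
                visible = true; // flag without a value
                continue;
            }
            String key = canonicalName(arg);
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + arg);
            }
            values.put(key, args[++i]); // consume the option's value
        }
        return new OcrOptionsSketch(
                required(values, "inputFile"),
                required(values, "outputFile"),
                required(values, "trainingData"),
                values.getOrDefault("language", "eng"), // documented default
                visible);
    }

    /** Maps short and long option spellings to one key. */
    private static String canonicalName(String arg) {
        switch (arg) {
            case "-i": case "--inputFile": return "inputFile";
            case "-o": case "--outputFile": return "outputFile";
            case "-t": case "--trainingData": return "trainingData";
            case "-l": case "--language": return "language";
            default: throw new IllegalArgumentException("Unknown option: " + arg);
        }
    }

    /** Assumption: options not marked optional in the list above are required. */
    private static String required(Map<String, String> values, String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required option: --" + key);
        }
        return value;
    }

    public static void main(String[] args) {
        OcrOptionsSketch o = parse(args);
        System.out.println("input=" + o.inputFile + ", output=" + o.outputFile
                + ", trainingData=" + o.trainingData + ", language=" + o.language
                + ", visibleText=" + o.visibleText);
    }
}
```

With the placeholder arguments `-i scan.pdf -o scan-ocr.pdf -t tessdata`, the sketch falls back to 'eng' for the language and leaves the text invisible, matching the defaults described above.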
Libraries used:
- Tesseract OCR - optical character recognition library
- Bytedeco javacpp-presets - Java bindings for Tesseract
- Apache PDFBox - PDF content extraction and editing
- jopt-simple - CLI argument parsing