Extract text by 'glimpsing' a PDF.
The idea is to call this code as a subprocess from another language, e.g. Python, for machine learning purposes.
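A minimal sketch of the subprocess idea in Python. It assumes the built executable takes the PDF path as its last argument and prints the extracted text to stdout; neither is confirmed by this README, so adjust to the real command-line interface. The function name `extract_text` is made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import Sequence, Union


def extract_text(command: Sequence[str],
                 pdf_path: Union[str, Path],
                 timeout: float = 60.0) -> str:
    """Run an external extractor on one PDF and return what it prints to stdout.

    ASSUMPTION: the extractor accepts the PDF path as its final argument
    and writes the extracted text to stdout.
    """
    result = subprocess.run(
        [*command, str(pdf_path)],  # command prefix followed by the PDF path
        capture_output=True,        # collect stdout and stderr
        text=True,                  # decode output to str
        timeout=timeout,            # do not hang forever on a bad file
        check=True,                 # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

Usage would look like `extract_text(["./build/<executable>"], "paper.pdf")`, where the executable path depends on your build directory.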
The project is built with CMake (version 3.9.3 or newer). The included CMake scripts should report which dependencies are missing.
Dependencies: Ghostscript, libpng and tesseract-ocr (don't forget the Tesseract language data files, which are distributed separately).
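Since the language data files are easy to forget, a quick check before running can help. The sketch below only relies on Tesseract's convention of naming language files `<code>.traineddata`; the helper name `missing_languages` and the directory you pass in are assumptions, as the location of the tessdata directory varies between installations.

```python
from pathlib import Path
from typing import Iterable, List, Union


def missing_languages(languages: Iterable[str],
                      tessdata_dir: Union[str, Path]) -> List[str]:
    """Return the language codes with no <code>.traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (directory / f"{lang}.traineddata").is_file()
    ]
```

For example, `missing_languages(["eng"], "/path/to/tessdata")` returns an empty list when the English data file is present in that directory.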
Set the build test option to ON in the project CMake file. Building again should then download googletest (gtest/gmock), which the tests are based on.
Feel free to contribute. For now, the only requirement for contributing is to use the same clang-format configuration.
- August von Hacht - Initial work - vonhachtaugust
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.