A Python + C implementation for image-based PDF page layout analysis and content extraction.
Primary language: C++. License: Apache License 2.0 (Apache-2.0).