PDF processing tool to extract document data and save it in EDN format

Primary language: HTML. License: GNU General Public License v3.0 (GPL-3.0).

pdftoedn

A poppler-based PDF processing tool to extract document data and save it in EDN format. It supports:

  • Font and glyph remapping via user-defined font map configurations (in JSON format), allowing glyph substitutions for Type 1 or TrueType fonts with invalid or incorrect Unicode tables, and even for embedded CID fonts with missing tables.
  • Path data extraction.
  • Transformed image output, written directly to disk in PNG format.
  • Annotations.
  • PDF outlines.

Usage

Process a PDF document and write the output to output_file.edn:

pdftoedn -o output_file.edn input_file.pdf
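
For scripted use, the same invocation can be wrapped in a small helper. The following is a minimal Python sketch, not part of pdftoedn itself: the names build_command and convert are hypothetical, the default of reusing the input name with an .edn suffix is an assumption, and so is the expectation that pdftoedn exits with a non-zero status on failure. Only the -o option shown above is used.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_pdf, output_edn=None):
    """Build the argument list for the documented `pdftoedn -o OUT IN` call."""
    input_pdf = Path(input_pdf)
    # Assumption: when no output is given, reuse the input name with ".edn".
    output_edn = Path(output_edn) if output_edn else input_pdf.with_suffix(".edn")
    return ["pdftoedn", "-o", str(output_edn), str(input_pdf)]


def convert(input_pdf, output_edn=None):
    """Run pdftoedn on one file and return the path of the EDN output."""
    if shutil.which("pdftoedn") is None:
        raise FileNotFoundError("pdftoedn was not found on PATH")
    cmd = build_command(input_pdf, output_edn)
    # Assumption: pdftoedn signals failure through a non-zero exit status,
    # which check=True turns into subprocess.CalledProcessError.
    subprocess.run(cmd, check=True)
    return Path(cmd[2])
```

Calling convert("input_file.pdf", "output_file.edn") runs the command shown above. Keeping the command construction in its own function makes it possible to inspect or log the arguments without running the tool.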

Further reading

Refer to the wiki for