/pdf-layout-text-stripper

Converts a pdf file into a text file while keeping the layout of the original pdf.

Primary LanguageJava

PDFLayoutTextStripper as docker container command-line utility

license Code Climate Issue Count

Converts a PDF file into a text file while keeping the layout of the original PDF. Useful to extract the content from a table or a form in a PDF file. PDFLayoutTextStripper is a subclass of PDFTextStripper class (from the Apache PDFBox library).

  • Use cases
  • How to use

Use cases

Data extraction from a table in a PDF file example

Data extraction from a form in a PDF file example

How to use

# i do it myself
docker build -t pdf-layout-text-stripper .
docker run -v $(pwd):/app pdf-layout-text-stripper "sample.pdf"

# i'm lazy
docker run -v $(pwd):/app madnight/pdf-layout-text-stripper "sample.pdf"