Dump annotations from journal PDFs to markdown and images

Primary language: F#. License: MIT.

PDFScraping

This tool dumps annotations (highlights, text boxes, popups, strikeouts, underlines) from journal PDFs and creates images from rectangle annotations.

This project is inspired by the pdfannots project.

Supported

This project is still at the proof-of-concept stage, so rough coding style and bugs are to be expected.

Currently, I use two PDF libraries: PdfPig does not support PDF rendering, and processing metadata with PDFiumCore is very painful.

Because the project currently uses System.Drawing, it does not support non-Windows platforms.

  • Highlight
  • Highlight note
  • FreeText
  • Popup
  • Strikeout
  • Underline
  • Rectangle
  • Metadata from PDF (title, DOI)
  • Metadata from XMP (title, DOI)
  • Bookmarks
  • Citations
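
To give a rough idea of what dumping the text annotation types above to markdown could look like, here is a minimal F# sketch. It is not taken from the project's source: the `Annotation` type, the `toMarkdown` and `renderPage` functions, and the markdown layout are all hypothetical and chosen only for illustration. Rectangle annotations are left out because they are exported as images rather than text.

```fsharp
// Hypothetical model of the text annotation types listed above.
// These names are illustrative assumptions, not the project's actual types.
type Annotation =
    | Highlight of text: string * note: string option
    | FreeText of text: string
    | Popup of text: string
    | Strikeout of text: string
    | Underline of text: string

// Render one annotation as a markdown list item.
// The output format is an assumption made for this sketch.
let toMarkdown (annotation: Annotation) : string =
    match annotation with
    | Highlight (text, Some note) -> sprintf "- %s\n  - Note: %s" text note
    | Highlight (text, None) -> sprintf "- %s" text
    | FreeText text -> sprintf "- Text box: %s" text
    | Popup text -> sprintf "- Comment: %s" text
    | Strikeout text -> sprintf "- ~~%s~~" text
    | Underline text -> sprintf "- <u>%s</u>" text

// Render all annotations of one page under a page heading.
let renderPage (pageNumber: int) (annotations: Annotation list) : string =
    let header = sprintf "## Page %d" pageNumber
    let items = annotations |> List.map toMarkdown
    String.concat "\n" (header :: "" :: items)
```

A possible way to call it, again with made-up data:

```fsharp
[ Highlight ("key finding", Some "check the sample size")
  Strikeout "outdated claim" ]
|> renderPage 3
|> printfn "%s"
```

Modelling the annotation kinds as a discriminated union means the compiler warns when a newly supported type has no rendering case.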

Example

sample