Primary Language: Jupyter Notebook

pdf-paragraph-segmentation

This is a paragraph detection and segmentation tool for PDF documents. It works best on documents whose paragraphs enumerate rules and policies.

Package Requirements

  • PDFMiner
  • NLTK
  • Regular expressions (e.g. Python's built-in re module)
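
As a rough illustration of how these packages might fit together, the sketch below extracts text from a PDF and splits it into paragraphs, starting a new paragraph at each enumerated item. It is not the notebook's actual code. The function names, the `ENUM_ITEM` pattern, and the use of `pdfminer.high_level.extract_text` (available in the pdfminer.six fork) are all assumptions made for this example. Sentence splitting with NLTK is left out.

```python
import re

# Hypothetical pattern for enumerated items such as "1.", "2)", "(a)", "iv)" or a bullet.
ENUM_ITEM = re.compile(
    r"^\s*(?:\(?\d{1,3}[.)]|\(?[a-z]{1,4}\)|[•\-*])\s+", re.IGNORECASE
)


def extract_pdf_text(path):
    """Return the raw text of a PDF. Assumes the pdfminer.six fork is installed."""
    from pdfminer.high_level import extract_text  # imported lazily, only when needed
    return extract_text(path)


def split_paragraphs(text):
    """Split on blank lines, then start a new paragraph at each enumerated item."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        current = []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            # An enumerated marker closes the paragraph collected so far.
            if ENUM_ITEM.match(line) and current:
                paragraphs.append(" ".join(current))
                current = []
            current.append(line)
        if current:
            paragraphs.append(" ".join(current))
    return paragraphs


def is_enumerated(paragraph):
    """True if the paragraph begins with an enumeration marker."""
    return bool(ENUM_ITEM.match(paragraph))


sample = (
    "Policy overview. All staff must follow\nthe rules below.\n\n"
    "1. Wear a badge at\nall times.\n"
    "2. Report incidents within 24 hours.\n"
    "(a) Use the online form."
)
paragraphs = split_paragraphs(sample)
for p in paragraphs:
    print(("[rule] " if is_enumerated(p) else "[text] ") + p)
```

For a real document, the same function could be applied as `split_paragraphs(extract_pdf_text("sample.pdf"))`, where `sample.pdf` is a placeholder path. Text extracted from PDFs often has irregular line breaks, so the pattern would likely need tuning per document.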

Sample PDF Document

Result for highlighted target