jstockwin/py-pdf-parser

A Python tool to help extract information from structured PDFs.

Python, MIT

Issues

AttributeError: 'NoneType' object has no attribute 'encode' with load_file
#390 opened 7 months ago by umaplehurst
0
Does not install on Python 3.11+
#381 opened a year ago by atompkins
1
Py PDF Parser tests are distributed in PyPi wheels
#383 opened 8 months ago by aiden2480
1
ValueError on empty ElementList.after()
#386 opened a year ago by aiden2480
2
Add custom filter predicate and header/footer filters
#384 opened 9 months ago by aiden2480
0
How to open encrypted files by passing the password?
#349 opened a year ago by dantehemerson
0
Switch to trusted publishing on PyPI
#370 opened 2 years ago by jstockwin
0
Standardizing contains and equals
#373 opened 2 years ago by AndersWoodruff
1
Filter text ignoring case
#371 opened 2 years ago by ARandomPerson07
2
Unable to create an PDFDocument object
#369 opened 2 years ago by papstchaka
2
How to run the lint process?
#351 opened 2 years ago by dantehemerson
1
When will be there a new release to depend on wand 0.6.10?
#344 opened 2 years ago by kiryph
2
ElementList filter on visualise function does not work
#255 opened 3 years ago by mcrts
1
Document regular expression font mapping
#237 opened 4 years ago by Aceto1
9
extract_table ignores ordering defined while loading the document
#153 opened 4 years ago by paulopaixaoamaral
1
Add more tests for the visualise tool
#219 opened 4 years ago by jstockwin
0
keep getting an error when trying to visualise
#204 opened 4 years ago by rannndom
9
Use of Visualize
#122 opened 4 years ago by dpieski
2
Release v0.8.0?
#200 opened 4 years ago by AldenPeterson
3
Element extraction in original order
#190 opened 4 years ago by zheyaf
2
[loaders] Loads accept LTTextLines as top level pdfminer elements, which breaks things
#154 opened 4 years ago by jstockwin
0
Unable to install with pip3
#123 opened 4 years ago by chookity-pokk
3
Ensure CI runs on PRs
#118 opened 5 years ago by jstockwin
0
Some suggestions to enhance locating logics -'offset()' and 'resize()'
#111 opened 5 years ago by forhonourlx
9
Consider using sorted sets?
#114 opened 5 years ago by jstockwin
1
Update when to use py-pdf-parser documentation
#108 opened 5 years ago by jstockwin
0
Add `remove_duplicate_header_rows` flag to a documentation example
#97 opened 5 years ago by jstockwin
0
What is the prefix:'PCAGML' of font:"PCAGML+SourceHanSerifCN-Regular,16.0"?
#100 opened 5 years ago by forhonourlx
5
Too large a tolerance causes an error
#102 opened 5 years ago by jstockwin
0
Add code coverage checks to CI
#104 opened 5 years ago by jstockwin
0
Include text which is within figures
#98 opened 5 years ago by jstockwin
0
Allow different element orderings
#94 opened 5 years ago by jstockwin
0
Finish the info screen on visualise tool
#93 opened 5 years ago by jstockwin
0
Use LTChar.size to extract the font size
#92 opened 5 years ago by jstockwin
0
Add __repr__ to section class
#63 opened 5 years ago by jstockwin
0
Add feature to remove duplicate header rows
#76 opened 5 years ago by jstockwin
2
`create_section` should throw a better error if it isn't passed a `PDFElement`
#75 opened 5 years ago by jstockwin
4
[performance] Disable advanced layout analysis
#50 opened 5 years ago by jstockwin
1
Section visualisations can be made simpler in some cases
#72 opened 5 years ago by jstockwin
0
Publish to PyPI
#79 opened 5 years ago by jstockwin
0
[tests] Create some tests which use real PDFs
#56 opened 5 years ago by jstockwin
1
Add some examples to the documentation
#48 opened 5 years ago by jstockwin
0
Filtering by fonts is broken
#77 opened 5 years ago by jstockwin
0
Cache filtering by font
#64 opened 5 years ago by jstockwin
3
Add license, contributing, gh template, and changelog files
#49 opened 5 years ago by jstockwin
0
Better visualisations of sections
#66 opened 5 years ago by jstockwin
0
Extract simple table could be more efficient
#62 opened 5 years ago by jstockwin
0
Change font sizes to floats
#59 opened 5 years ago by jstockwin
0
Allow some gaps in the table for extract_simple_table
#58 opened 5 years ago by jstockwin
0
Run tests on GitHub Actions
#47 opened 5 years ago by jstockwin
0