kanzure/pdfparanoia

pdfminer API has changed

apoelstra opened this issue

If you run it with the latest pdfminer, pdfparanoia bombs out with:

Traceback (most recent call last):
  File "/bin/pdfparanoia", line 38, in <module>
    outputcontent = pdfparanoia.scrub(StringIO(Args.in_pdf.read()), verbose=verbose)
  File "/usr/lib/python2.7/site-packages/pdfparanoia/core.py", line 53, in scrub
    content = plugin.scrub(content, verbose=verbose)
  File "/usr/lib/python2.7/site-packages/pdfparanoia/plugins/aip.py", line 25, in scrub
    pdf = parse_content(content)
  File "/usr/lib/python2.7/site-packages/pdfparanoia/parser.py", line 46, in parse_content
    return parse_pdf(stream)
  File "/usr/lib/python2.7/site-packages/pdfparanoia/parser.py", line 31, in parse_pdf
    doc = pdfminer.pdfparser.PDFDocument()
AttributeError: 'module' object has no attribute 'PDFDocument'

As suggested in timClicks/slate#5, you can work around this by installing an older pdfminer:

pip install --upgrade --ignore-installed slate==0.3 pdfminer==20110515
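Instead of pinning, the parser could try to support both pdfminer APIs. The snippet below is a minimal sketch and not pdfparanoia's actual fix. `open_pdf_document` is a hypothetical helper. It assumes that newer pdfminer releases (the ones that raise the error above) provide `PDFDocument` in `pdfminer.pdfdocument` with a `PDFDocument(parser, password)` constructor. It also assumes that older releases such as 20110515 keep `PDFDocument` in `pdfminer.pdfparser` and connect it to the parser through `set_document`/`set_parser`/`initialize`.

```python
def open_pdf_document(fp, password=""):
    """Return a pdfminer PDFDocument for the file-like object `fp`.

    Hypothetical compatibility helper: tries the newer pdfminer layout
    first, then falls back to the old pdfminer.pdfparser API.
    """
    # PDFParser lives in pdfminer.pdfparser in both the old and new layouts.
    from pdfminer.pdfparser import PDFParser

    parser = PDFParser(fp)
    try:
        # Assumed newer API: PDFDocument moved to pdfminer.pdfdocument and
        # takes the parser (and password) in its constructor.
        from pdfminer.pdfdocument import PDFDocument
    except ImportError:
        # Old API (e.g. pdfminer==20110515): PDFDocument is in pdfparser,
        # is constructed empty, and is linked to the parser by hand.
        from pdfminer.pdfparser import PDFDocument
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize(password)
        return doc
    return PDFDocument(parser, password)
```

With a helper like this, `parse_pdf` could call `open_pdf_document(stream)` rather than constructing `pdfminer.pdfparser.PDFDocument()` directly. The `try`/`except ImportError` probe lets one code path work no matter which pdfminer is installed.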