TypeError: unsupported operand type(s) for +: 'PDFObjRef' and 'bytes'
adarsa opened this issue · 1 comment
When calling pdftotree.parse(pdf_file), I get the following error:
>>> output = pdftotree.parse('403000541.pdf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdftotree/core.py", line 63, in parse
    if not extractor.is_scanned():
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdftotree/TreeExtract.py", line 121, in is_scanned
    self.parse()
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdftotree/TreeExtract.py", line 91, in parse
    for page_num, layout in enumerate(analyze_pages(self.pdf_file)):
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdftotree/utils/pdf/pdf_utils.py", line 136, in analyze_pages
    interpreter.process_page(page)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 841, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 852, in render_contents
    self.init_resources(resources)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 356, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 204, in get_font
    font = self.get_font(None, subspec)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
    font = PDFCIDFont(self, spec)
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdffont.py", line 641, in __init__
    self.cidcoding = (self.cidsysteminfo.get('Registry', 'unknown') + b'-' +
TypeError: unsupported operand type(s) for +: 'PDFObjRef' and 'bytes'
I installed pdftotree by building the package under Python 3.6.
I looked for the following line of code both in pdfminer and in pdfminer.six, and found that it only appears in pdfminer.
  File "/Users/adarsa/ilimi/pyenv36/lib/python3.6/site-packages/pdfminer/pdffont.py", line 641, in __init__
    self.cidcoding = (self.cidsysteminfo.get('Registry', 'unknown') + b'-' +
The same error was also reported against the original pdfminer at euske/pdfminer#258, and was supposedly fixed by euske/pdfminer@cc7d409.
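To illustrate why that line fails: the traceback suggests the `Registry` entry is held as an indirect object reference rather than as bytes, so there is nothing for `+ b'-'` to concatenate with. The sketch below reproduces the same kind of error with a hypothetical stand-in class, `FakeObjRef`, and shows the general remedy of resolving the reference before using the value. `FakeObjRef`, `resolve_value`, and `registry_prefix` are illustrative names, not pdfminer APIs, and this is not the actual change made in cc7d409.

```python
class FakeObjRef:
    """Hypothetical stand-in for an indirect PDF object reference."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # A real reference would look the object up in the document;
        # here we simply return the wrapped value.
        return self._target


def resolve_value(value):
    """Follow stand-in references until a plain value is reached."""
    while isinstance(value, FakeObjRef):
        value = value.resolve()
    return value


def registry_prefix(cidsysteminfo):
    """Return the resolved Registry value followed by b'-'."""
    registry = resolve_value(cidsysteminfo.get('Registry', b'unknown'))
    return registry + b'-'


# Assumed example data: the Registry entry is a reference, not bytes.
info = {'Registry': FakeObjRef(b'Adobe')}

# Unresolved: mirrors the failing expression from the traceback.
try:
    info.get('Registry', b'unknown') + b'-'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Resolved first: the concatenation works.
prefix = registry_prefix(info)
```

With the reference resolved up front, the concatenation only ever sees bytes, which is the property the failing line implicitly relies on.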
Assuming the reporter was using pdfminer rather than pdfminer.six, this issue is invalid: pdftotree has listed pdfminer.six as a dependency since 8c190cb, well before this issue was reported, and still depends on it.
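Since both distributions install a top-level package named `pdfminer`, the path in the traceback alone does not show which one is in use. A rough way to check an environment is sketched below. It assumes Python 3.8 or newer for `importlib.metadata` (the reporter's Python 3.6 would need a different approach), and the helper name `installed_pdfminer_distributions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pdfminer_distributions(get_version=version):
    """Map each installed pdfminer distribution name to its version.

    `get_version` is injectable so the lookup can be replaced in tests;
    by default it queries the current environment.
    """
    found = {}
    for name in ("pdfminer", "pdfminer.six"):
        try:
            found[name] = get_version(name)
        except PackageNotFoundError:
            # This distribution is not installed; skip it.
            continue
    return found
```

Calling `installed_pdfminer_distributions()` in the affected virtualenv returns a dict of whichever of the two distributions are present. If `pdfminer` shows up, or both do, the environment may be picking up the original pdfminer code rather than pdfminer.six.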