jcushman/pdfquery

'PDFObjRef' object has no attribute '__getitem__'

Issue closed · 1 comment

Hello,

I'm trying to parse some PDF files using pdfquery, and for a few of them (not all) I get the following error:

  File "my_path/my_script.py", line 244, in set_description
    pdf.load()
  File "/my_path/.virtualenvs/dev/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 373, in load
    self.tree = self.get_tree(*_flatten(page_numbers))
  File "/my_path/.virtualenvs/dev/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 475, in get_tree
    for n, page in pages:
  File "/my_path/.virtualenvs/dev/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 596, in <genexpr>
    return (self.get_layout(page) for page in self._cached_pages())
  File "/my_path/.virtualenvs/dev/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 591, in get_layout
    layout = self._add_annots(layout, page.annots)
  File "/my_path/.virtualenvs/dev/local/lib/python2.7/site-packages/pdfquery/pdfquery.py", line 639, in _add_annots
    annot['URI'] = annot['A']['URI']
TypeError: 'PDFObjRef' object has no attribute '__getitem__'

Here are a few of the PDFs that raise the above error:
http://www.genomecanada.ca/medias/pdf/en/genomesciencescentrebc.pdf
http://www.genomecanada.ca/medias/pdf/fr/genomesciencescentrebc.pdf
http://www.genomecanada.ca/medias/pdf/en/universityvictoria.pdf
http://www.genomecanada.ca/medias/pdf/fr/universityvictoria.pdf
http://www.genomecanada.ca/medias/pdf/fr/centreforappliedgenomicsogi.pdf

Maybe someone will be able to find a fix for it?

Thanks!

Thanks for the report. This is fixed in v.0.4.1.
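
For anyone stuck on an older version: the traceback suggests that in these files annot['A'] (and possibly annot['A']['URI']) is stored as an indirect reference, a PDFObjRef, rather than as a plain dict, so it has to be resolved before it can be indexed. Below is a minimal sketch of that idea, not necessarily how the 0.4.1 fix is implemented. FakeObjRef, resolve_ref and annot_uri are hypothetical names used only for illustration; FakeObjRef stands in for pdfminer's PDFObjRef so the snippet runs without pdfminer installed. With real pdfminer objects, pdfminer.pdftypes.resolve1 does the same job as resolve_ref.

```python
class FakeObjRef(object):
    """Hypothetical stand-in for pdfminer's PDFObjRef (an indirect reference)."""

    def __init__(self, target):
        self._target = target

    def resolve(self):
        # pdfminer's PDFObjRef also exposes resolve(); here it simply
        # returns the wrapped object.
        return self._target


def resolve_ref(obj):
    """Follow indirect references until a plain object is reached."""
    while hasattr(obj, "resolve"):
        obj = obj.resolve()
    return obj


def annot_uri(annot):
    """Return the annotation's URI, or None if it has no URI action."""
    action = resolve_ref(annot.get("A"))
    if not isinstance(action, dict):
        return None
    return resolve_ref(action.get("URI"))


# Assumed example data: one annotation with a direct action dict,
# one where both the action and the URI are indirect references.
direct = {"A": {"URI": "http://example.com/a"}}
indirect = {"A": FakeObjRef({"URI": FakeObjRef("http://example.com/b")})}

print(annot_uri(direct))
print(annot_uri(indirect))
print(annot_uri({}))
```

The point of the isinstance check is that an annotation may have no 'A' entry at all, in which case there is simply no URI to copy.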