[zotero] issue with item.num_pages
andreifoldes opened this issue · 1 comments
andreifoldes commented
Hello,
Every now and again I get the following error during my ingestion process. Maybe something is wrong with the PDF?
i = 0
for item in zotero.iterate(start=129, limit=900):
    i += 1
    print("Adding", item.title, i)
    if item.num_pages > 30:
        continue  # skip long papers
    docs.add(item.pdf, docname=item.key)
output:
Traceback (most recent call last):
  Cell In[52], line 1
    for item in zotero.iterate(start=129,limit=900):
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/paperqa/contrib/zotero.py:257 in iterate
    num_pages=count_pdf_pages(pdf),
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/paperqa/utils.py:66 in count_pdf_pages
    num_pages = len(pdf_reader.pages)
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/pypdf/_page.py:2435 in __len__
    return self.length_function()
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/pypdf/_reader.py:456 in _get_num_pages
    self._flatten()
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/pypdf/_reader.py:1213 in _flatten
    catalog = self.trailer[TK.ROOT].get_object()
  File ~/anaconda3/envs/paperqa/lib/python3.11/site-packages/pypdf/generic/_data_structures.py:309 in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Root'
jamesbraza commented
This looks to be a problem with https://github.com/py-pdf/pypdf. We just released version 5, which rewrites a lot of stuff and updates our dependencies. We actually no longer depend on pypdf; instead, we use pymupdf.
As this issue is no longer relevant in the latest paper-qa, I am going to close this issue out. If your issue persists, please open a new issue using paper-qa>=5.
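For anyone who cannot upgrade yet and runs their own ingestion loop over PDF paths, a minimal sketch of one possible workaround is to turn a page-count failure into a skip instead of a crash. This is not part of paper-qa: `safe_page_count`, `select_short_pdfs`, and the `counter` parameter are hypothetical names chosen for illustration. Note that the traceback above is raised inside `zotero.iterate` itself, so this guard would not help with that loop unmodified; it only applies where you call the page counter yourself.

```python
from typing import Callable, Iterable, List, Optional


def safe_page_count(path: str, counter: Callable[[str], int]) -> Optional[int]:
    """Return the page count for `path`, or None if `counter` raises.

    `counter` is any function that maps a PDF path to a page count,
    for example (assuming pypdf is installed):
        lambda p: len(pypdf.PdfReader(p).pages)
    """
    try:
        return counter(path)
    except Exception as exc:  # e.g. KeyError: '/Root' from a malformed PDF
        print(f"Skipping {path}: {type(exc).__name__}: {exc}")
        return None


def select_short_pdfs(
    paths: Iterable[str],
    counter: Callable[[str], int],
    max_pages: int = 30,
) -> List[str]:
    """Keep only paths whose page count is readable and at most `max_pages`."""
    kept = []
    for path in paths:
        pages = safe_page_count(path, counter)
        if pages is None or pages > max_pages:
            continue  # unreadable or too long
        kept.append(path)
    return kept
```

Catching the broad `Exception` is a deliberate choice here, since a damaged file can fail in several different ways inside the parser; narrow it if you only want to tolerate specific errors.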