PyMuPDF won't load a page from a PDF that doesn't seem to have a problem: pymupdf.mupdf.FzErrorArgument: code=4: key is not a name (dictionary)
Closed this issue · 8 comments
Description
Hello, I'm not very familiar with PDF manipulation, but I'm using PyMuPDF to load PDF pages with the aim of converting them to images.
Example:
import pymupdf

def to_image(doc_bytes):
    doc_repr = pymupdf.open(stream=doc_bytes)
    results = []
    for pnum in range(doc_repr.page_count):
        page = doc_repr.load_page(pnum)  # <= Raised pymupdf.mupdf.FzErrorArgument: code=4: key is not a name (dictionary)
        # ...
        # page to image logics
        # ...
    return results

So far I've processed about 1500 PDFs without any problems, but I came across one PDF that produces this exception: pymupdf.mupdf.FzErrorArgument: code=4: key is not a name (dictionary)
I haven't found a solution by searching the web. The PDF displays correctly in my file explorer, but with pymupdf, I get the above exception.
For privacy reasons, I cannot share the PDF, which could contain errors in its structure.
Do you have a solution or just an explanation of the potential causes of this exception or other suggestions?
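One way to keep a batch job running when a single page fails to load is to catch the error per page and record it instead of letting it abort the whole document. The following is only a minimal sketch under assumptions: the helper names load_pages_tolerantly and to_images, the dpi default, and the PNG output are illustrative choices, not part of the original code, and whether the remaining pages load at all depends on where the file is damaged.

```python
def load_pages_tolerantly(doc):
    """Yield (page_number, page, error) for every page of an opened document.

    `doc` is anything exposing `page_count` and `load_page(n)`, such as the
    object returned by pymupdf.open(). Exactly one of `page` / `error` is None.
    """
    for pnum in range(doc.page_count):
        try:
            page = doc.load_page(pnum)
        except Exception as exc:  # e.g. pymupdf.mupdf.FzErrorArgument
            yield pnum, None, exc
        else:
            yield pnum, page, None


def to_images(doc_bytes, dpi=150):
    """Render every loadable page to PNG bytes and report the pages that failed.

    Hypothetical wrapper: the name, the dpi default and the PNG format are
    assumptions made for this sketch.
    """
    import pymupdf  # imported here so the helper above has no hard dependency

    images, failures = [], []
    with pymupdf.open(stream=doc_bytes, filetype="pdf") as doc:
        for pnum, page, error in load_pages_tolerantly(doc):
            if error is not None:
                failures.append((pnum, repr(error)))
                continue
            images.append(page.get_pixmap(dpi=dpi).tobytes("png"))
    return images, failures
```

The failures list gives the page numbers and messages to attach to a bug report, which helps when the file itself cannot be shared publicly.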
PyMuPDF version
1.26.1
Operating system
MacOS
Python version
3.12.4
You did not provide the PDF! We cannot do anything without a reproducer. You can use my e-mail if you have confidentiality concerns.
Got it, I've forwarded it to you.
Looking now.
This is an upstream problem: MuPDF cannot process the file. I need to involve the MuPDF team here.
Well noted. Thanks for your review.
Here is the MuPDF bug link: https://bugs.ghostscript.com/show_bug.cgi?id=708605
If updating your pymupdf dependency doesn't work, you can rewrite the PDF using: ocrmypdf --skip-text input.pdf output.pdf. The rewritten file seems to load OK.
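For a batch pipeline, the same rewrite can be driven from Python by calling the ocrmypdf command line tool. This is a rough sketch, assuming ocrmypdf is installed and on PATH; the function names build_rewrite_command and rewrite_pdf are hypothetical, and only the --skip-text invocation itself comes from the comment above.

```python
import shutil
import subprocess


def build_rewrite_command(src, dst):
    """Return the argument list for the ocrmypdf call quoted in this thread."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def rewrite_pdf(src, dst):
    """Rewrite `src` to `dst` with ocrmypdf.

    Raises RuntimeError if the tool is missing and CalledProcessError if
    ocrmypdf exits with a non-zero status.
    """
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(build_rewrite_command(src, dst), check=True)
    return dst
```

A reasonable pattern is to call rewrite_pdf only for files where loading a page raised, then open the rewritten file with pymupdf.open() and retry, so the files that already work are left untouched.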
