CMYK extract_image has slightly wrong colors
Closed this issue · 2 comments
Description of the bug
The colors in the final output are quite off what I see in my normal PDF viewer.
And they don't match what I would get from
mutool extract -r Seven.Deadly.Sins.Program-1.pdf
PyMuPDF on left, mutool extract on right.
How to reproduce the bug
Using the test file linked in:
import pymupdf
doc = pymupdf.open('Seven.Deadly.Sins.Program-1.pdf')
d = doc.extract_image(44)  # 44 is the xref of the image
with open(f"image.{d['ext']}", "wb") as imgout:
    imgout.write(d["image"])
PyMuPDF version
1.26.3
Operating system
MacOS
Python version
3.13
Your approach directly saves the image as a copy of how it was stored in the PDF by its creator. (Py)MuPDF has not touched the image data in this case, so we cannot take responsibility for the output.
To ensure that the base MuPDF has had a chance to process the image binary, use either the images as they are delivered by e.g. page.get_text("dict") or do Pixmap(doc, xref) as you mention.
import pymupdf, pathlib
doc = pymupdf.open("test.pdf")
page = doc[0]
blocks = [
    b
    for b in page.get_text("dict", clip=pymupdf.INFINITE_RECT())["blocks"]
    if b["type"] == 1
]
# both of the following loops will work
for i, img in enumerate(blocks):
    pathlib.Path(f"page-{page.number}-{i}.{img['ext']}").write_bytes(img["image"])
for img in page.get_images():
    xref = img[0]
    pix = pymupdf.Pixmap(doc, xref)
    pix.save(f"xref-{xref}.jpg")
I see, but PyMuPDF inverts the image after that recent change linked, doesn't it? I guess inversion alone doesn't do what I expect.
But I kind of feel like something that's completely wrong before the inversion might be preferable to something that's subtly wrong, like how it is now.
The old behavior matched mutool extract without -r (I think?)