pymupdf/PyMuPDF

CMYK extract_image has slightly wrong colors

Closed this issue · 2 comments

axu2 commented

Description of the bug

The colors in the extracted image are noticeably different from what I see in my normal PDF viewer.

They also don't match what I get from:

mutool extract -r Seven.Deadly.Sins.Program-1.pdf

PyMuPDF output on the left, mutool extract output on the right.


How to reproduce the bug

Using the test file linked in:

import pymupdf

doc = pymupdf.open("Seven.Deadly.Sins.Program-1.pdf")
d = doc.extract_image(44)  # 44 is the xref of the image in question
with open(f"image.{d['ext']}", "wb") as imgout:
    imgout.write(d["image"])

PyMuPDF version

1.26.3

Operating system

MacOS

Python version

3.13

Your approach saves the image exactly as the PDF creator stored it. (Py)MuPDF does not process the image at all in this case, so we cannot take responsibility for the output.
To make sure the underlying MuPDF library has had a chance to process the image binary, either use the images as delivered by, e.g., page.get_text("dict"), or create a Pixmap(doc, xref) as you mention.

import pymupdf, pathlib

doc = pymupdf.open("test.pdf")
page = doc[0]
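# image blocks have type 1; the infinite clip keeps extraction from being limited to the page rectangle
blocks = [
    b
    for b in page.get_text("dict", clip=pymupdf.INFINITE_RECT())["blocks"]
    if b["type"] == 1
]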

# both of the following loops will work
for i, img in enumerate(blocks):
    pathlib.Path(f"page-{page.number}-{i}.{img['ext']}").write_bytes(img["image"])

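for img in page.get_images():
    xref = img[0]  # the first item of each get_images() entry is the image xref
    pix = pymupdf.Pixmap(doc, xref)  # MuPDF decodes the image binary here
    pix.save(f"xref-{xref}.jpg")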
axu2 commented

@JorjMcKie

I see, but doesn't PyMuPDF invert the image after the recent change linked? I guess simple inversion doesn't do what I expect.

But I feel that output that is completely wrong, as it was before the inversion was added, might be preferable to output that is subtly wrong, as it is now.

The old behavior matched mutool extract without -r (I think).
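For intuition only, here is a minimal, non-color-managed sketch of what "just inverting" CMYK samples amounts to. It assumes the 8-bit samples were stored inverted (as Adobe-style CMYK JPEGs commonly are) and then applies the textbook naive CMYK-to-RGB formula; both function names are hypothetical. PDF viewers typically convert through ICC profiles instead, so even correctly inverted data run through a formula like this can look subtly different from what a viewer shows.

```python
def undo_inversion(cmyk: tuple[int, int, int, int]) -> tuple[int, int, int, int]:
    """Flip 8-bit CMYK samples that were stored inverted (value -> 255 - value)."""
    c, m, y, k = cmyk
    return (255 - c, 255 - m, 255 - y, 255 - k)


def naive_cmyk_to_rgb(cmyk: tuple[int, int, int, int]) -> tuple[int, int, int]:
    """Naive CMYK -> RGB with no ICC profile: each channel is dimmed by its ink and by black."""
    c, m, y, k = cmyk
    r = (255 - c) * (255 - k) // 255
    g = (255 - m) * (255 - k) // 255
    b = (255 - y) * (255 - k) // 255
    return (r, g, b)


stored = (255, 255, 255, 255)  # inverted encoding of "no ink at all"
print(naive_cmyk_to_rgb(stored))                  # (0, 0, 0): forgetting to invert gives black
print(naive_cmyk_to_rgb(undo_inversion(stored)))  # (255, 255, 255): inverted first gives white
```

The first print illustrates the "completely wrong" case; the second is correct for this trivial sample, but for real colors the naive formula is itself one plausible source of "subtly wrong" results.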