bigcat88/pillow_heif

Keep heif_opener registered after mp spawn

yit-b opened this issue · 3 comments

Describe why it is important and where it will be useful

When decoding images with a torch dataloader (or, more generally, a multiprocessing pool) where the mp/torch start_method is "spawn", needing to register the HEIF opener per process (e.g. in an initializer or worker_init_fn) is a bit of a gotcha. Calling register_heif_opener() in the global scope of your program is not enough, because the registration is lost after the spawn.

Repro:

import io
from functools import partial
from typing import List

from PIL import Image
from torch import multiprocessing as mp
from torchvision.transforms import v2
from torchvision.transforms.functional import pil_to_tensor

sample_image_transform = v2.Compose(
    [
        io.BytesIO,
        Image.open,
        partial(Image.Image.convert, mode="RGB"),
        pil_to_tensor,
    ]
)


def heif_init():
    from pillow_heif import register_heif_opener
    register_heif_opener()

def test_register_heif_once(image_bytes: List[bytes]):
    heif_init()

    with mp.Pool(1) as pool:
        pool.map(sample_image_transform, image_bytes)

def test_register_heif_once_per_process(image_bytes: List[bytes]):
    with mp.Pool(1, initializer=heif_init) as pool:
        pool.map(sample_image_transform, image_bytes)
        
def main():
    heif_paths = [...]
    heif_images = []
    for p in heif_paths:
        with open(p, "rb") as f:
            heif_images.append(f.read())

    try:
        test_register_heif_once(heif_images)
        print("test_register_heif_once() success")
    except Exception as e:
        print(f"test_register_heif_once() failed: {e}")

    try:
        test_register_heif_once_per_process(heif_images)
        print("test_register_heif_once_per_process() success")
    except Exception as e:
        print(f"test_register_heif_once_per_process() failed: {e}")

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()

Output:

test_register_heif_once() failed: cannot identify image file <_io.BytesIO object at 0x7f5dc3777d80>
test_register_heif_once_per_process() success

Describe your proposed solution

I'm not sure how you'd do this - open to discussion.

Describe alternatives you've considered, if relevant

If I explicitly call register_heif_opener() in the initializer of my mp pools or torch dataloaders, there's no issue. But that's easy to forget and causes difficult-to-debug errors.

I'm not sure how to persist the registration across a spawn, but I believe some libraries, e.g. torch, manage it somehow.
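One mechanism that effectively "persists" setup across a spawn: pickle stores a function by its module and name only, so unpickling it in a child process re-imports the defining module, re-running any top-level registration there. A stdlib-only sketch of that mechanism (the mytransforms module and its FLAG side effect are made up for illustration):

```python
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a throwaway module whose import side effect stands in for
# register_heif_opener() being called at module top level.
src = textwrap.dedent("""
    FLAG = "registered"  # stand-in for register_heif_opener()

    def transform(x):
        return x
""")
mod_dir = tempfile.mkdtemp()
Path(mod_dir, "mytransforms.py").write_text(src)
sys.path.insert(0, mod_dir)

import mytransforms

payload = pickle.dumps(mytransforms.transform)  # pickled by reference, not by value

# Simulate a fresh spawn child: forget the module, then unpickle the function.
del sys.modules["mytransforms"]
fn = pickle.loads(payload)  # re-imports mytransforms, re-running its top-level code
print("mytransforms" in sys.modules)  # True: the import side effect ran again
print(fn(1))  # 1
```

This suggests that defining your worker callable in a module which calls register_heif_opener() at import time would give spawned workers the registration automatically, without an explicit initializer.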

Additional context

No response

Good day.
But you already have the correct way to do this:

def heif_init():
    from pillow_heif import register_heif_opener
    register_heif_opener()


with mp.Pool(1, initializer=heif_init) as pool:
    pool.map(sample_image_transform, image_bytes)

The same way it is done in FastAPI applications:

from contextlib import asynccontextmanager
from fastapi import FastAPI
from pillow_heif import register_heif_opener

@asynccontextmanager
async def lifespan(app: FastAPI):  # executes in each subprocess of the webserver
    register_heif_opener()
    yield

Pillow itself requires plugins to be registered in each subprocess; it does not have automatic plugin registration, for security reasons.
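Concretely, registration just mutates Pillow's in-process format tables. A minimal sketch using a made-up "DEMO" format id — the accept/factory pair here is a non-functional placeholder, but pillow_heif registers a real ImageFile subclass through the same public API:

```python
from PIL import Image

def _accept(prefix: bytes) -> bool:
    # A real plugin sniffs magic bytes here; "DEMO" is made up.
    return prefix.startswith(b"DEMO")

def _factory(fp, filename):
    # Placeholder; a real plugin returns an ImageFile.ImageFile subclass instance.
    raise NotImplementedError

Image.register_open("DEMO", _factory, _accept)
Image.register_extensions("DEMO", [".demo"])

# The registration lives only in this process's PIL.Image.OPEN table; a spawned
# child re-imports PIL and starts with a table that has no "DEMO" entry.
print("DEMO" in Image.OPEN)  # True
```

Because these tables are ordinary module-level dicts, nothing about them survives a spawn unless the registering call runs again in the child.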

Thanks for the quick response and the clarification of the behavior. I'll proceed with the per-subprocess initialization technique.

You're welcome, always happy to help