Keep heif_opener registered after mp spawn
yit-b opened this issue · 3 comments
Describe why it is important and where it will be useful
When decoding images with a torch DataLoader (or, more generally, a multiprocessing pool) with mp/torch start_method = "spawn", needing to register the HEIF opener per-process (e.g. in an initializer or worker_init_fn) is a bit of a gotcha. Calling register_heif_opener() in the global scope of your program is not enough, as the plugin is not registered in the spawned workers.
Repro:
```python
import io
from functools import partial
from typing import List

from PIL import Image
from torch import multiprocessing as mp
from torchvision.transforms import v2
from torchvision.transforms.functional import pil_to_tensor

sample_image_transform = v2.Compose(
    [
        io.BytesIO,
        Image.open,
        partial(Image.Image.convert, mode="RGB"),
        pil_to_tensor,
    ]
)


def heif_init():
    from pillow_heif import register_heif_opener

    register_heif_opener()


def test_register_heif_once(image_bytes: List[bytes]):
    heif_init()
    with mp.Pool(1) as pool:
        pool.map(sample_image_transform, image_bytes)


def test_register_heif_once_per_process(image_bytes: List[bytes]):
    with mp.Pool(1, initializer=heif_init) as pool:
        pool.map(sample_image_transform, image_bytes)


def main():
    heif_paths = [...]
    heif_images = [open(p, "rb").read() for p in heif_paths]
    try:
        test_register_heif_once(heif_images)
        print("test_register_heif_once() success")
    except Exception as e:
        print(f"test_register_heif_once() failed: {e}")
    try:
        test_register_heif_once_per_process(heif_images)
        print("test_register_heif_once_per_process() success")
    except Exception as e:
        print(f"test_register_heif_once_per_process() failed: {e}")


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    main()
```
Output:
```text
test_register_heif_once() failed: cannot identify image file <_io.BytesIO object at 0x7f5dc3777d80>
test_register_heif_once_per_process() success
```
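The behavior can be reproduced without torch or pillow_heif at all; this is a minimal stdlib-only sketch, where `REGISTRY` and `register_plugin` are hypothetical stand-ins for Pillow's plugin registry (`Image.OPEN` / `Image.ID`) and `register_heif_opener()`:

```python
import multiprocessing as mp

# Hypothetical stand-in for Pillow's plugin registry (Image.OPEN / Image.ID).
REGISTRY = set()


def register_plugin():
    """Stand-in for pillow_heif.register_heif_opener()."""
    REGISTRY.add("HEIF")


def worker_check(_):
    # True iff the plugin is visible inside the calling process.
    return "HEIF" in REGISTRY


def main():
    ctx = mp.get_context("spawn")
    register_plugin()  # registered only in the parent process

    # Spawned workers re-import the module, so they see a fresh, empty REGISTRY.
    with ctx.Pool(1) as pool:
        print(pool.map(worker_check, [0]))  # [False]

    # An initializer runs once in every worker and re-registers the plugin.
    with ctx.Pool(1, initializer=register_plugin) as pool:
        print(pool.map(worker_check, [0]))  # [True]


if __name__ == "__main__":
    main()
```

With "fork" the first pool would print `[True]` as well, since forked children inherit the parent's module state; only "spawn" (and "forkserver") workers start from a clean import.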
Describe your proposed solution
I'm not sure how you'd do this - open to discussion.
Describe alternatives you've considered, if relevant
If I explicitly call register_heif_opener() in the initializer of my mp pools or torch DataLoaders, then there's no issue. But that's easy to forget, and forgetting it causes difficult-to-debug errors.
I'm not sure how to persist imports after a spawn, but I believe some libraries (e.g. torch) manage it somehow.
Additional context
No response
Good day.
You already use the correct way to do this:
```python
def heif_init():
    from pillow_heif import register_heif_opener

    register_heif_opener()


with mp.Pool(1, initializer=heif_init) as pool:
    pool.map(sample_image_transform, image_bytes)
```
It is done the same way in FastAPI applications:
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from pillow_heif import register_heif_opener


@asynccontextmanager
async def lifespan(app: FastAPI):  # code executes in each subprocess of the webserver
    register_heif_opener()
    yield


app = FastAPI(lifespan=lifespan)
```
Pillow itself requires each subprocess to register its plugins; it does not have automatic plugin registration, for security reasons.
Thanks for the quick response and for clarifying the behavior. I'll proceed with the per-subprocess initialization technique.
You're welcome, always happy to help