Belval/pdf2image

cannot identify image file using pdf2image.convert_from_bytes

alistairwgillespie opened this issue · 1 comment

Hi,

I'm using AWS Lambda to run pipelines that consume PDF documents.

When attempting to optimize memory allocation for pdf2image.convert_from_bytes using a context manager and an output_folder, I get the following error:
`cannot identify image file '/tmp/tmprz6rwu8a/a606ca84-e027-4d88-88aa-6d25099a9776-18.ppm'`

My code looks like this:

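  import tempfile

  import pdf2image

  # rsz and image_to_bytes are my own helpers (not shown here)
  pil_images = None
  images = None
  with tempfile.TemporaryDirectory() as tmpdir:
      # Pages are written to tmpdir as .ppm files instead of being held in memory
      pil_images = pdf2image.convert_from_bytes(
          document_bytes,
          dpi=dpi,
          output_folder=tmpdir
      )
      pil_images = [rsz(i, resize) for i in pil_images]
      images = [image_to_bytes(i, fmt) for i in pil_images]
  ...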

Any help is much appreciated.

Does this happen with a specific PDF file?
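
To help answer that, one thing that might be worth ruling out is whether the temporary folder is running out of space, since uncompressed .ppm pages are large and Lambda's /tmp is limited (512 MB by default). This is only a guess at a possible cause, not a confirmed diagnosis. Below is a minimal sketch using only the standard library. The function names `estimate_ppm_bytes` and `report_folder` are hypothetical helpers, not part of pdf2image, and the estimate assumes 8-bit RGB output and ignores the small PPM header.

```python
import os
import shutil


def estimate_ppm_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Approximate size of one uncompressed 8-bit RGB PPM page (header ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * 3  # 3 bytes per pixel (R, G, B)


def report_folder(folder: str) -> dict:
    """Summarise the files in a folder and the free space on its filesystem."""
    sizes = {
        name: os.path.getsize(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
    }
    return {
        "free_bytes": shutil.disk_usage(folder).free,
        "total_file_bytes": sum(sizes.values()),
        # Zero-byte files cannot be opened as images
        "empty_files": [name for name, size in sizes.items() if size == 0],
    }
```

For example, `estimate_ppm_bytes(8.5, 11, 200)` gives 11,220,000 bytes (about 10.7 MiB) for a single US Letter page at 200 dpi. Calling `report_folder(tmpdir)` inside the `with` block, right after the exception is caught, would show whether the failing file is empty and how much free space remains.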