unidoc/unipdf

[BUG] Huge memory consumption when writing images to PDF

zenyui opened this issue · 8 comments

zenyui commented

Description

I am trying to create a PDF from a slice of Go image.Image objects. The images are about 30MB in total, and when I write them to the PDF, the Docker container spikes to 1.4GB of memory usage.

In production, this is causing my container to OOM and exit.

See implementation below.

Expected Behavior

I would expect the memory usage to be close to (or 2x, 3x) the size of the images, not 1.4GB! I also don't see a way to incrementally build/finalize the PDF, so I don't see a way to decrease the memory usage.

Actual Behavior

Memory usage is 1.4GB, and I don't see an avenue to accomplish what I'm hoping to do.

Attachments

// pdfFromGoImages creates a pdf from an array of images, each on a separate page
func pdfFromGoImages(ctx context.Context, images ...image.Image) (io.ReadSeeker, error) {
	c := creator.New()

	margins := float64(10)

	for _, img := range images {
		pImg, err := c.NewImageFromGoImage(img)
		if err != nil {
			return nil, err
		}
		_ = c.NewPage()

		// scale to page width
		pImg.ScaleToWidth(c.Width() - margins*2)
		pImg.SetPos(margins, margins)
		if pImg.Height() >= c.Height() {
			pImg.ScaleToHeight(c.Height() - margins*2)
			pImg.SetPos(margins, margins)
		}
		b := creator.NewBlock(1, 1)
		if err := b.Draw(pImg); err != nil {
			return nil, err
		}
		if err := c.Draw(b); err != nil {
			return nil, err
		}

	}

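	// Write directly into the buffer. The original bufio.Writer was never
	// flushed, so the tail of the PDF could be missing from outBytes.
	var outBytes bytes.Buffer
	if err := c.Write(&outBytes); err != nil {
		return nil, err
	}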

	return bytes.NewReader(outBytes.Bytes()), nil
}

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

zenyui commented

FYI, I am a licensed enterprise customer

Hi @zenyui,

Could you share the images that you load into the Go image.Image objects, so that we can reproduce the issue on our end?

zenyui commented

Here is a google drive folder with a few pprof dumps and the source PDF.

The larger algorithm is:

  1. extract the images from the source pdf
  2. convert to golang image.Image and compress it to 75% quality (attempt to make it smaller)
  3. pass into above function to write images to a new PDF

Thanks for the information, we will investigate this issue.

Still waiting on a solution.

@zenyui
We have already partly improved PDF creation from images and introduced a lazy mode that reduces memory consumption.
You can check it here:
https://github.com/unidoc/unipdf-examples/blob/master/image/pdf_images_to_pdf_lazy.go

As for image extraction, we are actively working on that and we will keep you updated on our progress.