Hopding/pdf-lib

Links are lost after combining PDFs

vekunz opened this issue · 8 comments

Hi, I use pdf-lib to combine multiple PDFs. One of the PDFs has links in it, like a table of contents. The links point to other pages of the same PDF. The problem is that these links are lost after combining the PDFs with pdf-lib.
Is there a way to preserve the links?

My code:

const pdfDoc = await PDFDocument.create();
for (const file of files) {
    const indices = [];
    for (let i = 0; i < file.getPageCount(); i++)
        indices.push(i);
    const pages = await pdfDoc.copyPages(file, indices);

    for (const page of pages) {
        pdfDoc.addPage(page);
    }
}

Edit: I found out that the links are saved as "Named Destinations" in the PDF. The PDF is version 1.4. One option would be to add the destinations after merging, but then I need a way to add them to the PDF manually.

Hello @vekunz!

As you noted, the links do not work after the pages are merged because the links reference Named Destinations. Named Destinations are stored under the /Dests entry of the document's catalog. Unfortunately, the current page copying code does not copy anything from the donor document that isn't accessible from the page via a chain of indirect references. And most of the resources listed under the catalog are not accessible in this way.
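To make the reachability point concrete, here is a toy model, not pdf-lib code. The object names (`tocLink`, `dests`, and so on) and the `reachableFrom` helper are made up for illustration. It assumes the link annotation refers to its destination by name only, so no chain of references leads from the page to the catalog's Dests entry.

```typescript
// Toy object graph: each object id maps to the ids it references indirectly.
type ObjectGraph = Record<string, string[]>;

// Collect every object reachable from `start` by following references.
function reachableFrom(graph: ObjectGraph, start: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [start];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of graph[id] || []) stack.push(next);
  }
  return seen;
}

// Hypothetical donor document. The link on page1 names its destination
// ("chapter2") instead of referencing it, so it has no outgoing references.
const donor: ObjectGraph = {
  catalog: ['pageTree', 'dests'],
  pageTree: ['page1', 'page2'],
  page1: ['contents1', 'tocLink'],
  page2: ['contents2'],
  tocLink: [],
  dests: ['page2'],
  contents1: [],
  contents2: [],
};

// Copying page1 brings along only what page1 can reach.
const copiedWithPage1 = reachableFrom(donor, 'page1');
console.log(copiedWithPage1.has('tocLink')); // the annotation itself is copied
console.log(copiedWithPage1.has('dests'));   // the name-to-page table is not
```

In this model the annotation survives the copy, but the table that resolves its name stays behind, which matches the symptom reported above.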

This limitation has come up before in #159 and #218. I would like to see this issue resolved, but haven't had any time to work on it. I'd be open to discussing a solution with anybody interested in implementing a fix for copying catalog entries between documents!

Hi @Hopding,

Thanks for your great work!

I'd like to support you in copying catalog entries between documents. I'm new to the internal workings of PDF but am a quick learner. I started researching the format and feel like I've got a good overview.

Since you know about pdf-lib best, do you have any suggestions for implementing this feature? My first (uneducated) guess would be:

  1. Find the catalog entries in the original document.
  2. Copy all catalog entries related to links to the new document.

Added this to the roadmap for tracking: #998.

Wonderful lib! I know this is an old, closed issue, but links still do not work after a merge in the latest release. Hoping for support in the future.

#1609

Any updates on getting internal links to work?

This is how I post-process multiple documents after using copyPages.

import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

function getLinksPDFName(): PDFName {
  return PDFName.of('Dests');
}

function mapSourceToTargetPages(
  sources: PDFDocument[],
  destination: PDFDocument,
): Record<string, PDFRef> {
  const result: Record<string, PDFRef> = {};
  const sourcePages = sources.flatMap(source => source.getPages());
  const destinationPages = destination.getPages();
  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = destinationPages[i].ref;
  }
  return result;
}

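export function copyLinks(sources: PDFDocument[], target: PDFDocument): void {
  // Collect the Dests entries of every source into one dictionary owned by the target.
  const targetLinksDict = PDFDict.withContext(target.context);
  sources
    .map(source => source.context.lookupMaybe(source.catalog.get(getLinksPDFName()), PDFDict))
    // Type guard so the compiler knows sources without a Dests entry are dropped.
    .filter((links): links is PDFDict => links !== undefined)
    .forEach(links =>
      links.entries().forEach(([destName, destValue]) => targetLinksDict.set(destName, destValue)),
    );
  // Each destination array starts with a page reference; point it at the copied page.
  const pagesMapping = mapSourceToTargetPages(sources, target);
  (targetLinksDict.values() as PDFArray[]).forEach(array => {
    const currentPageRef = array.get(0) as PDFRef;
    array.set(0, pagesMapping[currentPageRef.tag]);
  });

  // Register the dictionary as an indirect object and reference it from the target's catalog.
  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(getLinksPDFName(), destinationDestsRef);
}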

How it works:

  • copy entries from Dests from all sources to new dictionary
  • fix references (links) in target Dests because copied pages have different PDFRef
  • register dictionary in the target's context
  • set Dests reference in target's catalog to dictionary
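The "fix references" step can be shown in isolation with plain data, without pdf-lib. This is only a sketch: the `Dest` tuple type, the `remapDests` helper, and the sample tags are hypothetical, and it assumes each destination is an array whose first element identifies the page. Unlike the snippet above, which rewrites the source arrays in place, this version builds new entries and skips destinations whose page was not copied.

```typescript
// A destination modelled as [pageTag, ...viewParameters],
// for example ['7 0 R', 'XYZ', 0, 792, null].
type Dest = [string, ...(string | number | null)[]];

// Return a new table in which every page tag is replaced by its target tag.
function remapDests(
  dests: Record<string, Dest>,
  pageMap: Record<string, string>,
): Record<string, Dest> {
  const remapped: Record<string, Dest> = {};
  for (const name of Object.keys(dests)) {
    const [pageTag, ...view] = dests[name];
    const targetTag = pageMap[pageTag];
    // Skip destinations that point at a page missing from the mapping.
    if (targetTag === undefined) continue;
    remapped[name] = [targetTag, ...view];
  }
  return remapped;
}

// Hypothetical source table: "appendix" points at a page that was not copied.
const sourceDests: Record<string, Dest> = {
  chapter1: ['7 0 R', 'XYZ', 0, 792, null],
  appendix: ['99 0 R', 'Fit'],
};
const remappedDests = remapDests(sourceDests, { '7 0 R': '12 0 R' });
console.log(remappedDests);
```

Skipping unmapped destinations is a design choice of this sketch; the snippet above would instead write `undefined` into the first slot of such an array.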

Thanks for sharing @Ludevik. This has been very helpful. One issue I did encounter is that my sources seem to have duplicate ref.tag values between them. Creating a single mapping of all the source documents to the target was overwriting the duplicates, so I refactored to only map one source document at a time.

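// Catalog key that holds the named destinations (same value as getLinksPDFName() above).
const LINKS_PDF_NAME = PDFName.of('Dests');

export function copyLinks(sources: PDFDocument[], target: PDFDocument): void {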
  const targetLinksDict = PDFDict.withContext(target.context);

  let currentTargetPage = 0;
  for (const source of sources) {
    const { mapping, targetPage } = mapSourceToTargetPages(source, target, currentTargetPage);
    currentTargetPage = targetPage;

    const links = source.context.lookupMaybe(source.catalog.get(LINKS_PDF_NAME), PDFDict);
    // lookupMaybe returns undefined (not null) when the source has no Dests entry.
    if (links !== undefined) {
      links.entries().forEach(([destName, destValue]) => {
        const currentRef = (destValue as PDFArray).get(0) as PDFRef;
        (destValue as PDFArray).set(0, mapping[currentRef.tag]);
        targetLinksDict.set(destName, destValue);
      });
    }
  }

  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(LINKS_PDF_NAME, destinationDestsRef);
}

function mapSourceToTargetPages(
  source: PDFDocument,
  target: PDFDocument,
  startingTargetPage: number,
): { mapping: Record<string, PDFRef>; targetPage: number } {
  const result: Record<string, PDFRef> = {};
  const targetPages = target.getPages();
  let currentTargetPage = startingTargetPage;
  const sourcePages = source.getPages();

  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = targetPages[currentTargetPage].ref;
    currentTargetPage++;
  }

  return { mapping: result, targetPage: currentTargetPage };
}
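The tag collision described above can be reproduced with plain objects, without pdf-lib. In this sketch the `PageLike` shape mirrors only the `ref.tag` property used in the snippets, the helper names are made up, and the tag strings such as '3 0 R' are illustrative values.

```typescript
// Minimal stand-in for a page: only the ref.tag property matters here.
interface PageLike {
  ref: { tag: string };
}

const page = (tag: string): PageLike => ({ ref: { tag } });

// One shared mapping for all sources: a later source overwrites an earlier
// source's entry when both happen to use the same tag.
function buildFlatMapping(sources: PageLike[][], targetPages: PageLike[]): Record<string, string> {
  const mapping: Record<string, string> = {};
  let targetIndex = 0;
  for (const pages of sources) {
    for (const p of pages) {
      mapping[p.ref.tag] = targetPages[targetIndex].ref.tag;
      targetIndex++;
    }
  }
  return mapping;
}

// One mapping per source: identical tags in different sources stay separate.
function buildPerSourceMappings(sources: PageLike[][], targetPages: PageLike[]): Record<string, string>[] {
  let targetIndex = 0;
  return sources.map(pages => {
    const mapping: Record<string, string> = {};
    for (const p of pages) {
      mapping[p.ref.tag] = targetPages[targetIndex].ref.tag;
      targetIndex++;
    }
    return mapping;
  });
}

// Two hypothetical sources that both contain a page tagged '3 0 R'.
const docA = [page('3 0 R'), page('5 0 R')];
const docB = [page('3 0 R')];
const mergedPages = [page('10 0 R'), page('11 0 R'), page('12 0 R')];

const flatMapping = buildFlatMapping([docA, docB], mergedPages);
const perSourceMappings = buildPerSourceMappings([docA, docB], mergedPages);
console.log(flatMapping['3 0 R']);          // docA's first page now resolves to docB's copy
console.log(perSourceMappings[0]['3 0 R']); // docA's first page resolves to its own copy
```

With the shared mapping, links into docA's first page would land on docB's page in the merged file; the per-source mappings keep them apart.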

@FiveOFive Nice fix. We don't have such a case, so I didn't encounter the issue.