Links are lost after combining PDFs
vekunz opened this issue · 8 comments
Hi, I use pdf-lib to combine multiple PDFs. One of the PDFs has links in it, like a table of contents. The links point to other pages of the same PDF. The problem is that these links are lost after combining the PDFs with pdf-lib.
Is there a way to preserve the links?
My code:
import { PDFDocument } from 'pdf-lib';

// `files` is assumed to be an array of already-loaded PDFDocument instances.
const pdfDoc = await PDFDocument.create();
for (const file of files) {
  const indices = [];
  for (let i = 0; i < file.getPageCount(); i++) {
    indices.push(i);
  }
  const pages = await pdfDoc.copyPages(file, indices);
  for (const page of pages) {
    pdfDoc.addPage(page);
  }
}
Edit: I found out that the links are saved as "Named Destinations" in the PDF. The PDF is version 1.4. One option would be to add the destinations after merging, but then I need a way to add them to the PDF manually.
Hello @vekunz!
As you noted, the links do not work after the pages are merged because the links reference Named Destinations. Named Destinations are stored under the /Dests entry of the document's catalog. Unfortunately, the current page copying code does not copy anything from the donor document that isn't accessible from the page via a chain of indirect references. And most of the resources listed under the catalog are not accessible in this way.
This limitation has come up before in #159 and #218. I would like to see this issue resolved, but haven't had any time to work on it. I'd be open to discussing a solution with anybody interested in implementing a fix for copying catalog entries between documents!
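The reachability limitation described above can be pictured with a toy object graph. This is only a sketch under stated assumptions: the object ids (`catalog`, `page1`, `linkAnnot`, `dests`, and so on) and the `reachableFrom` helper are made up for illustration and are not pdf-lib's data model or its copying code. The model also leaves out back-references such as a page's /Parent entry.

```typescript
// Toy model of a PDF object graph: each object id lists the ids it references.
// All ids here are hypothetical; this is not how pdf-lib stores objects.
type ObjectGraph = Record<string, string[]>;

const donor: ObjectGraph = {
  catalog: ["pages", "dests"], // /Pages and /Dests both hang off the catalog
  pages: ["page1", "page2"],
  page1: ["content1", "linkAnnot"], // page 1 carries a link annotation
  page2: ["content2"],
  linkAnnot: [], // the link holds only a destination *name*, not a reference
  dests: ["page2"], // /Dests resolves that name to page 2
  content1: [],
  content2: [],
};

// Collect every object reachable from `start` by following references.
function reachableFrom(graph: ObjectGraph, start: string): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of graph[id] ?? []) stack.push(next);
  }
  return seen;
}

const copiedWithPage1 = reachableFrom(donor, "page1");
console.log(copiedWithPage1.has("linkAnnot")); // true: the annotation travels with the page
console.log(copiedWithPage1.has("dests")); // false: nothing on the page points at /Dests
```

In this model the link annotation is copied along with the page, but the dictionary that resolves its destination name is not, which matches the symptom of links that are present but go nowhere.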
Hi @Hopding,
Thanks for your great work!
I'd like to support you in copying catalog entries between documents. I'm new to the internal workings of PDFs but am a quick learner. I started researching the format and feel like I've got a good overview.
Since you know about pdf-lib best, do you have any suggestions for implementing this feature? My first (uneducated) guess would be:
- Find the catalog entries in the original document.
- Copy all catalog entries related to links to the new document.
Wonderful lib! I know this is an old, closed issue, but links still do not work on merge in the latest release. Hoping for support in the future.
Any updates on getting internal links to work?
This is how I post-process multiple documents after using copyPages.
import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

function getLinksPDFName(): PDFName {
  return PDFName.of('Dests');
}

// Map each source page's ref tag to the ref of the corresponding copied page.
// Assumes the pages were added to the destination in source order.
function mapSourceToTargetPages(
  sources: PDFDocument[],
  destination: PDFDocument,
): Record<string, PDFRef> {
  const result: Record<string, PDFRef> = {};
  const sourcePages = sources.flatMap(source => source.getPages());
  const destinationPages = destination.getPages();
  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = destinationPages[i].ref;
  }
  return result;
}

export function copyLinks(sources: PDFDocument[], target: PDFDocument): void {
  const targetLinksDict = PDFDict.withContext(target.context);
  // Collect the Dests entries of every source into one dictionary.
  sources
    .map(source => source.context.lookupMaybe(source.catalog.get(getLinksPDFName()), PDFDict))
    .filter((links): links is PDFDict => links != null)
    .forEach(links =>
      links.entries().forEach(([destName, destValue]) => targetLinksDict.set(destName, destValue)),
    );
  // Point each destination at the copied page instead of the source page.
  const pagesMapping = mapSourceToTargetPages(sources, target);
  (targetLinksDict.values() as PDFArray[]).forEach(array => {
    const currentPageRef = array.get(0) as PDFRef;
    array.set(0, pagesMapping[currentPageRef.tag]);
  });
  // Register the dictionary and reference it from the target's catalog.
  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(getLinksPDFName(), destinationDestsRef);
}
How it works:
- copy the entries from Dests from all sources to a new dictionary
- fix the references (links) in the target Dests, because the copied pages have different PDFRefs
- register the dictionary in the target's context
- set the Dests reference in the target's catalog to the dictionary
Thanks for sharing @Ludevik. This has been very helpful. One issue I did encounter is that my sources seem to have duplicate ref.tag values between them. Creating a single mapping of all the source documents to the target was overwriting the duplicates, so I refactored to only map one source document at a time.
import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

// Stands in for getLinksPDFName() from the earlier snippet.
const LINKS_PDF_NAME = PDFName.of('Dests');

export function copyLinks(sources: PDFDocument[], target: PDFDocument) {
  const targetLinksDict = PDFDict.withContext(target.context);
  let currentTargetPage = 0;
  for (const source of sources) {
    // Build the mapping for this source only, so equal ref tags in other sources cannot collide.
    const { mapping, targetPage } = mapSourceToTargetPages(source, target, currentTargetPage);
    currentTargetPage = targetPage;
    // lookupMaybe returns undefined when the source has no Dests entry.
    const links = source.context.lookupMaybe(source.catalog.get(LINKS_PDF_NAME), PDFDict);
    if (links) {
      links.entries().forEach(([destName, destValue]) => {
        const currentRef = (destValue as PDFArray).get(0) as PDFRef;
        (destValue as PDFArray).set(0, mapping[currentRef.tag]);
        targetLinksDict.set(destName, destValue);
      });
    }
  }
  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(LINKS_PDF_NAME, destinationDestsRef);
}

function mapSourceToTargetPages(
  source: PDFDocument,
  target: PDFDocument,
  startingTargetPage: number,
): { mapping: Record<string, PDFRef>; targetPage: number } {
  const result: Record<string, PDFRef> = {};
  const targetPages = target.getPages();
  let currentTargetPage = startingTargetPage;
  const sourcePages = source.getPages();
  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = targetPages[currentTargetPage].ref;
    currentTargetPage++;
  }
  return { mapping: result, targetPage: currentTargetPage };
}
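To make the duplicate ref.tag problem concrete, here is a minimal sketch that uses plain strings instead of pdf-lib objects. The `FakeDoc` type, the tag values, and both helper functions are hypothetical stand-ins chosen for illustration; the only thing taken from the thread is the idea of one flat mapping versus one mapping per source.

```typescript
// Plain-data stand-ins: a "document" is just the list of its page ref tags.
// Tags like "3 0 R" are illustrative; two unrelated PDFs can reuse the same ones.
type FakeDoc = { name: string; pageTags: string[] };

const sourceA: FakeDoc = { name: "A", pageTags: ["3 0 R", "7 0 R"] };
const sourceB: FakeDoc = { name: "B", pageTags: ["3 0 R"] }; // same tag as A's first page
const targetTags = ["10 0 R", "11 0 R", "12 0 R"]; // pages of the merged document, in order

// Approach 1: one flat map for all sources. Later sources overwrite earlier ones.
function flatMapping(sources: FakeDoc[], target: string[]): Record<string, string> {
  const result: Record<string, string> = {};
  let i = 0;
  for (const source of sources) {
    for (const tag of source.pageTags) result[tag] = target[i++];
  }
  return result;
}

// Approach 2: one map per source, offset by the number of pages already consumed.
function perSourceMappings(sources: FakeDoc[], target: string[]): Record<string, string>[] {
  let offset = 0;
  return sources.map(source => {
    const mapping: Record<string, string> = {};
    source.pageTags.forEach((tag, i) => {
      mapping[tag] = target[offset + i];
    });
    offset += source.pageTags.length;
    return mapping;
  });
}

const flat = flatMapping([sourceA, sourceB], targetTags);
const [mapA, mapB] = perSourceMappings([sourceA, sourceB], targetTags);

console.log(flat["3 0 R"]); // "12 0 R": A's first page now wrongly resolves to B's page
console.log(mapA["3 0 R"]); // "10 0 R"
console.log(mapB["3 0 R"]); // "12 0 R"
```

With the flat map, any destination in source A that pointed at its first page would end up on source B's page after merging; the per-source maps keep the two apart.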
@FiveOFive nice fix. We don't have such a case, so I didn't encounter the issue.