A convinient way to create / layout / maintain PDF link annotations in Adobe Illustrator.
While you can create automatic links in a PDF (by put the actual url into a textbox), it is limiting in various ways (create links on graphics, create custom hotspot). Also, when iterating on a document design, I found it pretty cumbersome to create / update link annotations in external apps, so after some research on Apple PDFKit I put together this tiny tool.
Create a layer for the links (that you can hide later on).
Create a text starting with "Link " followed by the actual url.
Wrap into a clipping rectangle to define link hotspot.
Hide layer containing the links before export PDF.
Launch PDF Links, drag PDF into.
Enjoy linked PDF.
An installer is packaged at PDF_Links.dmg
.
Besides the use case, this repository is a prototype for PDF content processing in Swift.
The pages, annotations, textual content is pretty accessable with the high-level PDFKit.PDFDocument
APIs. However, the actual content streams in a PDF (images / graphics) are only accessible as raw data via PDFKit.CGPDFDocument
.
The project contains a Parser.swift
class that crawls a PDF object hierarchy and maps out the content as a JSON
for further inspection. Using that JSON you can plan out various processing implementations (images / fonts / graphics / layers / metadata / etc.).
// Parse PDF into JSON.
PDFParser.parse(pdfUrl: pdfFileURL, into: jsonFileURL)
// Parse PDF into Dictionary.
let pdf: [String:Any?] = PDFParser.parse(pdfUrl: pdfFileURL)
The resulting JSON gives you the entire PDF content (with type information in angle brackets).
{
"Catalog" : {
"Pages<Dictionary>" : {
"MediaBox<Array>" : [
0,
0,
612,
792
],
"Type<Name>" : "Pages",
"Kids<Array>" : [
{
"Rotate<Integer>" : 0,
"MediaBox<Array>" : [
0,
0,
595.27499999999998,
841.88999999999999
],
"Parent<Dictionary>" : "<PARENT_NOT_SERIALIZED>",
"Resources<Dictionary>" : {
"ColorSpace<Dictionary>" : {
"Cs1<Array>" : [
"ICCBased",
{
"N<Integer>" : 3,
"Filter<Name>" : "FlateDecode",
"Alternate<Name>" : "DeviceRGB",
"Length<Integer>" : 2612
}
]
}
...
You can get the PDF content as a Swift dictionary as well (see console output below).
Optional(["Pages<Dictionary>": Optional({
"Count<Integer>" = 1;
"Kids<Array>" = (
{
"ArtBox<Array>" = (
"28.3465",
"325.193",
"393.389",
"813.543"
);
"Contents<Stream>" = {
Data = "q Q q 0 0 595.276 841.89 re W n 1 0 1 0 k /Gs1 gs 201.8862 420.9449 m 201.8862\n473.8269 244.7562 516.6959 297.6372 516.6959 c 350.5192 516.6959 393.3892\n473.8269 393.3892 420.9449 c 393.3892 368.0629 350.5192 325.1939 297.6372\n325.1939 c 244.7562 325.1939 201.8862 368.0629 201.8862 420.9449 c f Q q 28.346 530.078 283.464 283.465\nre W n 0 0 0 1 k /Gs1 gs BT 12 0 0 12 28.3467 803.499 Tm /Tc1 1 Tf [ (h) 4\n(ttp://epp) 7 (z.eu) ] TJ ET Q";
"Filter<Name>" = FlateDecode;
"Length<Integer>" = 237;
};
"MediaBox<Array>" = (
0,
0,
"595.2760000000001",
"841.89"
);
"Parent<Dictionary>" = "<PARENT_NOT_SERIALIZED>";
"Resources<Dictionary>" = {
"ExtGState<Dictionary>" = {
"Gs1<Dictionary>" = {
"OPM<Integer>" = 1;
"Type<Name>" = ExtGState;
};
};
...
See Parser.swift
for more.
Graphics data is serialized using COS (Carousel Object System). Although Carousel was only a code name for what later became Acrobat, the name is still used to refer to the way a PDF file is composed. From the documentation: "...the data in a content stream is interpreted as a sequence of operators and their operands, expressed as basic data objects according to standard PDF syntax...". See official PDF Reference for more.
Here is what a slice of the contents of the example PDF used in Usage section looks like.
...
/OC /MC1 BDC
0.02 0.655 0.502 rg
0 586.77 595.275 255.119 re
f
EMC
/OC /MC2 BDC
BT
1 1 1 rg
/TT0 1 Tf
32.0407 0 0 32.0407 255.1182 714.3296 Tm
(Geri Borb\\341s)Tj
/TT1 1 Tf
14 0 0 14 255.1182 670.0706 Tm
[(I lo)19.1 (v)17.9 (e this industry)48 (. In the past 8 )28 (y)18 (ears I made )]TJ
0 -1.286 Td
[(numer)26 (ous )31 (Apps and Games )]TJ
/TT0 1 Tf
[(fr)26 (om z)14.1 (er)26 (o t)13 (o )]TJ
0 -1.286 Td
[(mark)27 (e)4 (t)]TJ
/TT1 1 Tf
[(, bo)7.1 (th t)13 (eamed and solo.)]TJ
ET
...
It is somewhat human readable, seemingly designed to direct draw using the operators. In this project I used the regex below to parse link text data with the bounds of the corresponding clipping rectangles. See the expression on Regex101 for more.
# Clipping Rectangle (x, y, width, height)
(?<x>\b[-0-9.]+\b)\s
(?<y>\b[-0-9.]+\b)\s
(?<width>\b[-0-9.]+\b)\s
(?<height>\b[-0-9.]+\b)\s
re\nW
# Spacing
(?:
. # Any character
(?! # Except followed by
# Clipping Rectangle
(\b[-\d.]+\b\s){4}
re\nW
)
)*? # 0 or more times
# URL
BT
# Spacing
(?:
. # Any character
(?!ET) # Except followed by 'ET'
)*? # 0 or more times
\n
# Link
(?<URL>
.[^\n]*? # Any character except new-line 0 or more times
Link # Containing 'Link'
.*? # Any character 0 or more times
)
# Followed by 'TJ' or 'Tj' at the end of the line
(?:TJ\n|Tj\n)
ET
It parses the graphic content into nicely usable Swift Codable structs. See PageLinks.parseLinks(from contents:)
for more. After parsing it can be encoded into JSON easily.
{
"pages" : [
{
"links" : [
{
"bounds" : {
"y" : 43.936999999999998,
"x" : 43.936999999999998,
"width" : 39.685000000000002,
"height" : 39.686
},
"urlString" : "http:\/\/bit.ly\/GeriBorbasLinkedIn"
},
{
"bounds" : {
"y" : 43.936999999999998,
"x" : 86.456999999999994,
"width" : 39.685000000000002,
"height" : 39.686
},
"urlString" : "http:\/\/bit.ly\/GeriBorbasTwitter"
},
{
"bounds" : {
"y" : 43.936999999999998,
"x" : 128.976,
"width" : 39.685000000000002,
"height" : 39.686
},
"urlString" : "http:\/\/bit.ly\/GeriBorbasGitHub"
},
...
To create PDFKit.PDFAnnotation
, the same coordinate system can be used. Having that, a parsed Link
can be directly converted into a PDFKit.PDFAnnotation
. Those can be added to a PDF page easily with PDFKit.PDFPage.addAnnotation(_:)
.
extension Link
{
var annotation: PDFAnnotation
{
PDFAnnotation(
bounds: CGRect(x: bounds.x, y: bounds.y, width: bounds.width, height: bounds.height),
forType: PDFAnnotationSubtype.link,
withProperties: nil
).with(url: url)
}
}
Licensed under the MIT License.