Analyzes a PDF file with change bars, then adds clickable links to step through the changed pages
In our technical publications flow, we save PDF files with change bars, then give those files to reviewers to comment on the changes. However, in large (1000+ page) documents, reviewers found it painful to search for the next set of change bars.
This Perl script analyzes a PDF file for change bars, then adds a navigation link along the bottom of each page. Clicking anywhere within the bottom bar annotation jumps to the next page with change bars. The horizontal position of the text box is proportional to your progress through the changed pages.
This Perl script runs on Linux. If you're running Windows 10, it also runs under Windows Subsystem for Linux (WSL).
You'll need the following packages:
sudo apt update
sudo apt install ghostscript imagemagick poppler-utils
Your PDFs must have a solid background color where the change bars are.
The utility works as follows:
- Renders a multipage low-res TIFF file from the PDF. (ghostscript)
- Crops the TIFF images to the bounding box where change bars can appear. (pdfinfo, ImageMagick)
- Performs a "background removal" operation to shrink each page image to just the change bar (if it exists) or to zero size (if none exists). (ImageMagick)
- Uses the image sizes to determine which pages had change bars. (annotate_change_bars.pl)
- Processes the PDF file to add clickable navigation links at the bottom. (ghostscript)
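The size-based detection step can be illustrated with a minimal sketch. It is not taken from annotate_change_bars.pl: the function name changed_pages, the [width, height] input layout, and the optional $min_height threshold are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based page numbers whose trimmed
# image is taller than $min_height pixels. $sizes is an array ref of
# [width, height] pairs, one per page, in page order.
sub changed_pages {
    my ($sizes, $min_height) = @_;
    $min_height //= 0;    # assumption: by default any nonzero height counts
    my @changed;
    for my $i (0 .. $#$sizes) {
        my $height = $sizes->[$i][1];
        push @changed, $i + 1 if $height > $min_height;
    }
    return @changed;
}

# Made-up sizes for a six-page document: pages 3 and 6 kept a change bar.
my @sizes = ([0, 0], [0, 0], [4, 37], [0, 0], [0, 0], [4, 112]);
print join(' ', changed_pages(\@sizes)), "\n";    # prints "3 6"
```

The threshold is only there in case very small leftover images should be ignored; whether the real script needs one is not stated here.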
First, edit the script to describe the bounding box where the change bars can exist:
# define the change bar bounding box, in inches, from the upper-left corner
my $x1 = 0.500;
my $x2 = 0.875;
my $y1 = 1.0;
my $y2 = 10.0;
Any content in this bounding box - even headers or footers - will be treated as a change bar.
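To show how a bounding box in inches relates to the low-res page image, here is a minimal sketch that converts the four values into a pixel crop geometry of the form WxH+X+Y, the form ImageMagick accepts for cropping. The function name crop_geometry and the 72 DPI resolution are assumptions; the script may use a different resolution or compute the crop differently.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a bounding box in inches (measured from
# the upper-left corner) into a "WxH+X+Y" geometry string in pixels.
sub crop_geometry {
    my ($x1, $y1, $x2, $y2, $dpi) = @_;
    my $left   = int($x1 * $dpi + 0.5);             # round to nearest pixel
    my $top    = int($y1 * $dpi + 0.5);
    my $width  = int(($x2 - $x1) * $dpi + 0.5);
    my $height = int(($y2 - $y1) * $dpi + 0.5);
    return "${width}x${height}+${left}+${top}";
}

# The default box from the script, at an assumed 72 DPI.
print crop_geometry(0.500, 1.0, 0.875, 10.0, 72), "\n";    # prints "27x648+36+72"
```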
Next, run the utility on your PDF file as follows:
annotate_change_bars.pl my_file.pdf [-o new_file.pdf]
If you do not specify the -o option, the input file is overwritten in place with the annotated version.
The script produces output as follows:
Getting change bar information from PDF...
Creating multipage TIFF image file for change bars...
Getting change bar heights...
Total changed pages detected: 5
3 6 7 8 10
Total change sections: 3
3
6 7 8
10
Creating annotated PDF file 'test_orig.pdf'...
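The "change sections" in this output are runs of consecutive changed pages. A minimal sketch of that grouping, with the function name change_sections chosen for illustration rather than taken from the script:

```perl
use strict;
use warnings;

# Hypothetical helper: split a sorted list of page numbers into runs
# of consecutive pages, returned as a list of array refs.
sub change_sections {
    my @pages = @_;
    my @sections;
    for my $page (@pages) {
        if (@sections && $sections[-1][-1] == $page - 1) {
            push @{ $sections[-1] }, $page;    # extends the current run
        } else {
            push @sections, [$page];           # starts a new run
        }
    }
    return @sections;
}

my @sections = change_sections(3, 6, 7, 8, 10);
print "Total change sections: ", scalar(@sections), "\n";
print join(' ', @$_), "\n" for @sections;
```

With the changed pages from the sample output above, this prints three sections: 3, then 6 7 8, then 10.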
If there is no change on the first page, a marker is added to take you to the first change.
The change information text box at the bottom turns gray when the next page with a change is also the next page in the document, to let you know that the content to review is contiguous.
On the page with the last change, the text box is colored green. Clicking on it will return you to the first page with a change (not the first page in the document) if you want to re-review the changes.
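The navigation rules above can be summarized in a minimal sketch. The names nav_link and progress, the 'default' color label, and the treatment of pages after the last change (handled the same as the last changed page) are assumptions for illustration, not details confirmed by the script.

```perl
use strict;
use warnings;

# Hypothetical helper: for a page and the sorted list of changed pages,
# return the link target and the box color.
sub nav_link {
    my ($page, @changed) = @_;
    return unless @changed;
    my ($next) = grep { $_ > $page } @changed;    # first changed page after $page
    if (!defined $next) {
        # No later change: wrap around to the first changed page.
        return { target => $changed[0], color => 'green' };
    }
    my $color = ($next == $page + 1) ? 'gray' : 'default';
    return { target => $next, color => $color };
}

# Hypothetical helper: fraction of changed pages at or before $page,
# one possible measure for the horizontal position of the text box.
sub progress {
    my ($page, @changed) = @_;
    return 0 unless @changed;
    my $seen = grep { $_ <= $page } @changed;
    return $seen / @changed;
}

my @changed = (3, 6, 7, 8, 10);
for my $page (1, 6, 10) {
    my $link = nav_link($page, @changed);
    printf "page %d -> page %d (%s)\n", $page, $link->{target}, $link->{color};
}
```

For the sample document, page 1 links to page 3, page 6 links to page 7 in gray, and page 10 links back to page 3 in green.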
Most in-browser PDF viewers do not render text in the navigation bar at the bottom. (Specifically, they do not render /FreeText pdfmark annotations.) However, the link is still clickable and navigation still works. Standalone PDF viewing programs generally work fine.
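For readers unfamiliar with pdfmark, the sketch below shows roughly what a /FreeText box paired with a /Link annotation looks like as pdfmark text. It follows the general pdfmark annotation syntax (/Rect, /Subtype, /SrcPg, /Page, /View, /ANN), but the function names, rectangle, font string, and label are assumptions, and the script's actual pdfmark output may differ.

```perl
use strict;
use warnings;

# Escape characters that are special inside a PostScript string literal.
sub ps_string {
    my ($text) = @_;
    $text =~ s/([\\()])/\\$1/g;
    return "($text)";
}

# Hypothetical helper: build pdfmark text for one page, consisting of a
# visible /FreeText box and a /Link annotation over the same rectangle
# that jumps to page $target.
sub nav_pdfmark {
    my ($page, $target, $label, @rect) = @_;
    my $rect = join ' ', @rect;
    return join "\n",
        "[ /Rect [$rect] /Subtype /FreeText /Contents " . ps_string($label),
        "  /DA (/Helv 10 Tf 0 g) /SrcPg $page /ANN pdfmark",
        "[ /Rect [$rect] /Subtype /Link /Border [0 0 0]",
        "  /Page $target /View [/XYZ null null null] /SrcPg $page /ANN pdfmark",
        "";
}

# Assumed rectangle in points along the bottom of a letter-size page.
print nav_pdfmark(3, 6, 'Next change: page 6', 36, 18, 576, 36);
```

Because the link is a separate annotation from the text box in this sketch, a viewer that skips /FreeText rendering would still leave the clickable area intact, which matches the behavior described above.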
