/pdf-annotate

Pure-python library for adding annotations to PDFs

Primary LanguagePythonMIT LicenseMIT

pdf-annotate

A pure-python library to add annotations to PDFs.

Build Status

About

pdf-annotate is a simple library to add PDF annotations to PDFs. Under the hood it uses the powerful and unopinionated pdfrw library to parse the PDF to figure out where to place the annotations.

Usage

from pdf_annotate import PdfAnnotator, Location, Appearance
a = PdfAnnotator('a.pdf')
a.add_annotation(
    'square',
    Location(x1=50, y1=50, x2=100, y2=100, page=0),
    Appearance(stroke_color=(1, 0, 0), stroke_width=5),
)
a.write('b.pdf')  # or use overwrite=True if you feel lucky

Annotation Types

pdf-annotate includes most of the basic PDF annotation types, leaving out some of the more complex interactive types. Contributions for these welcome! Currently supported annotation types are:

  • square
  • circle
  • line
  • polygon
  • polyline
  • ink
  • text
  • image

Appearance

Annotations' appearance is controlled by the Appearance class, passed to the appearance argument to add_annotation. Not all attributes on this class apply to all annotations; documentation on this is forthcoming.

Location

Where an annotation is placed on the PDF is controlled by the Location class, passed to the location argument to add_annotation. By default these coordinates are in the PDF's user space scale, which is "points". There are 72 points/inch, so an 8.5"x11" PDF would have a coordinate system of 612x792. See scaling and rotation below for changing the coordinate system.

Annotations that are defined by width/height (square, circle, text, image) require x1, y1, x2, y2 attributes, while annotations that are defined by a list of points (line, polygon, polyline, ink) require a points attribute. All annotations require a page attribute, which determines which page of the PDF the annotations will be placed on.

Metadata

PDF annotations can contain arbitrary metadata. This is controlled by the Metadata class, passed to the metadata argument to add_annotation. By default, annotations will contain default values for creation date, modification date, unique name (just a uuid), and the print flag set. To leave off any of these, use the UNSET singleton. For more context, check out the Metadata class itself.

Scaling and rotation

pdf-annotate draws annotations as though you were drawing them in a PDF viewer, meaning it assumes you want to draw on the rotated page. For example an annotation drawn at (10, 10) on a 90° rotated page will still appear in the bottom left, not the top-left.

It also supports specifying your annotations' coordinates in differently scaled coordinate systems. If, for example, you know your coordinates are in the system of the PDF rastered at 150 DPI, you would specify scale=72.0/150 in the constructor to properly scale your coordinates to PDF user space.

Finally, if all you have is the dimensions of each page in the viewer's coordinate system, you can specify these. Building on the previous example, if you know the dimensions of page 0, you would use

a = PdfAnnotator('a.pdf')
a.set_page_dimensions((1275, 1650), 0)

Note that these are the dimensions of an un-rotated 8.5"x11" page rastered at 150 DPI. If the same page is rotated 90° or 270°, you would pass in (1650, 1275). Setting page dimensions specifically overrides document-wide scale and rotation settings.

Advanced Usage

Using the Content Stream

pdf-annotate also includes an abstraction of the PDF content stream that you can use to draw arbitrary annotation shapes onto the PDF. To fully take advantage of this feature, we recommend reading the relevant parts of the PDF specification: (Section 8 - Graphics and Section 12.5.5 - Annotation Appearance Streams).

To use an explicit content stream in an annotation, specify the appearance_stream argument to the Appearance object as a pdf_annotate.graphics.ContentStream object. See the end-to-end tests for examples.

Local Development

Tests are run against several supported python versions using tox. To get this to work, you need versioned python executables - e.g. python3.6 - in your path.

An opinionated setup, which assumes you have certain python versions installed, and that you use pyenv, is provided by make setup. After this you can run tox to run tests.

Manual tests

Fully automated testing is difficult for things that depend on the complexities of PDF viewers. When making changes, it's good practice to compare the file tests/end_to_end/pdfs/end_to_end.pdf, which is generated during testing, with expected.pdf in the same directory. To ensure rotation is handled correctly, there is also end_to_end_rotated_90.pdf and corresponding expected file.

By default, the file will be the one generated during the last python version's tox run. To check a specific version, use e.g. tox -e py36.