This project lets you quickly create hand-crafted PDF files.
The main Python script, pdf-corpus.py, is an ad-hoc template engine to easily prototype new PDFs.
To compile the corpus, just run make (you need a Python interpreter).
All .txt files contained in the corpus/ folder are then converted into PDFs.
Each PDF in the corpus is described by a .txt file that indicates the template to use and the content to insert in the template.
The following templates are defined, but you can easily create your own by tweaking the Python code.
contentstream
: A simple document containing one page in A4 format. You define the graphic commands to put in the page's content stream (see my cheat sheet). For convenience, a font resource is declared as /F1.
objects
: A lower-level template to directly declare objects. Simple streams can be defined, for which the template computes the /Length field.
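
To illustrate what computing the /Length field involves, here is a minimal sketch in Python. It is not the code from pdf-corpus.py: the function name make_stream_object and the object number are hypothetical, and the only fact relied on is the standard PDF stream syntax, where /Length counts the bytes between the line after the stream keyword and the endstream keyword.

```python
def make_stream_object(obj_num: int, data: bytes) -> bytes:
    """Serialize an indirect stream object, computing /Length from the data.

    Hypothetical helper for illustration; not taken from pdf-corpus.py.
    """
    # /Length is the number of bytes of stream data, excluding the
    # end-of-line marker that precedes the endstream keyword.
    header = b"%d 0 obj\n<< /Length %d >>\nstream\n" % (obj_num, len(data))
    return header + data + b"\nendstream\nendobj\n"


# A tiny content stream that draws text with the font resource /F1.
content = b"BT /F1 24 Tf 100 700 Td (Hello) Tj ET"
stream_object = make_stream_object(4, content)
```

The content stream shown here is the kind of input the contentstream template expects, while the wrapping step corresponds to what the objects template does for simple streams.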
The corpus already contains some files. These examples are classified into the following categories.
corpus/contentstream/
: Playing with graphics instructions.
corpus/name/
: Escape sequences in names.
corpus/number/
: How numbers are parsed.
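
As background for the name examples, the following Python sketch decodes the escape sequences that the PDF specification allows in names, where # followed by two hexadecimal digits stands for one byte. The function decode_pdf_name is a hypothetical helper, not part of this project. It leaves malformed escapes untouched, which is an assumption: real PDF readers may handle them differently, and that is precisely what such corpus files help to explore.

```python
import re


def decode_pdf_name(token: bytes) -> bytes:
    """Decode #xx escapes in a PDF name token such as b'/A#20B'.

    Hypothetical helper for illustration; malformed escapes are kept as-is.
    """
    if not token.startswith(b"/"):
        raise ValueError("a PDF name token starts with a slash")
    # Replace each '#' followed by exactly two hex digits with the byte it encodes.
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes.fromhex(m.group(1).decode("ascii")),
        token[1:],
    )


decoded = decode_pdf_name(b"/A#20B")  # the escape #20 encodes a space
```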
If you want to learn more about how these examples work, you can have a look at my blog posts: introduction to PDF syntax. I also make one-page cheat sheets about PDF. For further details, you can also dive into the PDF specification.
Once compiled, these example files may not be fully compliant with the specification. In particular, they may be interpreted differently by different PDF readers.
MIT