A command-line utility that automates compiling a LaTeX project into a PDF compliant with the PDF/A standard.
Assuming you are using a Debian/Ubuntu machine:
- Python3
  - Usually pre-installed
- TeX Live
  - `sudo apt install texlive-latex-base texlive-fonts-recommended texlive-latex-extra texlive-bibtex-extra`
- ExifTool
  - `sudo apt install exiftool`
Then install latex2pdfa with pip:
pip install latex2pdfa
Run the following in your terminal and follow the instructions:
latex2pdfa path/to/your/main_tex_file.tex
By default, the generated PDF will comply with the `1b` standard, which most universities require.
You can specify an output filename with `--output-filename`; otherwise the generated PDF will have the same name as your `main_tex_file` followed by `-PDFA-1b`.
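The default naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the tool's actual code: `default_output_name` is a hypothetical helper, and the `.pdf` extension and the version-then-level order are taken from the `thesis-PDFA-1b.pdf` example in the help text.

```python
from pathlib import Path


def default_output_name(tex_file: str, version: str = "1", level: str = "b") -> str:
    """Build the default PDF name: <main tex file stem>-PDFA-<version><level>.pdf.

    Hypothetical helper for illustration; latex2pdfa may implement this differently.
    """
    stem = Path(tex_file).stem  # "path/to/thesis.tex" -> "thesis"
    return f"{stem}-PDFA-{version}{level}.pdf"


print(default_output_name("path/to/your/thesis.tex"))  # thesis-PDFA-1b.pdf
```

Passing `--output-filename` on the command line replaces this default entirely.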
You can get the exhaustive list of arguments by running:
latex2pdfa --help
usage: latex2pdfa.py [-h] [--version] [-cl CONFORMANCE_LEVEL] [-clv CONFORMANCE_LEVEL_VERSION] [-o OUTPUT_DIR] [-of OUTPUT_FILENAME] [-i]
[-v] [-nc] [-ve] [--pdflatex-path PDFLATEX_PATH] [--pdflatex_extra_cmds PDFLATEX_EXTRA_CMDS] [--bibtex-path BIBTEX_PATH]
[--gs-path GS_PATH] [--verapdf-path VERAPDF_PATH]
tex_file
positional arguments:
tex_file The main tex file of your LaTex project
options:
-h, --help show this help message and exit
--version show program's version number and exit
-cl CONFORMANCE_LEVEL, --conformance-level CONFORMANCE_LEVEL
The PDF/A standard conformance level (`a`, `b`, or `u`), default to `b`
-clv CONFORMANCE_LEVEL_VERSION, --conformance-level-version CONFORMANCE_LEVEL_VERSION
The PDF/A standard conformance level version (`1`, `2`, or `3`), default to `1`
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
The directory where the generated PDF will be stored, default to the project directory
-of OUTPUT_FILENAME, --output-filename OUTPUT_FILENAME
The filename of the generated PDF, default to the main LaTex filename with the suffix PDFA-`cl`clv` (for ex: thesis-PDFA-1b.pdf
-i, --ignore-metadata
Ignore adding the metadata file to the project folder in case it is already done manually, default to false
-v, --verbose show all under the hood commands and their output
-nc, --no-clean Keep the temporary files generated from the compilation
-ve, --verify Verify the generated PDF using veraPDF (veraPDF path must be provided in this case)
--pdflatex-path PDFLATEX_PATH
pdflatex executable path, if it is not specified, the script will search on your environment variable PATH
--pdflatex_extra_cmds PDFLATEX_EXTRA_CMDS
Add any extra commands to pdflatex (use quotation marks)
--bibtex-path BIBTEX_PATH
bibtex executable path, if it is not specified, the script will search on your environment variable PATH
--gs-path GS_PATH ghostscript executable path, if it is not specified, the script will consider the one inside the binaries folder
--verapdf-path VERAPDF_PATH
veraPDF executable path, if it is not specified, the script will consider the one inside the binaries folder
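If you want to call the tool from a build script, the options listed above can be assembled into a command line programmatically. The following is a minimal sketch, assuming `latex2pdfa` is on your `PATH`; `build_command` is a hypothetical helper that is not part of the package, and it only uses flags that appear in the help output above.

```python
import shlex
from typing import List, Optional


def build_command(
    tex_file: str,
    level: str = "b",
    version: str = "1",
    output_dir: Optional[str] = None,
    output_filename: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble a latex2pdfa argument list (hypothetical helper, for illustration)."""
    cmd = ["latex2pdfa", tex_file, "-cl", level, "-clv", version]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if output_filename is not None:
        cmd += ["-of", output_filename]
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_command("thesis.tex", version="2", output_dir="build", verbose=True)
print(shlex.join(cmd))  # latex2pdfa thesis.tex -cl b -clv 2 -o build -v
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.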
This is quoted from the pdf2archive repository.
(I can't say it better 😂)
This script was born as a necessity, when I had to convert the LaTeX-produced PDF of my MSc Thesis into a PDF/A-1B. Once upon a time, the delivery of the Thesis had to be done manually, by burning a CD-ROM with the Thesis PDF on it. I don't need to say that it was extremely old-fashioned and inefficient, as you had to deliver the CD-ROM to the secretariat in person. Finally, in 2015, my university decided to activate the online submission of the PDF: you just had to upload your PDF and you were done, completely hassle-free.
Then one year ago, some enlightened mind in whoever knows what administrative office, decided that a regular PDF was not easy enough; so, the university began to require the much more satanic PDF/A-1B. Of course, they had to provide a set of instructions for us mere mortals, so that we could produce valid PDF/A-1B files; and indeed they did, by uploading a fantastic document. If you took the (click)bait and read the PDF (not PDF/A-1B, eh!) instructions at the previous linked page, you might have noticed the absolute completeness of the information contained in it: there are instructions to transform a PDF into a PDF/A-1B by either using a Windows-only free program (yeah, I know) or an obsolete OpenOffice plugin that doesn't work anymore or paid, commercial programs that work at most only on Windows and MacOS. No free, cross-platform alternative because hey, everyone loves Windows! Naturally, you can directly produce a PDF/A-1B version of your Thesis. The document lists some easy instructions to perform a direct export into a PDF/A-1B from either Microsoft Word (or Excel, because there are people who of course write their thesis in Excel) or OpenOffice. Because everyone on Earth, especially people who do Physics or Maths, write their thesis in Microsoft Word... they look sooo beautiful, in particular when you have to put footnotes, citations, table of contents, when Word spreads the text in a page in a zebra-style, and when you write those amazing equations in Comic Sans that get rendered as 10 DPI jpeg's. "And people who use LaTeX"? "Latex? What latex? I don't do that kind of dirty sex stuff"! - would say the guy who wrote that document.
So you could imagine me and my friends, on the last available day for the Thesis delivery, still struggling trying to figure out how to convert. There is a nice site that converts PDF's into PDF/A-1B files, but there are some points:
- your Thesis gets filled with metadata from that site, which is not nice for an official document
- the file size limit is 10 MB, so if you do a more experimental Thesis which is full of images you're out
- this solution depends on someone else's resources; if the site goes down tomorrow, you're in deep s***
- it only works online, no offline alternative if you're on the move
- you have to send personal data to an unknown site
- you don't know what operations are being performed on your file and your data on the other side of the line
By digging around on Google, you can find people saying that you can perform the conversion via Ghostscript by just turning on a couple of switches; unfortunately, this doesn't work (the online system, Esse3, keeps saying that the file is not valid) and the matter is slightly more complicated and poorly documented. The failure in producing a valid PDF/A-1B is connected to the complex set of requirements needed, especially font embedding, metadata and color space. This script is just a collection of all the things one should do in order to obtain (in most of the cases) a valid PDF/A-1B document [...].
- The use of the `pdfx` package alone still produces validation errors!
- The use of `Ghostscript` alone to convert the PDF to PDF/A is not always successful. Sometimes old versions do not work. Sometimes recent versions do not accept the same arguments, because the tool is always evolving, and even when the conversion works, you may find that the links are not working, or the table of contents does not exist, etc. After a lot (I mean a lot) of trial and error, I found that version `9.23` gives the best results, so I decided to include it with the project files.
- The script uses both to produce a high-quality PDF/A directly from the LaTeX source files.
- The script is only compatible with the `b` conformance level. Unfortunately, there is no way to generate a fully compatible `PDF/A-a` from LaTeX until now (to my knowledge).
- More interesting information is available in the FAQs section of pdf2archive.
GPLv3 © latex2pdfa. For more information see `LICENSE.md`.