robamler/dontprint

With Zotero, use metadata in PDF

Closed this issue · 3 comments

Currently, the PDF created by Dontprint uses the metadata (title, author…) from the original PDF. At least on my e-reader, this metadata (rather than, say, the filename) is used in the listing, sometimes resulting in useless names such as "untitled".

It would be great if, when the PDF is created through Dontprint's button in Zotero, the author and title could be taken from the Zotero library.

(On Linux, the command-line tool exiftool can edit PDF metadata. I don't know how something similar could be implemented in a cross-platform way without adding too many dependencies, yet without reinventing the wheel and having to deal with every corner case of PDF metadata.)
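For reference, the exiftool approach could be scripted roughly as follows. This is only a sketch: it assumes exiftool is installed and on the PATH, and the helper names `exiftool_command` and `set_pdf_metadata` are made up for illustration. `-Title=`, `-Author=` and `-overwrite_original` are regular exiftool arguments.

```python
import subprocess


def exiftool_command(pdf_path, title, author):
    """Build the exiftool argument list that sets Title and Author on a PDF."""
    return [
        "exiftool",
        f"-Title={title}",
        f"-Author={author}",
        "-overwrite_original",  # do not keep a "<name>_original" backup copy
        pdf_path,
    ]


def set_pdf_metadata(pdf_path, title, author):
    """Run exiftool (assumed to be on the PATH); raises if it reports an error."""
    # Passing a list avoids shell quoting problems with spaces in titles.
    subprocess.run(exiftool_command(pdf_path, title, author), check=True)
```

Something like `set_pdf_metadata("paper.pdf", title, author)` would then be called with the values from the Zotero item, but it still leaves the cross-platform dependency problem unsolved.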

(Oh, and btw, this should be labelled as "enhancement". On your web site you ask the user to set the label, but I think only the administrator can do this, which makes sense.)

The best way to do this, in my opinion, would be to have these meta data fields set by k2pdfopt. Currently, k2pdfopt doesn't support any command line options to set the title and author string, but I'll ask William Menninger if he'd be willing to add those. (Alternatively, it doesn't seem to be too difficult to add some meta data fields from JavaScript code, by making an incremental update to the generated PDF file. But it seems more reasonable to me to get this feature upstream into k2pdfopt.)
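To make the incremental-update idea concrete, here is a rough sketch of what appending a new document information dictionary could look like. It is written in Python rather than JavaScript, and it rests on several assumptions: the PDF uses a classic cross-reference table with a `trailer` keyword (no cross-reference streams), it is not encrypted, and the function names `pdf_text_string` and `append_info` are hypothetical. It also replaces the whole `/Info` dictionary instead of merging with an existing one and does not carry over other trailer entries such as `/ID`, so it is an illustration of the mechanism, not production code.

```python
import re


def pdf_text_string(text):
    """Encode text as a PDF hex string in UTF-16BE with a byte order mark."""
    raw = ("\ufeff" + text).encode("utf-16-be")
    return b"<" + raw.hex().upper().encode("ascii") + b">"


def append_info(pdf, title, author):
    """Return the PDF bytes plus an incremental update adding a new /Info dict."""
    matches = list(re.finditer(rb"startxref\s+(\d+)\s+%%EOF", pdf))
    trailer_pos = pdf.rfind(b"trailer")
    if not matches or trailer_pos < 0:
        raise ValueError("no classic xref table and trailer found")
    last = matches[-1]
    prev_xref = int(last.group(1))  # offset of the previous xref section
    trailer = pdf[trailer_pos:last.start()]
    size_m = re.search(rb"/Size\s+(\d+)", trailer)
    root_m = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer)
    if not size_m or not root_m:
        raise ValueError("trailer lacks /Size or /Root")
    info_num = int(size_m.group(1))  # first unused object number

    # The original bytes are left untouched; everything new is appended.
    out = pdf if pdf.endswith(b"\n") else pdf + b"\n"
    info_offset = len(out)
    out += b"%d 0 obj\n<< /Title %s /Author %s >>\nendobj\n" % (
        info_num, pdf_text_string(title), pdf_text_string(author))
    xref_offset = len(out)
    # One xref subsection with a single 20-byte entry for the new object.
    out += b"xref\n%d 1\n%010d 00000 n \n" % (info_num, info_offset)
    out += b"trailer\n<< /Size %d /Root %s /Info %d 0 R /Prev %d >>\n" % (
        info_num + 1, root_m.group(1), info_num, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out
```

The new trailer points back to the old cross-reference section via `/Prev`, which is what makes this an incremental update rather than a rewrite of the file.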

Some notes (mostly for myself):

  • PDF spec is here. Incremental updates are described in section 7.5.6. Meta data is described in section 14.3.3.
  • There's an alternative XML-based format to include meta data in PDF files called XMP. Not sure if there are any e-readers out there which support only one but not the other format.
  • K2pdfopt already writes title and author information to the generated PDF if it finds it in the original PDF file. The /Author field of the document information dictionary is set in this line in pdfwrite.c. It is only set if the string author is not empty. That string is read from the original PDF file here. It is set to an empty string if no author string is found in the original document. The title field is treated similarly, except that it falls back to the filename of the generated document if no title is specified for the original document. For Dontprint, it would be good to have a command-line option to either overwrite the original title and author information or to supply "default" values in case they're not included in the original document (not sure which option would be better, probably the first one).
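The difference between the two possible behaviours (overwrite vs. default) can be spelled out in a few lines. This is a hypothetical sketch of the selection logic only; `resolve_metadata` and its parameters are made-up names and do not correspond to any existing k2pdfopt option.

```python
def resolve_metadata(original, supplied, overwrite):
    """Pick the value to write for one field (title or author).

    original  -- value found in the source PDF ("" if none)
    supplied  -- value passed in by the caller, e.g. from Zotero ("" if none)
    overwrite -- True: supplied value wins; False: supplied is only a default
    """
    if overwrite:
        # Option 1: prefer the supplied value, keep the original as a fallback.
        return supplied or original
    # Option 2: keep the original, use the supplied value only if it is missing.
    return original or supplied
```

With the "default" behaviour, a source PDF whose title is literally "untitled" would keep that title, since the field is not empty. That is one argument for the overwrite variant.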

This issue is fixed in the recently released port of Dontprint to a Google Chrome extension (version 1.1 beta). I.e., the Chrome extension sets the "author" and "title" meta data of the optimized PDF according to the author and title that the (builtin) Zotero translator detects on the website.

No progress so far on this issue for the Firefox extension. The difference between the Chrome and the Firefox extension is due to the different ways in which the tool "k2pdfopt" is called on the two platforms. The Chrome extension runs k2pdfopt as a so-called portable native client module, while the Firefox add-on uses a native binary of k2pdfopt that is specific to the user's OS and computer architecture. The portable native client module contains code to set the author and title meta data to user-defined values. I'm a bit reluctant to include this code in the native binaries as well, since this would mean that I would have to recompile k2pdfopt for all 6 supported combinations of OSes and architectures, and I don't have the resources to test it on all of these. It's probably better to implement this upstream.

This has been implemented in commit 27fc5f0 (Dontprint version 1.1.2). I just forgot to close this issue.

Note that, on Firefox, you need at least version 2.33a of k2pdfopt for this to work. Go to the Dontprint settings (Firefox menu --> Dontprint --> Configure Dontprint), go to the "Advanced" tab, and then click on the button labelled "Check for updates".