LibrePDF/OpenPDF

PDF/A support


A new PdfADocument and PdfAWriter for generating PDF/A compliant documents seems useful.

I suggest creating unit tests that use veraPDF to validate the PDF files created by OpenPDF.

https://github.com/veraPDF/veraPDF-validation

https://en.wikipedia.org/wiki/PDF/A

I have created a quick initial test case that validates PDF files generated by OpenPDF:
https://github.com/LibrePDF/OpenPDF/blob/master/openpdf/src/test/java/com/lowagie/text/validation/PDFValidationTest.java

Currently it finds 5 validation errors:

Validation errors: 5
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.8.2.2, testNumber=1], status=failed, message=The document catalog dictionary shall include a MarkInfo dictionary with a Marked entry in it, whose value shall be true., location=Location [level=CosDocument, context=root], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.7.3, testNumber=7], status=failed, message=The value of Producer entry from the document information dictionary, if present, and its analogous XMP property pdf:Producer shall be equivalent., location=Location [level=CosDocument, context=root/trailer[0]/Info[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.8.3.3, testNumber=1], status=failed, message=The logical structure of the conforming file shall be described by a structure hierarchy rooted in the StructTreeRoot entry of the document catalog dictionary, as described in PDF Reference 9.6, location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.7.2, testNumber=1], status=failed, message=The document catalog dictionary of a conforming file shall contain the Metadata key., location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.5.3, testNumber=2], status=failed, message=An annotation dictionary shall contain the F key. The F key’s Print flag bit shall be set to 1 and its Hidden, Invisible and NoView flag bits shall be set to 0, location=Location [level=CosDocument, context=root/document[0]/pages[0](4 0 obj PDPage)/annots[0](1 0 obj PDAnnot)], locationContext=null, errorMessage=null]

@Lonzak @bsanchezb @netmackan Are you able to assist here, please?

Adding @mkl-public to the loop as well.

Lonzak commented

Can you use PDFAFlavour.PDFA_1_B and try again?

With PDFAFlavour.PDFA_1_B, veraPDF reports these 3 validation errors:

TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.7.3, testNumber=7], status=failed, message=The value of Producer entry from the document information dictionary, if present, and its analogous XMP property pdf:Producer shall be equivalent., location=Location [level=CosDocument, context=root/trailer[0]/Info[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.7.2, testNumber=1], status=failed, message=The document catalog dictionary of a conforming file shall contain the Metadata key., location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-1:2005, clause=6.5.3, testNumber=2], status=failed, message=An annotation dictionary shall contain the F key. The F key’s Print flag bit shall be set to 1 and its Hidden, Invisible and NoView flag bits shall be set to 0, location=Location [level=CosDocument, context=root/document[0]/pages[0](4 0 obj PDPage)/annots[0](1 0 obj PDAnnot)], locationContext=null, errorMessage=null]

So it would be interesting to find solutions to these validation errors.
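For the annotation error (clause 6.5.3), the fix comes down to bit arithmetic on the annotation's /F value: set Print, clear Invisible, Hidden and NoView, and always write the key. The sketch below is stdlib-only and assumes the bit values from the PDF Reference's annotation flags table (Invisible = 1, Hidden = 2, Print = 4, NoView = 32); the class and method names are hypothetical. OpenPDF's PdfAnnotation appears to carry matching FLAGS_* constants and a setFlags(int) method inherited from iText, so the computed value could presumably be applied there, but that is an assumption to verify.

```java
// Hypothetical helper: normalise an annotation /F value for PDF/A-1 (ISO 19005-1, clause 6.5.3).
final class AnnotationFlags {
    // Bit values as listed in the PDF Reference annotation flags table.
    static final int INVISIBLE = 1; // bit 1
    static final int HIDDEN = 2;    // bit 2
    static final int PRINT = 4;     // bit 3
    static final int NO_VIEW = 32;  // bit 6

    private AnnotationFlags() {}

    /** Sets Print and clears Invisible, Hidden and NoView; all other bits are kept. */
    static int toPdfA1(int flags) {
        return (flags | PRINT) & ~(INVISIBLE | HIDDEN | NO_VIEW);
    }

    /** True if the value already satisfies the flag part of clause 6.5.3. */
    static boolean isPdfA1Compliant(int flags) {
        return flags == toPdfA1(flags);
    }

    public static void main(String[] args) {
        int missingF = 0;                 // an annotation without /F defaults to 0
        int hiddenPrint = HIDDEN | PRINT; // 6
        System.out.println(toPdfA1(missingF));          // 4
        System.out.println(toPdfA1(hiddenPrint));       // 4
        System.out.println(isPdfA1Compliant(PRINT | 8)); // true: NoZoom (8) may stay set
    }
}
```

Note that the clause also requires the F key to be present, so a writer would need to emit /F explicitly even for annotations that never had flags set.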

Lonzak commented

PDF/A-1a (level A) requires a tagged PDF, which needs special handling. So for this test I think PDF/A-1b is the more appropriate flavour.
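The two reports in this thread support that reading: the clauses that fail in the first run (presumably PDF/A-1a, judging by the switch to PDFA_1_B) but not in the PDF/A-1b run are exactly the tagging rules, 6.8.2.2 (MarkInfo) and 6.8.3.3 (StructTreeRoot). The stdlib-only sketch below makes that comparison by extracting the clause numbers from abbreviated copies of the TestAssertion lines; the class name and the abbreviated strings are illustrative, not veraPDF output format guarantees.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ClauseDiff {
    // Matches the clause number inside a veraPDF RuleId, e.g. "clause=6.8.2.2".
    private static final Pattern CLAUSE = Pattern.compile("clause=([0-9.]+)");

    // Clause fragments copied from the two reports in this thread (messages abbreviated).
    static final List<String> PDFA_1A_RUN = List.of(
            "clause=6.8.2.2, testNumber=1 (MarkInfo)",
            "clause=6.7.3, testNumber=7 (Producer)",
            "clause=6.8.3.3, testNumber=1 (StructTreeRoot)",
            "clause=6.7.2, testNumber=1 (Metadata)",
            "clause=6.5.3, testNumber=2 (annotation F key)");
    static final List<String> PDFA_1B_RUN = List.of(
            "clause=6.7.3, testNumber=7 (Producer)",
            "clause=6.7.2, testNumber=1 (Metadata)",
            "clause=6.5.3, testNumber=2 (annotation F key)");

    /** Collects clause numbers in the order they first appear. */
    static Set<String> clauses(List<String> assertions) {
        Set<String> result = new LinkedHashSet<>();
        for (String line : assertions) {
            Matcher m = CLAUSE.matcher(line);
            if (m.find()) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    /** Clauses that fail in the first run but not in the second. */
    static Set<String> onlyIn(List<String> first, List<String> second) {
        Set<String> diff = clauses(first);
        diff.removeAll(clauses(second));
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(onlyIn(PDFA_1A_RUN, PDFA_1B_RUN)); // [6.8.2.2, 6.8.3.3]
    }
}
```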

I tried fixing the first error (which is relatively easy), but then I asked myself what the goal is.
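The first error in the PDF/A-1b report (clause 6.7.3) and the Metadata error (clause 6.7.2) are related: the catalog needs a /Metadata XMP stream, and that XMP must repeat the Info dictionary's Producer (and, per the same clause, any other Info entries that have XMP equivalents, such as Title or CreationDate). Below is a minimal, stdlib-only sketch that builds such a packet; the class and method names and the "OpenPDF" Producer value are hypothetical. PdfWriter appears to offer setXmpMetadata(byte[]) inherited from iText, which is one possible way to attach the bytes, but whether that alone satisfies veraPDF is an assumption to check.

```java
import java.nio.charset.StandardCharsets;

final class XmpProducer {
    /** Escapes the XML special characters for use in element content (ampersand first). */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Minimal XMP packet declaring PDF/A-1b and mirroring the Info Producer in pdf:Producer. */
    static String minimalPdfA1bXmp(String producer) {
        String p = escapeXml(producer);
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
                + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
                + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
                + "  <rdf:Description rdf:about=\"\" xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\">\n"
                + "   <pdf:Producer>" + p + "</pdf:Producer>\n"
                + "  </rdf:Description>\n"
                + "  <rdf:Description rdf:about=\"\" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
                + "   <pdfaid:part>1</pdfaid:part>\n"
                + "   <pdfaid:conformance>B</pdfaid:conformance>\n"
                + "  </rdf:Description>\n"
                + " </rdf:RDF>\n"
                + "</x:xmpmeta>\n"
                + "<?xpacket end=\"w\"?>";
    }

    public static void main(String[] args) {
        // Hypothetical Producer value; it must match the string written into /Info.
        String xmp = minimalPdfA1bXmp("OpenPDF");
        // UTF-8 is the usual encoding for an XMP metadata stream.
        byte[] utf8 = xmp.getBytes(StandardCharsets.UTF_8);
        System.out.println(xmp);
        System.out.println(utf8.length + " bytes");
    }
}
```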
I think a later *text version has a PDF/A generator that specifically aims to generate PDF/A-compliant PDFs. So maybe we should consider a new PdfADocument class, or specifying the type of PDF in the constructor of the Document class. But then it will take much more effort to always generate correct PDF/A documents...
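One way to picture the "type of PDF in the constructor" idea: the target level becomes a value the document carries, and the writer runs level-specific checks before closing. The sketch below is entirely hypothetical (PdfAPreflight, Conformance and missingCatalogEntries are not OpenPDF API). It models the catalog as a set of key names and covers only the catalog-level rules reported in this thread, checking presence only, not values such as Marked being true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class PdfAPreflight {
    /** Hypothetical target level that a document could receive in its constructor. */
    enum Conformance {
        NONE, PDFA_1B, PDFA_1A;

        boolean requiresXmpMetadata() { return this != NONE; }
        boolean requiresTagging() { return this == PDFA_1A; }
    }

    /** Lists catalog entries a given level needs but the catalog lacks (presence check only). */
    static List<String> missingCatalogEntries(Conformance level, Set<String> catalogKeys) {
        List<String> missing = new ArrayList<>();
        if (level.requiresXmpMetadata() && !catalogKeys.contains("Metadata")) {
            missing.add("Metadata (6.7.2)");
        }
        if (level.requiresTagging()) {
            if (!catalogKeys.contains("MarkInfo")) {
                missing.add("MarkInfo (6.8.2.2)");
            }
            if (!catalogKeys.contains("StructTreeRoot")) {
                missing.add("StructTreeRoot (6.8.3.3)");
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> catalog = Set.of("Type", "Pages");
        System.out.println(missingCatalogEntries(Conformance.PDFA_1B, catalog)); // [Metadata (6.7.2)]
        System.out.println(missingCatalogEntries(Conformance.PDFA_1A, catalog));
    }
}
```

The effort concern above shows up even in this toy: every additional level adds rules that the generator, not the user, has to satisfy.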
