What is needed for creating PDF/A-3b
In reference to https://forum.pdfsharp.net/viewtopic.php?f=4&t=3031
I am evaluating the changes PdfSharp needs in order to create documents compliant with PDF/A-3b.
For now I am stumbling through the various specifications, mostly by trial and error.
My use case:
- create documents from scratch with all the methods around XGraphics
- attach files compliant with ZUGFeRD invoices (https://ferd-net.de/)
What I don't want:
- reading, modifying, writing existing documents
Here is what I have discovered so far. The code changes are not in good shape yet, because first I want to create a valid document somehow, and only afterwards make it robust.
Think of this as some random notes, so they won't get lost.
PDF Version
Must be 1.7 (as far as I know)
XMP Metadata changes
Replace the old metadata in PdfMetadata.cs with this:
"<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n" +
" <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n" +
" <rdf:Description rdf:about=\"\"\n" +
" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\"\n" +
" xmlns:pdf=\"http://ns.adobe.com/pdf/1.3/\"\n" +
" xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n" +
" xmlns:xmp=\"http://ns.adobe.com/xap/1.0/\"\n" +
" xmlns:xmpMM=\"http://ns.adobe.com/xap/1.0/mm/\">\n" +
" <pdfaid:part>3</pdfaid:part>\n" +
" <pdfaid:conformance>B</pdfaid:conformance>\n" +
" <xmpMM:InstanceID>uuid:" + instanceId + "</xmpMM:InstanceID>\n" +
" <xmpMM:DocumentID>uuid:" + documentId + "</xmpMM:DocumentID>\n" +
" <xmp:CreateDate>" + creationDate + "</xmp:CreateDate>\n" +
" <xmp:ModifyDate>" + modificationDate + "</xmp:ModifyDate>\n" +
" <xmp:MetadataDate>" + modificationDate + "</xmp:MetadataDate>\n" +
" <xmp:CreatorTool>" + creator + "</xmp:CreatorTool>\n" +
" <pdf:Producer>" + producer + "</pdf:Producer>\n" +
" <dc:creator>\n" +
" <rdf:Seq>\n" +
" <rdf:li></rdf:li>\n" +
" </rdf:Seq>\n" +
" </dc:creator>\n" +
" <dc:title>\n" +
" <rdf:Alt>\n" +
" <rdf:li xml:lang=\"x-default\">" + title + "</rdf:li>\n" +
" </rdf:Alt>\n" +
" </dc:title>\n" +
" <dc:description>\n" +
" <rdf:Alt>\n" +
" <rdf:li xml:lang=\"x-default\"></rdf:li>\n" +
" </rdf:Alt>\n" +
" </dc:description>\n" +
" </rdf:Description>\n" +
" </rdf:RDF>\n" +
"</x:xmpmeta>\n" +
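The snippet above concatenates values such as title and creationDate straight into XML. A minimal sketch of how those values could be prepared, assuming they arrive as plain .NET strings and dates: the class XmpValues and its methods are hypothetical helpers, not part of PdfSharp. It escapes XML special characters, formats dates as ISO 8601 with a UTC offset, and produces the value that follows the "uuid:" prefix.

```csharp
using System;
using System.Globalization;
using System.Security;

// Hypothetical helper, not part of PdfSharp.
public static class XmpValues
{
    // Escapes &, <, >, " and ' so the value is safe inside an XML element.
    public static string Escape(string value)
    {
        return SecurityElement.Escape(value ?? string.Empty);
    }

    // Formats a date as ISO 8601 with a UTC offset, e.g. 2020-01-31T13:45:00+01:00.
    public static string FormatDate(DateTimeOffset date)
    {
        return date.ToString("yyyy-MM-dd'T'HH:mm:sszzz", CultureInfo.InvariantCulture);
    }

    // Creates the value that follows the "uuid:" prefix in InstanceID and DocumentID.
    public static string NewId()
    {
        return Guid.NewGuid().ToString("D");
    }
}
```

With such a helper, a title like "Invoice & Credit Note" would no longer produce malformed XMP, which validators tend to reject before they even look at the PDF/A identification.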
No interactive elements allowed
So I think everything around AcroForms must not be used. Since I only create "printable" documents without any of this functionality, I just ignore them. For a "real" solution, PdfSharp should throw an error if you try to use something that is not valid in PDF/A.
Catalog improvements
The catalog must include an /OutputIntents array with at least one dictionary of the form /Type /OutputIntent /S /GTS_PDFA1 /OutputConditionIdentifier (sRGB2014) /DestOutputProfile ..., where DestOutputProfile is an embedded ICC color profile.
I use the "sRGB2014.icc" from http://www.color.org/srgbprofiles.xalter
The ICC profile must be a stream object with /N 3 /Alternate /DeviceRGB /Filter /FlateDecode /Length ...
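The /N value has to match the number of color components of the embedded profile. As a hedged sketch, the count can be derived from the ICC header instead of being hard-coded: the header is 128 bytes long, bytes 16 to 19 carry the data color space signature and bytes 36 to 39 carry the file signature 'acsp'. The class IccHeader is a hypothetical helper, not part of PdfSharp, and it only handles the three most common color spaces.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of PdfSharp.
public static class IccHeader
{
    // Returns the value for /N, derived from the color space signature in the ICC header.
    public static int ComponentCount(byte[] profile)
    {
        if (profile == null || profile.Length < 128)
            throw new ArgumentException("An ICC profile starts with a 128-byte header.");

        // Bytes 36..39 hold the profile file signature 'acsp'.
        if (Encoding.ASCII.GetString(profile, 36, 4) != "acsp")
            throw new ArgumentException("Missing 'acsp' signature.");

        // Bytes 16..19 hold the data color space signature.
        switch (Encoding.ASCII.GetString(profile, 16, 4))
        {
            case "GRAY": return 1;
            case "RGB ": return 3;
            case "CMYK": return 4;
            default: throw new NotSupportedException("Unhandled color space.");
        }
    }
}
```

The check has to run on the uncompressed profile bytes, that is, before they are passed through FlateDecode.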
Link Annotation
Web links must have the key /F 4. I don't really know why; it seems to mark the link annotation as "printable".
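Some background on the value 4: /F holds the annotation flags described in ISO 32000-1, 12.5.3, and 4 is bit 3, the Print flag. As far as I read the PDF/A rules, the Print flag must be set while Invisible, Hidden and NoView (and, for PDF/A-2 and PDF/A-3, ToggleNoView) must be clear. A small sketch of that check follows; the enum and the class are hypothetical and not part of PdfSharp.

```csharp
using System;

// Hypothetical enum mirroring a subset of the annotation flags from ISO 32000-1, 12.5.3.
[Flags]
public enum AnnotationFlags
{
    None = 0,
    Invisible = 1 << 0,    // bit 1
    Hidden = 1 << 1,       // bit 2
    Print = 1 << 2,        // bit 3, the value 4 used above
    NoZoom = 1 << 3,       // bit 4
    NoRotate = 1 << 4,     // bit 5
    NoView = 1 << 5,       // bit 6
    ToggleNoView = 1 << 8  // bit 9
}

public static class AnnotationFlagCheck
{
    // True when Print is set and none of the flags that hide the annotation are set.
    public static bool IsAcceptable(int f)
    {
        var flags = (AnnotationFlags)f;
        var forbidden = AnnotationFlags.Invisible | AnnotationFlags.Hidden
                        | AnnotationFlags.NoView | AnnotationFlags.ToggleNoView;
        return (flags & AnnotationFlags.Print) != 0 && (flags & forbidden) == 0;
    }
}
```

A plain /F 4 passes this check, while a missing /F (treated as 0) or a hidden annotation does not.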
With these changes the various validators, e.g. veraPDF, declare my generated document as "compliant".
For ZUGFeRD, some changes are needed on attachments, such as /AFRelationship and an /AF array (associated files), among other things. That's next on my list...
Here are some code dumps, in case anyone is interested. It isn't clean, it isn't beautiful, but it works...
Additions to PdfCatalog.cs
private static readonly SemaphoreSlim _iccLock = new SemaphoreSlim(1, 1);
private static byte[] _compressedIccBytes;

// Lazily loads the embedded sRGB profile and caches its Flate-compressed bytes.
private static byte[] CompressedIccBytes
{
    get
    {
        if (_compressedIccBytes == null)
        {
            _iccLock.Wait();
            try
            {
                // Check again: another thread may have filled the cache while we waited.
                if (_compressedIccBytes == null)
                {
                    using (var iccStream = System.Reflection.Assembly.GetExecutingAssembly().GetManifestResourceStream("PdfSharp.sRGB2014.icc"))
                    using (var buffer = new System.IO.MemoryStream())
                    {
                        // CopyTo reads to the end; a single Read call may return fewer bytes.
                        iccStream.CopyTo(buffer);
                        _compressedIccBytes = new Filters.FlateDecode().Encode(buffer.ToArray());
                    }
                }
            }
            finally
            {
                _iccLock.Release();
            }
        }
        return _compressedIccBytes;
    }
}
private PdfDictionary IccProfile
{
    get
    {
        if (_iccProfile == null)
        {
            _iccProfile = new PdfDictionary(Owner);
            _iccProfile.Elements.SetInteger("/N", 3);
            _iccProfile.Elements.SetName("/Alternate", "/DeviceRGB");
            _iccProfile.Elements.SetName("/Filter", "/FlateDecode");
            var stream = _iccProfile.CreateStream(CompressedIccBytes);
            _iccProfile.Elements.SetInteger(PdfStream.Keys.Length, stream.Length);
            Owner.Internals.AddObject(_iccProfile);
        }
        return _iccProfile;
    }
}
PdfDictionary _iccProfile;

private PdfDictionary OutputIntent
{
    get
    {
        if (_outputIntent == null)
        {
            _outputIntent = new PdfDictionary(Owner);
            _outputIntent.Elements.SetName(Keys.Type, "/OutputIntent");
            _outputIntent.Elements.SetName("/S", "/GTS_PDFA1");
            _outputIntent.Elements.SetString("/OutputConditionIdentifier", "sRGB2014");
            _outputIntent.Elements.SetReference("/DestOutputProfile", IccProfile.Reference);
            Owner.Internals.AddObject(_outputIntent);
        }
        return _outputIntent;
    }
}
PdfDictionary _outputIntent;

public PdfArray OutputIntents
{
    get
    {
        if (_outputIntents == null)
        {
            _outputIntents = new PdfArray(Owner);
            _outputIntents.Elements.Add(OutputIntent);
            Owner.Internals.AddObject(_outputIntents);
            Elements.SetReference(Keys.OutputIntents, _outputIntents.Reference);
        }
        return _outputIntents;
    }
}
PdfArray _outputIntents;
...
internal override void PrepareForSave()
{
    ...
    IccProfile.PrepareForSave();
    OutputIntent.PrepareForSave();
    OutputIntents.PrepareForSave();
PdfLinkAnnotation.cs
case LinkType.Web:
    //pdf.AppendFormat("/A<</S/URI/URI{0}>>\n", PdfEncoders.EncodeAsLiteral(url));
    // Annotation flags: 4 sets the Print bit (bit 3).
    Elements.SetInteger(PdfAnnotation.Keys.F, 4);
If I get to a point where I can submit a pull request, I will do so...
In addition, I am getting "Spec. ISO_19005_3 clause 6.2.11.3 test 2": ISO 32000-1:2008, 9.7.4, Table 117 requires that all embedded Type 2 CIDFonts in the CIDFont dictionary shall contain a CIDToGIDMap entry that shall be a stream mapping from CIDs to glyph indices or the name Identity.
This would be a very easy fix: add descendantFontDictionary.Elements.SetName("/CIDToGIDMap", "/Identity"), which is the default anyway.