Validation check issues
rusllonrails opened this issue · 15 comments
Hey Guys,
I'm very happy to use prawn.
One small thing I got today is that my generated pdf has some validation issues:
1.4.1 : Trailer Syntax error, The trailer dictionary doesn't contain ID
3.1.1 : Invalid Font definition, Some required fields are missing from the Font dictionary.
3.1.2 : Invalid Font definition, FontDescriptor is null or is a AFM Descriptor
7.1 : Error on MetaData, Missing Metadata Key in catalog
I'm using the latest version of prawn:
gem 'rails', '4.1.8'
gem 'prawn', git: "git@github.com:prawnpdf/prawn.git"
gem 'pdf_validator'
In rails console:
# I'm generating PDF file:
Prawn::Document.generate("metadata.pdf",
  :info => {
    :Title => "My title",
    :Author => "John Doe",
    :Subject => "My Subject",
    :Keywords => "test metadata ruby pdf dry",
    :Creator => "ACME Soft App",
    :Producer => "Prawn",
    :CreationDate => Time.now
  }) do
  text "This is a test of setting metadata properties via the info option."
  text "While the keys are arbitrary, the above example sets common attributes."
end
# Then try to validate generated file with "pdf_validator" gem (https://github.com/bitzesty/pdf_validator):
> path_to_pdf = "#{Rails.root}/metadata.pdf"
> res = PdfValidator.validate(path_to_pdf)
> res[:errors].map { |e| puts e }
1.4.1 : Trailer Syntax error, The trailer dictionary doesn't contain ID
3.1.1 : Invalid Font definition, Some required fields are missing from the Font dictionary.
3.1.2 : Invalid Font definition, FontDescriptor is null or is a AFM Descriptor
7.1 : Error on MetaData, Missing Metadata Key in catalog
I also uploaded the generated "metadata.pdf" file to http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx
and got these issues in the results:
Validating file "innovation_award_Dec_18_2014(1).pdf" for conformance level pdfa-1a
The file trailer dictionary must have an id key.
The key Metadata is required but missing.
The key MarkInfo is required but missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
A device-specific color space (DeviceRGB) without an appropriate output intent is used.
The key F is required but missing. (2)
The value of the key SMask is an image but must be None. (2)
The value of the key CA is 0 but must be 1.0. (2)
The value of the key ca is 0 but must be 1.0. (2)
The font Helvetica-Bold must be embedded.
The font Helvetica-Oblique must be embedded.
The font Helvetica must be embedded.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document contains fonts without embedded font programs or encoding information (CMAPs).
The document contains transparency.
The document contains hidden, invisible, non-viewable or non-printable annotations.
The document's meta data is either missing or inconsistent or corrupt.
The document doesn't provide appropriate logical structure information.
Done.
Maybe someone has experienced the same issue and knows how to fix it.
Thanks for any help.
These validation errors are specific to the PDF/A profile. At the moment Prawn doesn't support PDF/A and I personally don't plan to work on it any time soon. I'll be happy to help anyone who decides to contribute PDF/A support.
Has anyone ever managed to generate a PDF/A-3 compliant PDF with Prawn?
I would be glad if someone could provide a gist or other resources on how to achieve this. Even my Copilot has been struggling with it so far.
@timokleemann I'm not sure why you are asking this here because in https://github.com/orgs/prawnpdf/discussions/1231#discussioncomment-10982910 you mentioned that you have full ZUGFeRD compatibility which requires PDF/A-3.
@gettalong Well observed! But I am using Ghostscript to convert the Prawn PDFs to the PDF/A standard. This is buggy, however, and I am not happy with it. I would love to create a PDF/A from within Prawn. But I haven't come across anyone who has successfully done that.
@timokleemann Ah, okay. Alas, for Prawn itself I can offer you only some guidance. You would need to embed the required PDF/A XMP metadata stream and an ICC color profile (probably sRGB), make sure that you only use embedded fonts, and handle a few other things which Prawn probably already takes care of. It shouldn't be that much of a hassle, but someone has to do the work once. You could look at how HexaPDF does it.
Thanks @gettalong for the guidance. I think I managed to add the required metadata to my PDF using Prawn's info method. Using a tool called mdls I can verify that the metadata is now indeed present in the PDF.
My Copilot now suggests that I use the combine_pdf gem to add the XMP metadata to the file. But do I really need another gem here? Or is there a better way to achieve this?
No, the info method just adds the standard meta information. What you need is to add a metadata stream with the correct PDF/A metadata. Even if mdls shows the metadata, it probably just shows the one from the info dictionary and not the metadata stream.
combine_pdf is not needed since you just need to attach files to the PDF and this can be done with Prawn itself.
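To illustrate the difference between the info dictionary and the metadata stream, here is a minimal sketch that builds the XMP packet as a plain string using only the Ruby standard library. The helper names (xml_escape, build_xmp_packet) are hypothetical and not part of Prawn, the selection of fields is an assumption modelled on the code in this thread, and the xpacket wrapper follows the usual XMP convention. Unlike plain string interpolation, it escapes XML special characters in the values.

```ruby
require "time" # provides Time#iso8601 on older Rubies

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical helper: escape XML special characters in a metadata value.
def xml_escape(value)
  value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: build an XMP packet from an info-style hash.
# The part/conformance defaults are assumptions (PDF/A-3, level B).
def build_xmp_packet(info, part: 3, conformance: "B")
  created  = info.fetch(:CreationDate) { Time.now }.iso8601
  modified = info.fetch(:ModDate) { Time.now }.iso8601

  <<~XMP
    <?xpacket begin="\uFEFF" id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/">
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
          <dc:title><rdf:Alt><rdf:li xml:lang="x-default">#{xml_escape(info[:Title])}</rdf:li></rdf:Alt></dc:title>
          <dc:creator><rdf:Seq><rdf:li>#{xml_escape(info[:Author])}</rdf:li></rdf:Seq></dc:creator>
          <xmp:CreatorTool>#{xml_escape(info[:Creator])}</xmp:CreatorTool>
          <xmp:CreateDate>#{created}</xmp:CreateDate>
          <xmp:ModifyDate>#{modified}</xmp:ModifyDate>
          <pdf:Producer>#{xml_escape(info[:Producer])}</pdf:Producer>
          <pdfaid:part>#{part}</pdfaid:part>
          <pdfaid:conformance>#{conformance}</pdfaid:conformance>
        </rdf:Description>
      </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>
  XMP
end
```

The resulting string still has to be attached to the document catalog as a metadata stream. Values in the info dictionary alone do not produce it.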
@gettalong, cool, so I can get along without another gem here.
This is a rough idea of my current code:
class DocumentPdf < Prawn::Document
  def initialize(document)
    @document = document
    super(
      :page_size => "A4",
      :margin => [32.mm, 20.mm, 40.mm, 25.mm]
    )
    setup_colors
    setup_fonts
    setup_layout
    add_metadata
    add_output_intent
    add_xmp_metadata
  end

  private

  def add_metadata
    self.info[:Title] = @document.title || "Document"
    self.info[:Author] = @document.author || "Author"
    self.info[:Subject] = @document.subject || "Subject"
    self.info[:Keywords] = @document.keywords || "Keywords"
    self.info[:Creator] = "Prawn PDF"
    self.info[:Producer] = "Prawn PDF"
    self.info[:CreationDate] = Time.now
    self.info[:ModDate] = Time.now
  end

  def add_output_intent
    icc_profile_path = Rails.root.join("app", "assets", "icc_profiles", "sRGB.icc")
    output_intent = {
      S: :GTS_PDFA1,
      OutputConditionIdentifier: "sRGB",
      Info: "sRGB IEC61966-2.1",
      DestOutputProfile: IO.binread(icc_profile_path)
    }
    catalog.data[:OutputIntents] = [output_intent]
  end

  def add_xmp_metadata
    xmp_metadata = <<-XMP
      <x:xmpmeta xmlns:x="adobe:ns:meta/">
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
            <dc:title>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">#{info[:Title]}</rdf:li>
              </rdf:Alt>
            </dc:title>
            <dc:creator>
              <rdf:Seq>
                <rdf:li>#{info[:Author]}</rdf:li>
              </rdf:Seq>
            </dc:creator>
            <dc:subject>
              <rdf:Bag>
                <rdf:li>#{info[:Subject]}</rdf:li>
              </rdf:Bag>
            </dc:subject>
            <dc:description>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">#{info[:Keywords]}</rdf:li>
              </rdf:Alt>
            </dc:description>
            <xmp:CreatorTool>#{info[:Creator]}</xmp:CreatorTool>
            <xmp:CreateDate>#{info[:CreationDate].iso8601}</xmp:CreateDate>
            <xmp:ModifyDate>#{info[:ModDate].iso8601}</xmp:ModifyDate>
            <pdf:Producer>#{info[:Producer]}</pdf:Producer>
            <pdfaid:part>3</pdfaid:part>
            <pdfaid:conformance>B</pdfaid:conformance>
          </rdf:Description>
        </rdf:RDF>
      </x:xmpmeta>
    XMP
    metadata_stream = make_xmp_metadata_stream(xmp_metadata)
    object_id = state.store(metadata_stream)
    state.store.root.data[:Metadata] = PDF::Core::Reference.new(object_id)
  end

  def make_xmp_metadata_stream(xmp_metadata)
    PDF::Core::Stream.new({}, xmp_metadata)
  end
end
The problem is that it keeps giving me an undefined method "info" error no matter what I try.
What am I missing here?
N.b. I haven't had a recent look into the Prawn internals but:
- The metadata needs to be provided on document creation according to the manual. You can access it later via doc.state.store.info, which is a PDF::Core::Reference.
- #add_output_intent: The DestOutputProfile needs to be a stream object that follows the PDF spec according to sections 14.11.5 and 8.6.5.5. From what I see you are just adding it as a string.
Thanks, @gettalong.
Below is my updated code.
class DocumentPdf < Prawn::Document
  def initialize(document)
    @document = document
    super(
      :page_size => @paper_size,
      :margin => [32.mm, 20.mm, 40.mm, 25.mm],
      :info => {
        :Title => "Document",
        :Author => "Author",
        :Subject => "Subject",
        :Keywords => "Keywords",
        :Creator => "Prawn PDF",
        :Producer => "Prawn PDF",
        :CreationDate => Time.now,
        :ModDate => Time.now
      }
    )
    setup_colors
    setup_fonts
    setup_layout
    add_output_intent
    add_xmp_metadata
  end

  private

  def add_output_intent
    icc_profile_path = Rails.root.join("app", "assets", "icc_profiles", "sRGB.icc")
    icc_profile_data = IO.binread(icc_profile_path)
    icc_profile_stream = PDF::Core::Stream.new(icc_profile_data)
    output_intent = {
      S: :GTS_PDFA1,
      OutputConditionIdentifier: "sRGB",
      Info: "sRGB IEC61966-2.1",
      DestOutputProfile: icc_profile_stream
    }
    root = state.store.root
    root.data[:OutputIntents] = [output_intent]
  end

  def add_xmp_metadata
    xmp_metadata = <<-XMP
      <x:xmpmeta xmlns:x="adobe:ns:meta/">
        <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
          <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
            <dc:title>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">This is the Title</rdf:li>
              </rdf:Alt>
            </dc:title>
            <dc:creator>
              <rdf:Seq>
                <rdf:li>This is the Author</rdf:li>
              </rdf:Seq>
            </dc:creator>
            <dc:subject>
              <rdf:Bag>
                <rdf:li>This is the Subject</rdf:li>
              </rdf:Bag>
            </dc:subject>
            <dc:description>
              <rdf:Alt>
                <rdf:li xml:lang="x-default">These are the Keywords</rdf:li>
              </rdf:Alt>
            </dc:description>
            <xmp:CreatorTool>Creator</xmp:CreatorTool>
            <xmp:CreateDate>CreateDate</xmp:CreateDate>
            <xmp:ModifyDate>ModifyDate</xmp:ModifyDate>
            <pdf:Producer>Producer</pdf:Producer>
            <pdfaid:part>3</pdfaid:part>
            <pdfaid:conformance>B</pdfaid:conformance>
          </rdf:Description>
        </rdf:RDF>
      </x:xmpmeta>
    XMP
    metadata_stream = make_xmp_metadata_stream(xmp_metadata)
    metadata_object = ref!(metadata_stream)
    state.store.root.data[:Metadata] = metadata_object
  end

  def make_xmp_metadata_stream(xmp_metadata)
    PDF::Core::Stream.new(xmp_metadata)
  end
end
Unfortunately, I am having trouble referencing the metadata in my code via doc.state.store.info. I keep getting an error: undefined local variable or method "info". (That's why I hardcoded the values as "This is the Title" etc. for now.)
But, even worse, when I try to render the PDF using send_data(DocumentPdf.new(document).render) from my controller, I get this error:
PDF::Core::Errors::FailedObjectConversion
This object cannot be serialized to PDF (#<PDF::Core::Stream:0x00000000699498...
What am I missing here?
Generally you don't want to use Stream directly, it's for internal use only. Instead create an empty dictionary (ref({})) and use its stream.