pdf-association/arlington-pdf-model

Resources entry in Form XObject is only strongly recommended, not required

bdoubrov opened this issue · 7 comments

PDF 2.0 (Table 93) says that the Resources entry in a Type 1 form dictionary is optional but strongly recommended, while XObjectFormType1.tsv says it is required.

The discussion around Errata pdf-association/pdf-issues#128 has the potential to change this, since Table 93 and the text in subclause 7.8.3 "Resource dictionaries" are now contradictory for certain uses. As we discussed in Paris, and depending on the final wording, Arlington may need to define two different kinds of Form XObjects: those with mandatory Resources and those without. Will wait for the Errata to resolve before addressing this...

Note for future: Linearized PDFs don't inherit up the page tree according to 7.7.3.4. Consider how to capture this...

I think the changes needed are as follows:

  • define a new predicate fn:UsesNamedResources() that only applies to objects/TSVs that can be content streams
  • make a new TSV to represent content streams with an optional(!) Resources key, e.g. ContentStream.tsv
  • make a new TSV to represent an array of content streams (that do NOT have a Resources entry) - e.g. copy ArrayOfStreamsGeneral to ArrayOfContentStreamsGeneral.
    • This change only serves to semantically indicate in the model what can be content streams (vs other arbitrary streams) via the name of the referenced TSV and thus allow checking of the new predicate.
  • change PageObject.tsv Contents from ArrayOfStreamsGeneral to ArrayOfContentStreamsGeneral
  • change CharProcMap.tsv from Stream to ContentStream
  • change XObjectFormType1 Resources from Required=TRUE to fn:IsRequired(fn:SinceVersion(2.0) && fn:UsesNamedResources())
  • change PatternType1.tsv Resources from Required=TRUE to fn:IsRequired(fn:SinceVersion(2.0) && fn:UsesNamedResources())
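
To make the intent of the proposed predicate concrete, here is a rough Python sketch of what fn:UsesNamedResources() might check. It is not taken from any Arlington implementation: the function names, the token regex and the operator list are all my own assumptions. It scans an already decoded content stream for operators whose name operand has to be looked up in a Resources dictionary. It deliberately ignores nested parentheses in literal strings and inline image data (BI ... ID ... EI), so a real checker would need a proper content stream parser.

```python
import re

# Operators whose name operand is always looked up in Resources (assumed list).
ALWAYS_NAMED = {b"Tf", b"Do", b"gs", b"sh"}
# Colour space names that cs/CS accept without a ColorSpace resource.
BUILTIN_COLOUR_SPACES = {b"/DeviceGray", b"/DeviceRGB", b"/DeviceCMYK", b"/Pattern"}

# Simplified tokenizer: no nested parentheses, no inline image handling.
TOKEN = re.compile(
    rb"%[^\r\n]*"                  # comment
    rb"|\((?:\\.|[^\\()])*\)"      # literal string
    rb"|<<|>>"                     # dictionary delimiters
    rb"|<[0-9A-Fa-f\s]*>"          # hex string
    rb"|/[^\s()<>\[\]{}/%]*"       # name
    rb"|[\[\]]"                    # array delimiters
    rb"|[^\s()<>\[\]{}/%]+",       # number or operator
    re.DOTALL,
)


def uses_named_resources(content: bytes) -> bool:
    """Hypothetical stand-in for fn:UsesNamedResources()."""
    prev = None
    for match in TOKEN.finditer(content):
        tok = match.group()
        if tok.startswith(b"%"):
            continue  # comments never reference resources
        prev_is_name = prev is not None and prev.startswith(b"/")
        if tok in ALWAYS_NAMED:
            return True
        if tok in (b"cs", b"CS") and prev_is_name and prev not in BUILTIN_COLOUR_SPACES:
            return True
        # A name as the last operand means a Pattern (scn/SCN) or a
        # Properties entry (BDC/DP); an inline dictionary ends with ">>".
        if tok in (b"scn", b"SCN", b"BDC", b"DP") and prev_is_name:
            return True
        prev = tok
    return False


def resources_required(pdf_version: tuple, content: bytes) -> bool:
    """One reading of fn:IsRequired(fn:SinceVersion(2.0) && fn:UsesNamedResources())."""
    return pdf_version >= (2, 0) and uses_named_resources(content)
```

With this reading, a form XObject that only paints paths (for example `0 0 m 10 10 l S`) would not need a Resources entry, while one that sets a font with `/F1 12 Tf` would.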

Note that these changes do NOT attempt to encode the legacy PDF 1.1 and earlier rule that all named resources were on the page object. This is NOT the same as requiring a Resources entry on the Page, since pages without any named resources don't need Resources.

Does that sound correct?

@MaximPlusov - do you agree with the above?

I agree.