pdf-association/arlington-pdf-model

fn:InNameTree usage inconsistent with docs

plaisted opened this issue · 2 comments

The InNameTree predicate seems to be used in several places in ways that don't match its documentation. The primary discrepancies are that it is used on rows with non-string values, and that it sometimes appears to check whether something exists as a value in the name tree rather than as a key (index) in the tree.

fn:InNameTree documentation:

  • key is a reference to a PDF name-tree which use PDF strings as indices. Names trees are complex PDF data structures that use strings as indices.
  • Asserts that the current row (key or array element) and which must be a PDF string exists in the specified name-tree.
  • Note that this predicate is not for use with dictionaries that support arbitrary key names or number-trees!
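For context, here is a minimal sketch of the index/value distinction in a name tree. It assumes a hypothetical in-memory model where each node is a plain Python dict with optional "Kids", "Names" and "Limits" entries; `iter_name_tree` and `example_tree` are illustrative names, not part of the Arlington model or any PDF library.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_name_tree(node: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (index string, value) pairs from a name-tree node, depth first.

    Assumption: "Names" is a flat list [key1, value1, key2, value2, ...]
    and "Kids" is a list of child nodes, mirroring the PDF name-tree layout.
    """
    names = node.get("Names", [])
    for i in range(0, len(names) - 1, 2):
        yield names[i], names[i + 1]
    for kid in node.get("Kids", []):
        yield from iter_name_tree(kid)


# Hypothetical tree: string indices map to arbitrary objects (here, dicts).
example_tree = {
    "Kids": [
        {
            "Limits": ["a.txt", "b.txt"],
            "Names": ["a.txt", {"F": "a.txt"}, "b.txt", {"F": "b.txt"}],
        },
        {
            "Limits": ["c.txt", "c.txt"],
            "Names": ["c.txt", {"F": "c.txt"}],
        },
    ]
}

pairs = list(iter_name_tree(example_tree))
indices = [key for key, _ in pairs]    # always strings
values = [value for _, value in pairs]  # any kind of object
```

The documented predicate only makes sense against `indices`; the usages listed below look more like tests against `values`.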

Inconsistent usage:

  • ArrayOfIndirectFileSpecifications.tsv - [fn:InNameTree(parent::RichMediaContent::Assets)] (special case) for the catch-all row. The row's type is dictionary, so this seems to mean that the current row value must be present as a value in the tree, rather than the key being present as an index.
  • PageObject.tsv - fn:IsRequired(fn:InNameTree(trailer::Catalog::Names::Pages) || fn:InNameTree(trailer::Catalog::Names::Templates)) (required values). The row's type is name. This seems to test whether the parent object of the current row (the page) is present as a value in either tree, and if so the field is required.
  • Target.tsv - fn:IsRequired((@R==C) && fn:InNameTree(trailer::Catalog::Names::EmbeddedFiles)). The row's type is string, but this doesn't make sense as a required-value test, since the condition can only be evaluated if the value exists. I'm not sure which value it is meant to test; I haven't compared it against the PDF spec yet.

It may be useful to add additional predicates here or at least break out scenarios further in InNameTree docs:

  • InNameTree(treeReference, key)
  • InNameTree(treeReference) -> current row value is implicitly used as key
  • InNameTreeValues(treeReference, value) -> checks for values rather than indices / keys
  • InNameTreeValues(treeReference) -> current row value is implicitly used as value
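As a rough illustration of the intended semantics (not how any Arlington tooling implements them), the explicit-argument variants might behave like the sketch below. It assumes the name tree has already been flattened into a Python dict of index string to value; `in_name_tree`, `in_name_tree_values` and `embedded_files` are hypothetical names. The one-argument forms would simply substitute the current row value for the second argument.

```python
from typing import Any, Dict

# Assumption: a name tree flattened to {index string: value}.
NameTree = Dict[str, Any]


def in_name_tree(tree: NameTree, key: Any) -> bool:
    """True if `key` is a string that exists as an index in the tree."""
    # Only strings can be name-tree indices, so anything else never matches.
    return isinstance(key, str) and key in tree


def in_name_tree_values(tree: NameTree, value: Any) -> bool:
    """True if `value` is one of the objects the tree indexes.

    Assumption: equality comparison stands in for whatever identity test
    (e.g. same indirect object) a real PDF processor would use.
    """
    return any(v == value for v in tree.values())


# Hypothetical data for illustration only.
embedded_files: NameTree = {
    "a.txt": {"F": "a.txt"},
    "b.txt": {"F": "b.txt"},
}

key_hit = in_name_tree(embedded_files, "a.txt")
key_miss_on_dict = in_name_tree(embedded_files, {"F": "a.txt"})
value_hit = in_name_tree_values(embedded_files, {"F": "a.txt"})
value_miss_on_key = in_name_tree_values(embedded_files, "a.txt")
```

Keeping the two checks separate makes the non-string cases explicit: a dictionary row can only ever match through the values check.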

Sorry for my slow reply... and I completely agree with the issue as you report.

My current thinking is to go with two distinct predicates, such as fn:InNameTreeValues(...) and fn:InNameTreeIndex(...), where "index" means the string that is looked up in the name-tree (and must be a string!) and "value" is the object that gets indexed (which may itself be a string object or any other kind of object). And, for simplicity, explicitly repeating the current row key name / array index won't hurt, as it also means the extracted data is more standalone-ish.

Makes sense to me...

I also had a few other questions / comments regarding IsPresent and Special Cases I added here: #38 (comment)

I can create a separate issue if that's better.