NCEAS/metadig-checks

entity.format.nonproprietary should not check `textFormat`

Closed this issue · 7 comments

Description

As mentioned in issue #375, this check potentially looks for invalid XPaths, since formatName is not a child of /eml/dataset/*/physical/dataFormat/textFormat or /eml/dataset/*/physical/dataFormat/binaryRasterFormat.

Issues

  • The check looks in invalid paths. Additionally, textFormat is non-proprietary by default. I'm not sure how to handle proprietary formats in binaryRasterFormat, since I'm not familiar enough with the types of files in that format space.

Procedure

The check should only look in /eml/dataset/*/physical/dataFormat/externallyDefinedFormat/formatName

If the format is set to textFormat, it is likely an open format. Evaluating externallyDefinedFormat against a known list of proprietary formats seems like a good approach. We should discuss whether blacklisting or whitelisting is better -- both have their downsides.
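
To make the path restriction above concrete, here is a minimal sketch of pulling only the externallyDefinedFormat/formatName values out of a document. It is not the actual metadig check code: it uses Python's standard xml.etree.ElementTree, the sample EML fragment is a simplified, namespace-free illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical EML fragment (namespaces omitted for brevity).
SAMPLE_EML = """
<eml>
  <dataset>
    <dataTable>
      <physical>
        <dataFormat>
          <textFormat>
            <attributeOrientation>column</attributeOrientation>
          </textFormat>
        </dataFormat>
      </physical>
    </dataTable>
    <otherEntity>
      <physical>
        <dataFormat>
          <externallyDefinedFormat>
            <formatName>application/vnd.ms-excel</formatName>
          </externallyDefinedFormat>
        </dataFormat>
      </physical>
    </otherEntity>
  </dataset>
</eml>
""".strip()

# Relative form of /eml/dataset/*/physical/dataFormat/externallyDefinedFormat/formatName,
# evaluated from the <eml> root element.
FORMAT_NAME_PATH = "dataset/*/physical/dataFormat/externallyDefinedFormat/formatName"


def externally_defined_format_names(eml_text):
    """Return the text of every externallyDefinedFormat/formatName element."""
    root = ET.fromstring(eml_text)
    return [(el.text or "").strip() for el in root.findall(FORMAT_NAME_PATH)]


print(externally_defined_format_names(SAMPLE_EML))
# prints ['application/vnd.ms-excel']
```

The textFormat entity contributes nothing to the result, which is the point: only externallyDefinedFormat carries a formatName to evaluate.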

@jeanetteclark because the check is looking for formatName, it will only find entries like ./physical/dataFormat/externallyDefinedFormat/formatName, since as you pointed out, formatName isn't a sub-element of textFormat or binaryRasterFormat.

I think the problem is that if a formatName entry isn't found, then the check fails, which is wrong, as it would fail if only textFormats are present, for example.

So, I'm proposing that the check passes if no formatNames are found, or if all formats that are found have a non-proprietary format specified.

@gothub that sounds reasonable to me

@mbjones regarding whether to blacklist or whitelist - the original design was to list all DataONE formats and mark each one as either proprietary or not. Here is the original issue

@jeanetteclark just a clarification of my comments from above:

  • if no <dataFormat> entries are found at all, then the check fails.
  • if any <dataFormat> entries are found - i.e., one or more <textFormat>, <binaryRasterFormat>, or <externallyDefinedFormat> - and no proprietary format names are found, then the check passes.
    • Note that, for example, this could be a single <textFormat> entry.
  • if any <externallyDefinedFormat> entries exist, then all these must be non-proprietary, or the check fails.
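
The three rules above could be sketched roughly as follows. This is only an illustration of the decision logic, not the code from the check itself: the function name, the status strings, and the contents of the proprietary list are all assumptions, and the real check would consult the DataONE format list instead.

```python
# Hypothetical stand-in for the DataONE list of formats marked as proprietary.
PROPRIETARY_FORMATS = {"application/vnd.ms-excel", "application/msword"}


def evaluate_formats(data_format_count, external_format_names,
                     proprietary=PROPRIETARY_FORMATS):
    """Apply the three rules to values already extracted from the document.

    data_format_count: number of <dataFormat> elements found.
    external_format_names: formatName values found under
        <externallyDefinedFormat> (may be empty).
    """
    # Rule 1: no <dataFormat> entries at all means the check fails.
    if data_format_count == 0:
        return "FAILURE", "No dataFormat entries were found."

    # Rule 3: every externallyDefinedFormat name must be non-proprietary.
    found = sorted({name for name in external_format_names if name in proprietary})
    if found:
        return "FAILURE", "Proprietary formats found: " + ", ".join(found)

    # Rule 2: at least one dataFormat and nothing proprietary, so the check
    # passes. A document with a single <textFormat> entry lands here.
    return "SUCCESS", "No proprietary formats were found."


print(evaluate_formats(0, []))
print(evaluate_formats(1, []))
print(evaluate_formats(2, ["text/csv", "application/vnd.ms-excel"]))
```

Exact string matching against the list is the simplest option here; whether names should be normalized before comparison is left open.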

Does that sound reasonable?

yeah that sounds good to me! Thanks

Check updated in commit a19a206