NCEAS/metadig-checks

resource.type.valid

jeanetteclark opened this issue · 8 comments

Description

This check will return the resource type as given by:
/*/metadataScope/MD_MetadataScope/resourceScope/MD_ScopeCode (not sure if this one is valid)
/*/hierarchyLevel/MD_ScopeCode
/resource/resourceType/@resourceTypeGeneral
/eml/[dataset|citation|software|protocol]

Priority

  • FAIR: Required

Issues

For EML, the check should return the name of whichever second-level element is used for the choice among dataset, citation, software, and protocol.

Procedure

This check, instead of returning success or failure, should return the text within the element/attribute (ISO/DataCite), or in the case of EML the name of the element used.
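A minimal sketch of that extraction, assuming Python and the standard-library ElementTree parser. The function name resource_type, the helper names, the namespace-agnostic matching on local element names, and the fallback to a codeListValue attribute for ISO are all assumptions made for illustration; this is not the check's actual implementation.

```python
import xml.etree.ElementTree as ET

# The four second-level EML choices named in the issue description.
EML_CHOICES = ("dataset", "citation", "software", "protocol")

# ISO paths from the description, written as chains of local element names.
ISO_PATHS = (
    ["metadataScope", "MD_MetadataScope", "resourceScope", "MD_ScopeCode"],
    ["hierarchyLevel", "MD_ScopeCode"],
)


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_path(root, names):
    """Follow a chain of child local names from root; return the first match or None."""
    current = [root]
    for name in names:
        current = [
            child
            for elem in current
            for child in elem
            if local_name(child.tag).lower() == name.lower()
        ]
        if not current:
            return None
    return current[0]


def resource_type(xml_text):
    """Return the resource type string, or None if it cannot be determined."""
    root = ET.fromstring(xml_text)
    root_name = local_name(root.tag)

    if root_name == "eml":
        # EML: return the name of the second-level element that is used.
        for child in root:
            if local_name(child.tag) in EML_CHOICES:
                return local_name(child.tag)
        return None

    if root_name == "resource":
        # DataCite: return the resourceTypeGeneral attribute value.
        node = find_path(root, ["resourceType"])
        return node.get("resourceTypeGeneral") if node is not None else None

    # Otherwise treat the document as ISO and try each candidate path in turn.
    for path in ISO_PATHS:
        node = find_path(root, path)
        if node is not None:
            # Assumption: prefer the element text, else the codeListValue attribute.
            value = (node.text or "").strip() or node.get("codeListValue")
            if value:
                return value
    return None
```

Matching on local names sidesteps the namespace prefixes that differ between dialect versions, at the cost of being less strict than a real XPath with declared namespaces.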

@jeanetteclark How about the name resource.type.supported?

I'd like this check to test if the resource type is found, and if it is, check it against a known list of supported types for each dialect. If it matches then "SUCCESS" is returned.

This check will also be used by metadig-engine to determine if the document should be processed, as seen by the linked issue to metadig-engine repo.

Do we need to create a stub dataset assessment report for PIDs whose data is not of a supported resource type?
For example, what do we display when a user views such a resource via metacatui and clicks on the 'Assessment Report' button?

Hm. I'm not sure what you just described is what we discussed on the call. I thought that we would return the value of the resource type in the check, as opposed to a success/failure. Your method makes sense, but would limit us somewhat in that we wouldn't be able to analyze the data by resource type. @mbjones what are your thoughts?

I agree with your take, @jeanetteclark. It should output the type as a controlled value, and return SUCCESS unless it can't determine the type for some reason.

So it should have two return values: the type, and success/failure?

Yes, but they aren't really return values per se. They are different fields in the XML output report -- the status field should indicate SUCCESS and the output field should have the type value. See some of the example reports for the structure.
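To make the two fields concrete, here is a rough Python sketch. The CheckResult class and the report_type function are hypothetical names, and the failure message is invented for illustration; the sketch mirrors only the status and output fields mentioned above, not the real structure of the XML report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Hypothetical stand-in for the two report fields discussed above."""
    status: str  # "SUCCESS" or "FAILURE"
    output: str  # the resource type value, or a short explanation on failure


def report_type(resource_type: Optional[str]) -> CheckResult:
    """Return SUCCESS with the type as output, unless the type is missing."""
    if resource_type is None or not resource_type.strip():
        # The type could not be determined, so the check fails.
        return CheckResult("FAILURE", "The resource type could not be determined")
    return CheckResult("SUCCESS", resource_type.strip())
```

With this shape, a later analysis can group reports by the output field regardless of which type was found.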

@jeanetteclark maybe I missed the context of what the check needs to do. Here is a line from Jan 28 meeting notes:

"For FAIR checks, make sure we are looking at dataset resources. If not a dataset, don’t generate a report."

So the check would extract the resource type and compare it against controlled lists. If a match is found, then "SUCCESS" is returned; if not, then "FAILURE". So, if no resource type is found, then the resource type is not sufficient.
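A sketch of that comparison in Python, under stated assumptions: the dialect keys ("eml", "datacite", "iso"), the check_supported function, and the failure messages are hypothetical, and the ISO and DataCite sets below are small illustrative subsets of each vocabulary rather than the full controlled lists. Only the EML set comes directly from this issue.

```python
from typing import Optional, Tuple

# Illustrative lists only. The EML values are the four choices named in this
# issue; the ISO and DataCite sets are partial examples, not the complete lists.
SUPPORTED_TYPES = {
    "eml": {"dataset", "citation", "software", "protocol"},
    "iso": {"dataset", "series", "service", "software"},
    "datacite": {"Dataset", "Software", "Collection", "Text"},
}


def check_supported(dialect: str, resource_type: Optional[str]) -> Tuple[str, str]:
    """Return (status, output) for a resource type in the given dialect."""
    if not resource_type:
        # No resource type was found in the document.
        return ("FAILURE", "No resource type was found")
    allowed = SUPPORTED_TYPES.get(dialect)
    if allowed is None:
        return ("FAILURE", f"Unknown dialect: {dialect}")
    if resource_type in allowed:
        # Exact, case-sensitive match against the controlled list.
        return ("SUCCESS", resource_type)
    return ("FAILURE", f"Unsupported resource type: {resource_type}")
```

The match is deliberately case-sensitive here, so "dataset" passes for EML but not for the DataCite list; whether the real check should normalize case is an open design choice.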

All checks can return an optional "output" element, which for this case would contain the resource type on "SUCCESS".
An example of current check output, for the check resource.abstractLength.sufficient.1:

The abstract word count of '95' is less that the recommended minimum of '100' 

If you don't want your check to do this, then just let me know, and I'll open up another issue for the type of check described above.

Also, please let me know when you have a check I could run, and I can help test it.

"For FAIR checks, make sure we are looking at dataset resources. If not a dataset, don’t generate a report."

I think this note was more an idea than something we wanted to make sure we implement right away.

I like Matt's suggestion to have the check return SUCCESS if the resource scope is valid (part of a controlled list), and then have the output say something like:

The resource type is: foo

I think we can decide what to do with the report and the UI a bit down the road. I'm interested in seeing the report data for non-dataset resources for this analysis, so I want to make sure that is possible.

The new resource.type.valid check compares the metadata resource type against controlled resource type lists from ISO 19115, DataCite 4.1 and EML 2.2.

This check has been verified against these dialects.