Call for review metric: Gen2_FM_F3.md
Please review this new 2nd-generation metric.
The test for schema:mainEntity is not valid. mainEntity points to a block of metadata that DOES NOT necessarily contain the identifier.
A better test would be mainEntity -> identifier (or one of its sub-properties: accountId, confirmationNumber, duns, flightNumber, globalLocationNumber, gtin12, gtin13, gtin14, gtin8, isbn, issn, legislationIdentifier, leiCode, orderNumber, productID, serialNumber, sku, or taxID).
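To make the suggestion above concrete, here is a minimal sketch of that structure (not taken from the metric itself; the page URL, dataset name, and DOI are hypothetical placeholders), written as schema.org JSON-LD built with Python's json module, in which mainEntity points at an entity that carries its own identifier:

```python
import json

# Minimal sketch of the structure proposed above: schema:mainEntity points at
# an entity that itself carries schema:identifier (or one of its sub-properties).
# All values below are hypothetical placeholders.
page_metadata = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "@id": "https://example.org/landing-page",
    "mainEntity": {
        "@type": "Dataset",
        "name": "Example dataset",
        # The identifier lives on the entity that mainEntity points to,
        # not on the page-level metadata block itself.
        "identifier": "https://doi.org/10.1234/example-doi",
    },
}

# Serialise as the body of a <script type="application/ld+json"> element.
print(json.dumps(page_metadata, indent=2))
```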
@markwilkinson Hi Mark. We register our dataset DOIs with DataCite and pass DataCite various dataset metadata at registration time (currently using schema.org predicates). We do not pass DataCite any of the predicates this metric is searching for, and thus our records are failing this metric. Where did you get the list of valid and required predicates for dataset-type objects? I am looking at the Nature Sci Data guidance (https://www.nature.com/articles/s41597-019-0031-8.pdf) and don't see the schema.org predicates tested by this metric in their example dataset metadata, nor are they explicitly mentioned in their recommendations. @jlbales
Hi Dan,
Anything that has a DOI should pass this test! There may be something else failing... can you send me an example of a DOI you find is failing this test?
That article makes good suggestions, and the test follows those suggestions (and more!). Unfortunately, there is no such thing as a 'list of valid predicates', since nobody has the authority to say what is 'valid'. As such, my list comes from a survey of what people are using "in the real world". I make no claim to validity... I only claim that, based on usage, an agent that was looking for data would usually be able to find it if it looked for a predicate on that list.
Please send me an example of what you are seeing, and I will try to troubleshoot the test.
Cheers!
@markwilkinson Sure: see https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/evaluations/5118 If you look at the output of the F3 test, the last part says:
FAILURE: Was unable to locate the data identifier in the metadata using any (common) property/predicate reserved for this purpose. Tested the following ["http://www.w3.org/ns/ldp#contains", "http://xmlns.com/foaf/0.1/primaryTopic", "http://purl.obolibrary.org/obo/IAO_0000136", "http://purl.obolibrary.org/obo/IAO:0000136", "https://www.w3.org/ns/ldp#contains", "https://xmlns.com/foaf/0.1/primaryTopic", "http://schema.org/mainEntity", "http://schema.org/codeRepository", "http://schema.org/distribution", "https://schema.org/mainEntity", "https://schema.org/codeRepository", "https://schema.org/distribution", "http://www.w3.org/ns/dcat#distribution", "https://www.w3.org/ns/dcat#distribution", "http://www.w3.org/ns/dcat#dataset", "https://www.w3.org/ns/dcat#dataset", "http://www.w3.org/ns/dcat#downloadURL", "https://www.w3.org/ns/dcat#downloadURL", "http://www.w3.org/ns/dcat#accessURL", "https://www.w3.org/ns/dcat#accessURL", "http://semanticscience.org/resource/SIO_000332", "http://semanticscience.org/resource/is-about", "https://semanticscience.org/resource/SIO_000332", "https://semanticscience.org/resource/is-about", "https://purl.obolibrary.org/obo/IAO_0000136"]
That is the list of predicates I was referring to as being checked. We lack embedded metadata on our page, and it looks from this output like we need to use at least one of those predicates when we do embed the metadata for the DOI, and the DOI itself, on the page.
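For what it's worth, the pass/fail logic described above amounts to "does any of these predicates appear in the harvested metadata?". A rough sketch of that kind of check (an illustration only, not the Evaluator's actual code; it assumes the page metadata has already been expanded to JSON-LD whose property names are full IRIs) might look like:

```python
import json

# A few of the data-pointing predicates the F3 test reports as acceptable
# (abridged from the FAILURE output quoted above).
DATA_POINTING_PREDICATES = {
    "http://schema.org/mainEntity",
    "http://schema.org/distribution",
    "http://www.w3.org/ns/dcat#distribution",
    "http://www.w3.org/ns/dcat#downloadURL",
    "http://xmlns.com/foaf/0.1/primaryTopic",
}

def references_data(expanded_jsonld: str) -> bool:
    """Return True if any top-level node in the expanded JSON-LD document
    uses one of the data-pointing predicates. Assumes the input is a JSON
    array of node objects keyed by full property IRIs."""
    nodes = json.loads(expanded_jsonld)
    return any(
        predicate in node
        for node in nodes
        for predicate in DATA_POINTING_PREDICATES
    )
```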
Yes, I see. You're injecting data/metadata via script, and the DOI provider has no information at all.
Unfortunately, there's not much I can do to resolve this problem... I'm not inclined to train my harvester to run scripts, since it explores arbitrary pages and doesn't run in as protected an environment as a browser.
Note that the predicates it is searching for (the list you pasted above) are the predicates that point at the data (your CEL.gz records on that page). The DOI, which should also appear somewhere in the page, would require a different predicate (likely schema:identifier or dc:identifier).
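As a concrete illustration of that distinction (a hypothetical sketch, not a template mandated by the Evaluator; the file URL is a placeholder, while the DOI is the one discussed in this thread):

```python
import json

# Hypothetical landing-page metadata illustrating the distinction above:
# "distribution" points at the data itself (e.g. a CEL.gz file), while
# "identifier" carries the DOI of the record. The contentUrl is a placeholder.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example microarray dataset",
    # The DOI of the dataset record (the schema:identifier / dc:identifier role).
    "identifier": "https://doi.org/10.26030/cwan-7h58",
    # The predicate that points at the data files themselves.
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/data/sample-001.CEL.gz",
            "encodingFormat": "application/gzip",
        }
    ],
}

print(json.dumps(dataset_metadata, indent=2))
```

Embedding something along these lines in the landing page would give a harvester both a route to the data (distribution) and the DOI of the record (identifier).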
Sorry I can't help more!
@markwilkinson Ugh, I should have explained before asking you to take a look. Yes, we have not yet embedded our metadata on the dataset landing page, but we are planning to do that very soon. DataCite, the DOI provider, DOES in fact have the metadata for the dataset associated with this DOI (you can see it here: https://api.datacite.org/dois/application/vnd.datacite.datacite+json/10.26030/cwan-7h58 ), but the schema.org predicates we currently give to DataCite don't include any from the list that the F3 test output says it is looking for (e.g., we don't use the schema:mainEntity predicate). What I was asking was whether the list of predicates in the test output comes from a published set of predicates required to pass F3-based tests.
Interesting... it looks like several of the DataCite content types are not responding at the moment: if you request Turtle or RDF/XML it fails, but if you request JSON-LD it succeeds. That's why I thought it wasn't providing any metadata at all!
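A quick way to reproduce this kind of check is plain DOI content negotiation. A rough sketch, assuming the standard doi.org content-negotiation media types (exactly which ones DataCite honours may vary over time, as this exchange shows):

```python
import requests

# DOI discussed in this thread.
DOI = "10.26030/cwan-7h58"

# Ask the DOI resolver for different serialisations of the registered metadata.
# Success or failure per media type is exactly what is being observed above.
for accept in ("text/turtle", "application/rdf+xml", "application/ld+json"):
    response = requests.get(
        f"https://doi.org/{DOI}",
        headers={"Accept": accept},
        allow_redirects=True,
        timeout=30,
    )
    print(accept, response.status_code, response.headers.get("Content-Type"))
```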
Yes, if you're using schema, then mainEntity is one of the few choices (there are other choices for e.g. code repositories, but not for data)
Cheers!
@markwilkinson Where did you get the list of predicates this metric's test checks? Can you provide the reference? I don't see schema:mainEntity in the Nature data citation roadmap paper for any type, including datasets...