geneontology/pathways2GO

Emit correct datatype for date field (and maybe other fields)

Opened this issue · 4 comments

During the 2023-09-14 Noctua maintenance, we loaded 220 YeastPathways models to production, all dated that day, and noticed that only 10 of them sorted to the top of the Noctua landing page. The remaining 210 models were not sorted correctly.

@kltm pointed out that there was a format difference in the date field for these YeastPathways models vs. other (e.g., manual) models:

$ grep -A 1 date models/YeastPathways_TREDEG-YEAST-PWY.ttl
        <http://purl.org/dc/elements/1.1/date>
                "2023-09-14" ;
$ grep date models/65039e8700000010.ttl
        <http://purl.org/dc/elements/1.1/date> "2023-09-14"^^<http://www.w3.org/2001/XMLSchema#string> .
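The same check can be run over a whole file rather than by eye. This is a minimal sketch, assuming the date predicate and its value are separated only by whitespace (possibly a newline), as in the two grep outputs above; `find_untyped_dates` is a hypothetical helper name, not part of pathways2GO or minerva, and a regex is not a full Turtle parser.

```python
import re
from typing import List

DC_DATE = "<http://purl.org/dc/elements/1.1/date>"

# Predicate, whitespace (including a line break), a quoted value, and then
# no datatype marker (^^) or language tag (@) directly after the closing quote.
UNTYPED_DATE = re.compile(re.escape(DC_DATE) + r'\s+"([^"]*)"(?!\^\^|@)')


def find_untyped_dates(turtle_text: str) -> List[str]:
    """Return the lexical values of date literals that carry no datatype."""
    return UNTYPED_DATE.findall(turtle_text)


if __name__ == "__main__":
    # Sample text copied from the YeastPathways grep output above.
    sample = (
        "        <http://purl.org/dc/elements/1.1/date>\n"
        '                "2023-09-14" ;\n'
    )
    print(find_untyped_dates(sample))
```

An empty result for a file means every date value matched by this pattern already has an explicit datatype or language tag.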

I tested this by loading a YeastPathways model file into a local minerva instance and checking the landing page, where it did not sort to the top despite having the most recent date. Then I used sed to append ^^xsd:string to every date value (so, "2023-09-14"^^xsd:string) in the model and reloaded it; the edited model sorted correctly to the top.
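For reference, the sed-style edit described above could look roughly like the following in Python. This is a sketch under assumptions: it writes the full XMLSchema#string IRI (as seen in the manual model) instead of the xsd: prefix, so it does not depend on a prefix declaration, and `add_string_datatype` is a hypothetical name. It only touches date values that have no datatype or language tag yet, so running it twice changes nothing further.

```python
import re

DC_DATE = "<http://purl.org/dc/elements/1.1/date>"
XSD_STRING = "<http://www.w3.org/2001/XMLSchema#string>"

# Capture the predicate plus its quoted value, but only when the value is
# not already followed by a datatype (^^) or a language tag (@).
UNTYPED_DATE = re.compile(
    "(" + re.escape(DC_DATE) + r'\s+"[^"]*")(?!\^\^|@)'
)


def add_string_datatype(turtle_text: str) -> str:
    """Append ^^<...XMLSchema#string> to every untyped date literal."""
    return UNTYPED_DATE.sub(
        lambda m: m.group(1) + "^^" + XSD_STRING, turtle_text
    )


if __name__ == "__main__":
    sample = (
        "        <http://purl.org/dc/elements/1.1/date>\n"
        '                "2023-09-14" ;\n'
    )
    print(add_string_datatype(sample))
```

A text-level rewrite like this is only a stopgap for already-exported TTL files; it does not change what the conversion code emits.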

Apparently, for whatever reason, the sorting mechanism in either minerva or the Noctua landing page mishandles date values that lack the ^^xsd:string datatype. So, we should try to get the conversion code to emit it for the date field, and possibly look at which other fields are easy to fix as well.

Still, it is weird that the first 10 models sorted correctly even though they also had the bad datatype formatting.

kltm commented

@dustine32 It would be on the minerva side, as the over-the-wire model has no concept of type in that way. I suspect it's just some quirk of SPARQL dealing with a "mixed-type" sort that gives the strange results...
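To make the "mixed-type" sort hypothesis concrete, here is a toy model of how such a quirk could arise. It is purely illustrative: the comparator below is an assumption, not minerva's or its triplestore's actual ordering, and the model names are made up. It shows that if a sort compares the kind of literal before the lexical value, a newer untyped date can land below older typed ones.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str]  # None stands for a literal with no explicit datatype


def toy_sort_key(lit: Literal):
    # Hypothetical comparator: group by "has a datatype" first,
    # then compare the lexical form within each group.
    return (lit.datatype is not None, lit.lexical)


# Made-up models: two typed (manual-style) dates and one untyped date.
models = {
    "manual-old": Literal("2021-01-05", XSD_STRING),
    "manual-new": Literal("2023-08-30", XSD_STRING),
    "yeast-pathway": Literal("2023-09-14", None),
}

# "Newest first" under the toy comparator.
newest_first = sorted(models, key=lambda name: toy_sort_key(models[name]), reverse=True)
print(newest_first)
```

Under this toy ordering the untyped 2023-09-14 entry sorts last even though it is the most recent, which matches the general shape of the symptom but does not explain why 10 of the models still sorted to the top.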

@kltm So, if this is in minerva, should I still attempt to fix the YeastPathways date values in the pathways2GO code? I've spent some time perusing P2GO and minerva code along with owlapi docs and I'm struggling to get the conversion code to add those little ^^xsd:string suffixes to dates. If desperate, we could consider a post-process sed to just insert them into each TTL file.

kltm commented

@dustine32 By "on the minerva side", I mean that is where the search and sort code are having trouble.
@balhoff may have a different take on this, but I'm pretty neutral about a fix as this is currently a one-time thing. If we are planning on reusing that code path, the fix should go into the pathways2GO code; if this is a one-off, sed is fine.

@kltm Right! Thank you for clarifying. I agree that we should probably get the pathways2GO code to emit the datatype correctly, since it will be used in the future for more models. Speaking of which, I'm now talking to the evil chatbot and it seems to know some OWLAPI things.