ExposuresProvider/cam-pipeline

Update the URL for the wiki page returned by cam-kp

Closed this issue · 10 comments

Per discussion between Kara and Jim, this issue requests that the URL for the wiki page returned by cam-kp be replaced with this one: https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG. This change will require some code changes and a new deployment, but it will improve consistency, adhere to NCATS/UI specs for wiki pages, and better serve end users.

Am I correct in thinking that this will need to be made in the infores catalog? We currently follow the convention of all the Automat KGs, which sets all the Automat KG xrefs to https://github.com/NCATSTranslator/Translator-All/wiki/Automat, but I think we can change that to https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG without too much problem.

https://github.com/biolink/biolink-model/blob/569ecf63ae59bfd200dda8dd871ed50c2dff4345/infores_catalog.yaml#L167-L174

Our /metadata endpoint at https://automat.renci.org/cam-kp/metadata returns two distinct URLs across three fields:

      "source_data_url": "https://github.com/ExposuresProvider/cam-kp-api",
      "license": "https://github.com/ExposuresProvider/cam-kp-api/blob/master/LICENSE",
      "attribution": "https://github.com/ExposuresProvider/cam-kp-api",

AFAIK nobody uses these anywhere, but we could consider changing attribution to this URL as well.

To clarify, we (icees-kg) also follow the convention of deferring to Automat as an aggregator knowledge source (infores:automat-icees-kg) and point to https://github.com/NCATSTranslator/Translator-All/wiki/Automat, but we then refer to infores:icees-kg as the primary knowledge source and point to https://github.com/NCATSTranslator/Translator-All/wiki/ICEES. For cam-kp, I'm suggesting that you also refer to infores:automat-cam-kp as the aggregator knowledge source pointing to https://github.com/NCATSTranslator/Translator-All/wiki/Automat and infores:cam-kp as the primary knowledge source, pointing to https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG.

Yes, the change will need to be made in both cam-kp and the infores catalog. However, I already have them flagged as part of the infores/wiki effort, so I can create a PR for both icees-kg and cam-kp after the cam-kp URLs have been updated.

Ah, got it, I understand what you mean now! So instead of our current sources, which look like this:

cam-pipeline/tests/test_api.py

Lines 2433 to 2444 in c539c74

assert spinal_cord_edge["sources"] == [
    {
        "resource_id": "infores:go-cam",
        "resource_role": "primary_knowledge_source",
        "upstream_resource_ids": None,
    },
    {
        "resource_id": "infores:automat-cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:go-cam"],
    },
]

You are proposing that we add infores:cam-kp as the primary_knowledge_source below infores:automat-cam-kp as the aggregator_knowledge_source, and then demote infores:go-cam to a supporting_data_source.

I think we can do that, but I would argue that infores:cam-kp should also be an aggregator_knowledge_source -- we don't really provide any primary knowledge, and all the information we have should be sourced to one of our primary knowledge sources (infores:go-cam, infores:aop-cam and infores:ctd):

# Check the QC results.
assert set(metadata["qc_results"]["primary_knowledge_sources"]) == {
    "infores:ctd",
    "infores:aop-cam",
    "infores:go-cam",
}

So I would propose that we change our sources so they look like this:

[
    {
        "resource_id": "infores:go-cam",
        "resource_role": "primary_knowledge_source"
    },
    {
        "resource_id": "infores:cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:go-cam"]
    },
    {
        "resource_id": "infores:automat-cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:cam-kp"]
    }
]

Does that make sense?

Yes, let's go with your suggestion, but note that you may want to cross-check against the InfoRes catalog (https://github.com/biolink/biolink-model/blob/master/infores_catalog.yaml).

@EvanDietzMorris updated Automat-CAM-KP; if you run the following query on https://automat.renci.org/#/cam-kp/reasoner_api_1_4_query_post_cam-kp_trapi:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids": ["NCBIGene:15481"]
                },
                "n1": {"categories": ["biolink:AnatomicalEntity"]}
            },
            "edges": {
                "e0": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:active_in"]
                }
            }
        }
    }
}

... you get back provenance that looks like this:

        "390329": {
          "predicate": "biolink:active_in",
          "sources": [
            {
              "resource_id": "infores:go-cam",
              "resource_role": "primary_knowledge_source"
            },
            {
              "resource_id": "infores:cam-kp",
              "resource_role": "aggregator_knowledge_source",
              "upstream_resource_ids": [
                "infores:go-cam"
              ]
            },
            {
              "resource_id": "infores:automat-cam-kp",
              "resource_role": "aggregator_knowledge_source",
              "upstream_resource_ids": [
                "infores:cam-kp"
              ]
            }
          ],
          "subject": "NCBIGene:15481",
          "attributes": [
            {
              "attribute_type_id": "biolink:xref",
              "original_attribute_name": "xref",
              "value": [
                "http://model.geneontology.org/SYNGO_2867"
              ],
              "value_type_id": "xsd:anyURI"
            }
          ],
          "object": "UBERON:0002894"
        }
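
For anyone who wants to repeat this check outside the Swagger page, here is a minimal Python sketch. It rebuilds the query shown above and pulls the (resource_id, resource_role) pairs out of each returned edge. The function names are hypothetical, and the endpoint URL is deliberately left as a parameter because the thread only links to the Swagger UI, not to the POST endpoint itself. The sketch also assumes the response keeps its edges under message.knowledge_graph.edges, keyed by edge ID, as the excerpt above suggests.

```python
import json
import urllib.request


def build_query(gene_id="NCBIGene:15481"):
    """Build the one-hop query from the thread: gene -[active_in]-> anatomical entity."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [gene_id]},
                    "n1": {"categories": ["biolink:AnatomicalEntity"]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:active_in"],
                    }
                },
            }
        }
    }


def edge_sources(response):
    """Map each edge ID to its list of (resource_id, resource_role) pairs."""
    # Assumption: edges live under message.knowledge_graph.edges, keyed by edge ID.
    edges = response["message"]["knowledge_graph"]["edges"]
    return {
        edge_id: [(s["resource_id"], s["resource_role"]) for s in edge.get("sources", [])]
        for edge_id, edge in edges.items()
    }


def post_query(trapi_url, query, timeout=60):
    """POST the query as JSON and return the decoded response.

    trapi_url is an assumption: pass the POST endpoint listed on the Swagger page.
    """
    request = urllib.request.Request(
        trapi_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

Calling edge_sources(post_query(url, build_query())) should then give one entry per edge, which makes it easy to scan the provenance of every result at once.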

So infores:go-cam is the primary knowledge source, which is aggregated by the aggregator knowledge source infores:cam-kp, which is itself aggregated by the aggregator knowledge source infores:automat-cam-kp.
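
That chain can also be checked mechanically. The following is a rough sketch, not code from cam-pipeline: provenance_chain is a hypothetical helper, and it assumes a strictly linear chain with exactly one primary knowledge source, which holds for the edge above but may not hold for every edge.

```python
def provenance_chain(sources):
    """Order resource IDs from the primary source up to the outermost aggregator.

    Assumes a linear chain: one primary source, and each resource aggregated
    by at most one other resource.
    """
    primaries = [s for s in sources if s["resource_role"] == "primary_knowledge_source"]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary knowledge source, found {len(primaries)}")

    # Index each aggregator by the resource it lists as upstream.
    # "upstream_resource_ids" may be missing or None, so fall back to an empty list.
    downstream_of = {}
    for source in sources:
        for upstream in source.get("upstream_resource_ids") or []:
            if upstream in downstream_of:
                raise ValueError(f"{upstream} is aggregated by more than one resource")
            downstream_of[upstream] = source["resource_id"]

    # Walk from the primary source through each successive aggregator.
    chain = [primaries[0]["resource_id"]]
    while chain[-1] in downstream_of:
        next_id = downstream_of[chain[-1]]
        if next_id in chain:
            raise ValueError("upstream_resource_ids form a cycle")
        chain.append(next_id)

    if len(chain) != len(sources):
        raise ValueError("some sources are not connected to the primary knowledge source")
    return chain


sources = [
    {"resource_id": "infores:go-cam", "resource_role": "primary_knowledge_source"},
    {
        "resource_id": "infores:cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:go-cam"],
    },
    {
        "resource_id": "infores:automat-cam-kp",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:cam-kp"],
    },
]
print(provenance_chain(sources))
# ['infores:go-cam', 'infores:cam-kp', 'infores:automat-cam-kp']
```

A check like this could sit next to the existing assertions in tests/test_api.py if we want to verify the chain shape without hard-coding the full sources list.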

@karafecho: Is this sufficient to close this issue, at least on the Automat-CAM-KP side? We'll have to make sure that all of those inforeses point to the right URLs, but according to https://github.com/biolink/information-resource-registry/blob/e592279814e723ca16b922111037568171b87668/infores_catalog.yaml:

So we should be good there.

This all looks good to me! Sierra, Tursynay, and I just updated the infores catalog (created and merged a large PR with many changes), so your x-refs are up to date and look good to me. As such, I'll close this ticket.

Oh, wait. I don't see a GO-CAM wiki page?

Hmm, the correct wiki URL for GO-CAM should be https://github.com/NCATSTranslator/Translator-All/wiki/GO-CAM. I think the dash may have been turned into an em-dash in https://github.com/biolink/information-resource-registry/blob/e592279814e723ca16b922111037568171b87668/infores_catalog.yaml. I can fix that in a bit.
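
A substituted dash like this is hard to spot by eye, so a small check may help when reviewing xref URLs. This is only a sketch: non_ascii_characters is a hypothetical helper, and the bad URL below is a made-up illustration of the suspected problem, not a copy of the catalog entry.

```python
import unicodedata


def non_ascii_characters(url):
    """Return (index, character, Unicode name) for every non-ASCII character in url."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(url)
        if ord(char) > 127
    ]


good = "https://github.com/NCATSTranslator/Translator-All/wiki/GO-CAM"
# Hypothetical bad value: the hyphen replaced by U+2014 (em dash).
bad = "https://github.com/NCATSTranslator/Translator-All/wiki/GO\u2014CAM"

print(non_ascii_characters(good))  # []
for index, char, name in non_ascii_characters(bad):
    print(f"position {index}: {name}")
```

Running the same function over every xref in the catalog would flag any URL containing a character outside plain ASCII, though some legitimate URLs may contain such characters too, so hits need a manual look.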

Yes, that is the correct xref URL for GO-CAM. I would create a PR to change the incorrect xref URL in the infores catalog.

PR created! biolink/information-resource-registry#7

We can close this ticket once that's been merged.