Use case: use CAM-KP-API to enhance edges

Question

gaurav opened this issue 2 years ago · 9 comments

gaurav commented 2 years ago

Given an edge, can CAM-KP API provide additional information on that edge, including:

Which Noctua/Reactome pathways include that edge
Where in the body/cell does this pathway take place
...

Example: chemical-gene or gene-gene edge

Answer 1 · 2022-07-14T17:44:02.000Z

We should probably try to get this to work before #537.

The hard part is finding gene pairs that should return results but currently don't, so perhaps what we need is a test file listing gene pairs that we can query to check whether we get the expected relationship.
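A minimal sketch of what such a test file and check could look like. The TSV column names, the `find_failures` helper, and the stubbed `query_predicates` function are all assumptions made for illustration; in practice the stub would be replaced with a real call to CAM-KP-API, and the expected predicates below are placeholders rather than curated expectations.

```python
import csv
import io


def load_expected_edges(tsv_text):
    """Parse a tab-separated test file into (subject, object, expected_predicate) tuples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["subject"], row["object"], row["expected_predicate"])
        for row in reader
    ]


def find_failures(expected_edges, query_predicates):
    """Return the test cases whose expected predicate was not among the returned ones.

    query_predicates(subject, obj) is any callable that returns the predicates
    the API reports between the two identifiers.
    """
    failures = []
    for subject, obj, expected in expected_edges:
        found = set(query_predicates(subject, obj))
        if expected not in found:
            failures.append((subject, obj, expected, sorted(found)))
    return failures


# Hypothetical test file contents (placeholder expectations).
SAMPLE_TSV = "\n".join(
    "\t".join(fields)
    for fields in [
        ("subject", "object", "expected_predicate"),
        ("UniProtKB:P51589", "UniProtKB:P08684", "biolink:affects_activity_of"),
        ("UniProtKB:O75795", "UniProtKB:P08684", "biolink:related_to"),
    ]
)


def stub_query_predicates(subject, obj):
    """Stand-in for a real API call; returns canned data for one pair only."""
    canned = {
        ("UniProtKB:P51589", "UniProtKB:P08684"): ["biolink:affects_activity_of"],
    }
    return canned.get((subject, obj), [])


expected_edges = load_expected_edges(SAMPLE_TSV)
failures = find_failures(expected_edges, stub_query_predicates)
for subject, obj, expected, found in failures:
    print(f"MISSING {subject} -> {obj}: expected {expected}, found {found}")
```

Passing the query function in as an argument keeps the check independent of any particular CAM-KP instance, so the same test file could be run against development and production.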

Might be useful to add some exploration endpoints that are easier to work with (e.g. an endpoint that returns a list of models for a particular gene).

Question: can we say gene A and gene B are related if they are in the same model? Should we implement that?

Since we don't have that, we need to find specific relations for this task.

"Causes influences" could be the relation between two genes that tells if they are related to each other within a model. This is a broad match of biolink:causes, but we only use exact matches, so that might not be accessible from CAM-KP-API. However, there is a set of manual mappings in https://github.com/ExposuresProvider/cam-pipeline/blob/cc13ef6ac7f4d48e91f77a789c71dec344512e1b/biolink-local.ttl that we might be able to access.

Answer 2 · 2022-07-14T17:59:38.000Z

When testing TRAPI queries, we will need to make sure the RO relation we're inferring maps to a reasonable Biolink relation. One potential source of confusion is that folks may search for causes, while some relevant relations map to affects.

Answer 3 · 2022-07-19T18:58:15.000Z

Here are two different ARAX queries that you can pull gene-chemical edges from, as described on slide 8 in this deck:

https://arax.ncats.io/?r=44679
https://arax.ncats.io/?r=52713

Answer 4 · 2022-10-05T15:38:02.000Z

Sorry it's taken me so long to respond to this! These queries were super helpful for finding and fixing some bugs in CAM-KP, and I think there might be more bugs lurking there. Here are my results.

As far as I can tell, out of all the edges @karafecho provides to us, only the edge between UniProtKB:P51589 and UniProtKB:P08684 returns results with a one-hop query. This is the following query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["UniProtKB:P51589"]},"n1":{"ids":["UniProtKB:P08684"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"}}}}}

Running this on our development instance returns 960 results, all of them being biolink:affects_activity_of edges from the model http://model.geneontology.org/R-HSA-5423646. I'm not sure why there are so many results, but I'm going to dig into this further to see what's going on here.
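As a rough sketch, the one-hop query above can be built programmatically and the returned edges tallied by predicate, which makes a result set like "960 results, all one predicate" easy to spot. The helper names `build_one_hop_query` and `tally_predicates` are made up for this example, and the sample response is a hand-written stub that assumes the standard TRAPI layout (`message.knowledge_graph.edges` keyed by edge ID), not real CAM-KP output.

```python
import json
from collections import Counter


def build_one_hop_query(subject_id, object_id, predicate="biolink:related_to"):
    """Build a TRAPI one-hop query between two pinned identifiers."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"ids": [subject_id]},
                    "n1": {"ids": [object_id]},
                },
                "edges": {
                    "e0": {
                        "predicates": [predicate],
                        "subject": "n0",
                        "object": "n1",
                    }
                },
            }
        }
    }


def tally_predicates(response):
    """Count knowledge-graph edges in a TRAPI response by predicate."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return Counter(edge.get("predicate") for edge in edges.values())


query = build_one_hop_query("UniProtKB:P51589", "UniProtKB:P08684")
print(json.dumps(query))

# Hand-written stub response for illustration only (not real CAM-KP output).
stub_response = {
    "message": {
        "knowledge_graph": {
            "edges": {
                "edge-1": {
                    "subject": "UniProtKB:P51589",
                    "predicate": "biolink:affects_activity_of",
                    "object": "UniProtKB:P08684",
                },
                "edge-2": {
                    "subject": "UniProtKB:P51589",
                    "predicate": "biolink:affects_activity_of",
                    "object": "UniProtKB:P08684",
                },
            }
        }
    }
}
print(tally_predicates(stub_response))
```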

Two-hop queries do a bit better, with:

360 results for CHEBI:34477-(?)-UniProtKB:P08684
144 results for CHEBI:63840-(?)-UniProtKB:P08684
- This has some interesting results, e.g. CHEBI:63840("5'-hydroxyomeprazole") biolink:participates_in GO:0006739 ("NADP metabolic process") biolink:caused_by NCBIGene:100861540
1000+ results for (CHEBI:17996 or CHEBI:23114)-(?)-UniProtKB:P13569
1000+ results for UniProtKB:O75795-(?)-UniProtKB:P08684
1000+ results for UniProtKB:P16662-(?)-UniProtKB:P08684
1000+ results for UniProtKB:P19224-(?)-UniProtKB:P08684
1000+ results for UniProtKB:P22310-(?)-UniProtKB:P08684
1000+ results for UniProtKB:P54855-(?)-UniProtKB:P08684
1000+ results for UniProtKB:P24462-(?)-UniProtKB:P08684
1000+ results for UniProtKB:Q9HB55-(?)-UniProtKB:P08684
1000+ results for CHEBI:35703-(?)-UniProtKB:P08684

I used the query:

{"message":{"query_graph":{"nodes":{"n0":{"ids":["CHEBI:17996","CHEBI:23114"]},"n1":{},"n2":{"ids":["UniProtKB:P13569"]}},"edges":{"e0":{"predicates":["biolink:related_to"],"subject":"n0","object":"n1"},"e1":{"predicates":["biolink:related_to"],"subject":"n1","object":"n2"}}}}}

As you can see, UniProtKB:P08684 seems to be quite overrepresented in the results, and again it seems to me that we're seeing a lot more results than I would expect to see here.

I wonder whether we should need multi-hop queries to get these results at all, or whether we should instead have some related_to triples directly connecting entities that have any relation with each other.

So, I think, next steps:

Dig into the one-hop results and figure out what's going on there.
Dig into the first two two-hop result sets, figure out if there's anything interesting in there, and if we should change our triplestore so that you can get these results with a one-hop query.

Answer 5 · 2022-10-14T15:52:54.000Z

Thanks for your work on this, Gaurav.

The two-hop results do indeed look interesting, although I have not completed a deep dive.

Answer 6 · 2022-10-14T17:32:08.000Z

Note: updated TCDC workflow can be found in slide 10 in this deck.

Answer 7 · 2022-11-01T15:33:22.000Z

Any updates, Gaurav? Happy to help if you point me in the right direction.

Answer 8 · 2022-11-02T15:24:17.000Z

Hi Kara! My work on this issue currently revolves around the new /lookup endpoint (#572). My goal is to have an endpoint that (1) normalizes input identifiers and (2) bypasses the main SPARQL query we currently use and instead queries the triplestore directly, returning everything we know about a particular identifier, so that we can check whether the main SPARQL query is working correctly.

This is primarily intended for debugging right now, but once that's done, I want to add the ability to filter by an object as well. We would then have an API endpoint that lets you query e.g. /lookup?subject=CHEBI:17685&object=GO:0019136&hopLimit=10 to find every relation between CHEBI:17685 and GO:0019136 across up to ten hops, after normalizing both of those identifiers. I think that'll give us everything we need to enhance edges and double-check our SPARQL queries at the same time.

I've gotten sidetracked by some database issues, but I'm hoping to have the basic /lookup endpoint up by early next week, with support for filtering by an object identifier added soon thereafter. Happy to discuss any of this in a meeting if that would be useful!
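For illustration, a request URL for the proposed endpoint could be assembled as follows. The `subject`, `object` and `hopLimit` parameter names come from the example above; the base URL and the `build_lookup_url` helper are hypothetical, and since the endpoint was still being built at the time of writing, the final interface may differ.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the actual CAM-KP-API instance.
BASE_URL = "https://cam-kp.example.org"


def build_lookup_url(subject, object_id=None, hop_limit=None, base_url=BASE_URL):
    """Assemble a /lookup URL, adding the optional filters only when given."""
    params = {"subject": subject}
    if object_id is not None:
        params["object"] = object_id
    if hop_limit is not None:
        params["hopLimit"] = hop_limit
    # safe=":" keeps CURIEs such as CHEBI:17685 readable in the query string.
    return f"{base_url}/lookup?{urlencode(params, safe=':')}"


url = build_lookup_url("CHEBI:17685", object_id="GO:0019136", hop_limit=10)
print(url)
```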

Answer 9 · 2022-11-04T19:49:55.000Z

This all sounds great, Gaurav! I very much appreciate the effort.