geneontology/pathways2GO

YeastPathways - Intermediate small molecules shared between reactions

Closed · 11 comments

For YeastPathways models, any small molecules that are outputs of one reaction and also an input of the next reaction should share a single small molecule instance. Example in BRANCHED-CHAIN-AA-SYN-PWY-1 pathway shown here:
image
Reaction ACETOLACTREDUCTOISOM-RXN has an output instance of (R)-2,3-dihydroxy-3-methylbutanoate that is separate from the DIHYDROXYISOVALDEHYDRAT-RXN input instance of the same molecule. These two instances of the same class can be merged into one instance and connected via a -has_output-> X <-has_input- chain as shown here:
image

The following small molecule classes will explicitly be blocked from sharing instances between reactions:
CHEBI:15378 hydron
CHEBI:15377 water
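
For anyone who wants to see the merge rule spelled out, here is a minimal Python sketch. It is not the pathways2go code (that lives in the intermediate-mol-share-instance branch); the function name `build_triples`, the dict layout, the `#n` instance IDs, and the assumption that "next reaction" means the next item in an ordered list are all made up for illustration.

```python
import itertools

# Classes the issue says must never share instances between reactions.
BLOCKED_CLASSES = {"CHEBI:15378", "CHEBI:15377"}  # hydron, water


def build_triples(reactions, blocked=BLOCKED_CLASSES):
    """Return (reaction, relation, molecule_instance) triples.

    `reactions` is an ordered list of dicts with keys "id", "inputs" and
    "outputs" (hypothetical layout). An input of reaction N reuses the
    instance created for the same class as an output of reaction N-1,
    unless the class is blocked.
    """
    counter = itertools.count(1)

    def new_instance(cls):
        # Hypothetical instance naming: class ID plus a running number.
        return f"{cls}#{next(counter)}"

    triples = []
    carried = {}  # class -> instance output by the previous reaction
    for rxn in reactions:
        for cls in rxn["inputs"]:
            inst = None if cls in blocked else carried.get(cls)
            if inst is None:
                inst = new_instance(cls)
            triples.append((rxn["id"], "has_input", inst))
        carried = {}
        for cls in rxn["outputs"]:
            inst = new_instance(cls)
            triples.append((rxn["id"], "has_output", inst))
            carried[cls] = inst
    return triples
```

With two consecutive toy reactions that both touch a placeholder class `EX:intermediate` and water, the intermediate ends up as one shared instance while water gets a separate instance per reaction.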

Tagging @thomaspd for any further clarification.

This has already been coded in branch intermediate-mol-share-instance, as a trial to see whether it was easy enough to do in the pathways2go code. Turns out it was easy. The code just needs to be merged into master as part of the YeastPathways load.

ukemi commented

To maintain consistency, can we do this with the Reactome models too? @deustp01

@ukemi Yes! We can definitely do this for Reactome. It's currently restricted to YeastPathways only because applying it to Reactome breaks one of the pathways2go Reactome tests, and I haven't been able to debug why (after spending almost 2 days on it a few months ago).

ukemi commented

Weird because we actually use the outputs and inputs in the rule to infer 'provides_input_for'.

The following small molecule classes will explicitly be blocked from sharing instances between reactions:
CHEBI:15378 hydron
CHEBI:15377 water

This list will probably need to be expanded, e.g., to ATP / ADP / Pi and probably more. A couple of years ago, in a similar discussion, I think we used the term "currency chemicals" (Larry Hunter's phrase?) for these ubiquitously occurring entities that we do NOT want to be the basis of a causal connection between reactions. But cautiously, because the boundary between "currency" and meaningful shared entities is fuzzy and variable.

ukemi commented

currency chemicals? We tried to distill this with Ben and Alan using frequency of use in Rhea reactions. I still have the list.
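
A frequency-based cut like this could be sketched roughly as below. This is only a guess at the general idea, not the method actually used on the Rhea data; the function name `currency_candidates`, the input layout (one list of participant class IDs per reaction), and the `min_fraction` threshold are all assumptions.

```python
from collections import Counter


def currency_candidates(reactions, min_fraction=0.2):
    """Return classes that participate in at least `min_fraction` of reactions.

    `reactions` is a list of participant lists, one per reaction
    (hypothetical layout). Each class is counted once per reaction.
    """
    total = len(reactions)
    if total == 0:
        return set()
    counts = Counter()
    for participants in reactions:
        counts.update(set(participants))  # de-duplicate within a reaction
    return {cls for cls, n in counts.items() if n / total >= min_fraction}
```

Any threshold would still need manual review, since a high-frequency chemical can be a meaningful intermediate in some pathways.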

But cautiously, because the boundary between "currency" and meaningful shared entities is fuzzy and variable.

Maybe we could combine a fairly extensive currency chemical list with an additional rule that when reaction 2 is asserted to directly follow reaction 1, any outputs of 1 that are inputs of 2 can be used to make "directly provides input for" links between the two reactions even if the chemicals are on the currency chemical list. With a lot of checking to tune the rule and the list to exclude false positive links.
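
To make the proposed rule concrete, here is a small Python sketch under stated assumptions: the function name `linking_chemicals`, the `directly_follows` flag, and the placeholder IDs in the usage note are hypothetical, and nothing like this is claimed to exist in pathways2go.

```python
def linking_chemicals(outputs_1, inputs_2, currency, directly_follows):
    """Return the chemicals that may justify a link from reaction 1 to reaction 2.

    - Shared chemicals are outputs of reaction 1 that are inputs of reaction 2.
    - If reaction 2 is asserted to directly follow reaction 1, every shared
      chemical counts, even one on the currency list.
    - Otherwise currency chemicals are ignored.
    An empty result means no link should be made on this basis.
    """
    shared = set(outputs_1) & set(inputs_2)
    if directly_follows:
        return shared
    return shared - set(currency)
```

For example, with `currency = {"EX:ATP"}`, two reactions that share only `EX:ATP` get no link unless the second is asserted to directly follow the first.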

Weird because we actually use the outputs and inputs in the rule to infer 'provides_input_for'.

Yeah, some of this logic may be behind what's failing the test.

ukemi commented

Yes, this must be done cautiously. For example, I am currently working on some metabolic pathways that generate and consume co-factors. If I were including chemicals, I certainly wouldn't want to exclude them.

ukemi commented

@dustine32 I also notice above that you are using the relation 'directly provides input for'. It is my understanding that this is now subsumed by just 'provides input for'.

I think this is done, at least for YP. If there needs to be a change for Reactome, that should probably be a new ticket.

This discussion may also be a useful starting point for thinking about the issue of easily / reliably identifying primary inputs in GO-CAMs and distinguishing them from other inputs as discussed here @ukemi @pgaudet