Implement reaction ingests (Rhea, BioPAX, etc)

Question

Opened this issue 4 years ago · 6 comments

Note: this should move to a generic kghub repo, keeping here for now

We need a TSV of reaction->participant edges from various sources, in order of priority:

Rhea cc @balhoff
Any BioPAX3 export (e.g. Reactome)
Any BioPAX2 export (e.g. YeastPathways)
SEED cc @realmarcin
maybe KEGG

(we also have a heuristic way of generating these from GO text descriptions but this is outside the scope of this ticket)

The fields would be:

subject (https://w3id.org/biolink/vocab/MolecularActivity)
predicate: (todo: add has-participant to bl)
object (https://w3id.org/biolink/vocab/ChemicalSubstance)
usual provenance properties
stoichiometry: int
direction: One of l->r, r->l, bidirectional, neutral
side: One of l,r

This schema to be added to bl (biolink/biolink-model#478)
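
As a rough illustration of the fields listed above, here is a minimal Python sketch of one edge row and a TSV writer. The class name `ReactionParticipantEdge`, the function `write_tsv`, the `has_participant` predicate string, and the `RHEA:00000` identifier are all placeholders assumed for this example, not part of any agreed schema. Provenance columns are left out for brevity.

```python
import csv
import io
from dataclasses import dataclass, fields

# Allowed values, taken from the field list in this issue.
DIRECTIONS = {"l->r", "r->l", "bidirectional", "neutral"}
SIDES = {"l", "r"}


@dataclass(frozen=True)
class ReactionParticipantEdge:
    """One reaction->participant edge (hypothetical class name)."""
    subject: str        # reaction ID, source prefix left as-is
    predicate: str      # placeholder until has-participant is added to bl
    object: str         # chemical ID, source prefix left as-is
    stoichiometry: int
    direction: str      # one of DIRECTIONS
    side: str           # one of SIDES

    def __post_init__(self):
        # Reject values outside the controlled vocabularies above.
        if not isinstance(self.stoichiometry, int):
            raise ValueError("stoichiometry must be an int")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")
        if self.side not in SIDES:
            raise ValueError(f"unknown side: {self.side!r}")


def write_tsv(edges, handle):
    """Write a header row plus one tab-separated row per edge."""
    names = [f.name for f in fields(ReactionParticipantEdge)]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    for edge in edges:
        writer.writerow([getattr(edge, name) for name in names])


# Usage: one placeholder edge written to an in-memory buffer.
example = ReactionParticipantEdge(
    subject="RHEA:00000",          # placeholder reaction ID, not a real record
    predicate="has_participant",   # placeholder predicate
    object="CHEBI:15377",          # water
    stoichiometry=1,
    direction="bidirectional",
    side="r",
)
buffer = io.StringIO()
write_tsv([example], buffer)
```

Validating `direction` and `side` at construction time keeps malformed rows out of the TSV before any downstream transform sees them.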

The nodes would have all the usual properties. E.g. rhea would provide a description, xrefs

maybe additional node properties like

is balanced: bool
is stereo: bool

I suggest the ingest does not try to normalize the IDs, but leaves the source ID prefixes as they are.

Some sources may have catalysis too - add these as another edge type.

Not of direct relevance to KG-hub, but relevant to @goodb @balhoff: we will also have something like a SPARQL transform that turns this into our standard OWL representation. That representation can be complex and can involve unions, e.g.:

maleate hydratase activity == 
(catalytic activity 
and has input some ((R)-malate(2-) and has stoichiometry value “1”)
and has output some (maleate(2-) and has stoichiometry value “1”)
and has output some (water and has stoichiometry value “1”))
or
(catalytic activity 
and has output some ((R)-malate(2-) and has stoichiometry value “1”)
and has input some (maleate(2-) and has stoichiometry value “1”)
and has input some (water and has stoichiometry value “1”))

This is what we would use for OWL reasoning and in GO
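
To make the TSV-to-OWL expansion concrete, here is a small Python sketch that builds the class expression above as plain text from per-participant rows. It is a string-building stand-in, not the SPARQL transform itself. The names `Participant` and `expand` are assumptions made for this example, and `neutral` is left unsupported because the thread does not say how it should expand.

```python
from collections import namedtuple

# One participant of a reaction: display label, stoichiometry, and side (l or r).
Participant = namedtuple("Participant", ["label", "stoichiometry", "side"])


def _clause(role, participant):
    """Render one 'has input/output some (...)' restriction."""
    return (
        f"has {role} some ({participant.label} "
        f'and has stoichiometry value "{participant.stoichiometry}")'
    )


def _one_way(participants, input_side):
    """Expression for a single direction; input_side names the input side."""
    parts = ["catalytic activity"]
    for p in participants:
        role = "input" if p.side == input_side else "output"
        parts.append(_clause(role, p))
    return "(" + " and ".join(parts) + ")"


def expand(participants, direction):
    """Expand TSV-style rows into a class expression string."""
    if direction == "l->r":
        return _one_way(participants, "l")
    if direction == "r->l":
        return _one_way(participants, "r")
    if direction == "bidirectional":
        # A bidirectional reaction becomes the union of both readings.
        return _one_way(participants, "l") + " or " + _one_way(participants, "r")
    raise ValueError(f"direction not handled in this sketch: {direction!r}")


# Usage: the maleate hydratase example from this issue.
maleate_hydratase = [
    Participant("(R)-malate(2-)", 1, "l"),
    Participant("maleate(2-)", 1, "r"),
    Participant("water", 1, "r"),
]
expression = expand(maleate_hydratase, "bidirectional")
```

The point of the design is that the TSV stays flat (one row per participant), while the union only appears when the rows are expanded for reasoning.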

Note that having alternate levels of representation for different purposes is exactly what I am getting at in Biological Knowledge Graph Modeling Design Patterns

We can also see this as akin to dosdp templating - we have a simple TSV representation and an OWL expansion

Answer 1 · 2020-10-13T16:51:49.000Z

What about adding CHEBI?

In PheKnowLator, we have created specific triples that allow us to explicitly represent CHEBI chemicals, catalysts, and cofactors with respect to Reactome pathways. Ignacio Tripodi and I collaborated on validating this and ran some wet lab experiments that seemed to suggest this worked well when applied to a small human RNA-Seq time series toxicogenomics assay.

Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

Answer 2 · 2020-10-14T22:57:42.000Z

yes, we should definitely add chebi. rhea and reactome already use chebi

curious - how did you go about modeling this?

wow that's amazing about validating on wet lab experiments

Answer 3 · 2020-10-19T16:10:05.000Z

> What about adding CHEBI?

We actually ingest CHEBI now in KG-COVID-19 (see here), although probably not as elaborately as what you describe for PheKnowLator

> Also, I AM a HUGE fan of exploring different KG modeling design patterns. Perhaps after the PheKnowLator manuscript we can talk more seriously about some projects in that domain.

Yes, let's discuss, post-manuscript!

Answer 4 · 2020-10-19T16:41:50.000Z

Yes, we do add ChEBI to our KG.
But I am not sure if we have any sources that reference ChEBI apart from ChEMBL.

@callahantiff Would love to include what you have for PheKnowLator or find ways of subsetting specific parts.

Happy to chat more on this when you are ready 👍

Answer 5 · 2020-10-20T14:41:36.000Z

> yes, we should definitely add chebi. rhea and reactome already use chebi
>
> curious - how did you go about modeling this?
>
> wow that's amazing about validating on wet lab experiments

Happy to discuss that. I think you might be disappointed by how simple it ended up being in the end. How can I best answer the modeling question? I can describe the edge types/data sources we used?

Small wet lab experiments, but some nonetheless. I'd love to do more. Thoroughly validating the content and relationships in a large heterogeneous KG (aside from using reasoners -- to at least cover some of the logical aspects) is tough!

Answer 6 · 2021-04-05T17:12:03.000Z

Sorry, a slight non sequitur here, but I just want to mention that a "Knowledge Beacon" was built to access Rhea. It is still quietly running on the Translator subnet at https://kba.ncats.io/beacon/rhea/. It probably doesn't cover Rhea adequately, but it could be a source of inspiration or of a few Python code snippets (or not?)