I do not get all "hsa" compounds from KEGG

Question

I do not get all "hsa" compounds from KEGG

Francisco-madrid-gambin opened this issue 3 years ago · 3 comments

Francisco-madrid-gambin commented 3 years ago

Hi,

I run:
graph <- buildGraphFromKEGGREST(organism = "hsa") buildDataFromGraph(keggdata.graph = graph, databaseDir = tmpdir, normality = "diffusion", niter = 50)
fella.data <- loadKEGGdata(databaseDir = tmpdir,)

But it does not find endocannabinoids in the diffusion matrix:
AEA <- c("C11695", "C13856") myAnalysis <- defineCompounds(compounds = AEA, data = fella.data)

Error in defineCompounds(compounds = AEA, data = fella.data) : None of the specified compounds appear in the available KEGG data.

However, they are here:
https://www.genome.jp/entry/C11695+C13856

Am I missing something?

Tnx

Answer 1 · 2021-12-05T11:29:39.000Z

Hi Francisco, let me have a look. Not all the KEGG compounds appear in the FELLA graph, since there is a pre-filtering step to appear in the organism-specific pathways. But the ones you specify seem to map to human pathways.

Answer 2 · 2021-12-16T11:44:59.000Z

That would be great!
In addition, is it possible to get also connected genes in the network? Or is it only about metabolites and pathways?

Answer 3 · 2021-12-17T09:11:22.000Z

Hi Francisco, answer to your first question

# https://github.com/b2slab/FELLA/issues/20
# question: compounds that belong to a human pathway do not map to the FELLA network, why?
# short answer: they are not linked to a reaction that is found in a human pathway, and FELLA filters them out 
# long answer: see below. Also, some info at https://doi.org/10.1371/journal.pone.0189012.s004
library(igraph)
library(FELLA)
library(dplyr)

tmpdir <- tempdir()

graph <- buildGraphFromKEGGREST(organism = "hsa") 
buildDataFromGraph(keggdata.graph = graph, databaseDir = tmpdir, matrices = "none", normality = "none", niter = 50)
fella.data <- loadKEGGdata(databaseDir = tmpdir)

AEA <- c("C11695", "C13856") 
try(myAnalysis <- defineCompounds(compounds = AEA, data = fella.data))

# cpds not in FELAL graph
v.cpd.g <- getCom(fella.data, "compound")
any(AEA %in% v.cpd.g)

# pathways in the FELLA graph
v.path.g <- getCom(fella.data, "pathway")

# get KEGG pathways
df.path <- KEGGREST::keggLink("compound", "pathway")

# quick fix: map -> hsa (but be aware that not all maps exist as hsa pathways)
df.path <- data.frame(
  path = gsub("path:map", "hsa", names(df.path)), 
  cpd = gsub("cpd:", "", df.path), 
  stringsAsFactors = FALSE
)

v.path.kegg <- KEGGREST::keggList("pathway", "hsa")
names(v.path.kegg) <- gsub("path:", "", names(v.path.kegg))

# are the AEA compounds there? Yes
df.aea <- subset(df.path, cpd %in% AEA & path %in% v.path.g)
df.aea
v.path.aea <- unique(df.aea$path)

# so, why are they discarded by FELLA?
# see https://github.com/b2slab/FELLA/blob/master/R/buildGraphFromKEGGREST.R#L358
# # Keep only compounds that are reactants/products in these reactions
# # i.e. delete compounds that don't have any 1-weight edge
# 
# do they have any reaction?
df.rx <- KEGGREST::keggLink("compound", "reaction")
df.rx <- data.frame(
  rx = gsub("rn:", "", names(df.rx)), 
  cpd = gsub("cpd:", "", df.rx), 
  stringsAsFactors = FALSE
)

# C11695 has one: R09536
# but... it does not belong to a human pathway
# see https://github.com/b2slab/FELLA/blob/master/R/buildGraphFromKEGGREST.R#L351
# # Keep only reactions in a pathway
# # i.e. delete reactions that don't have any 3-weight edge
df.rx.aea <- subset(df.rx, cpd %in% AEA)
df.rx.aea

df.rxpath <- KEGGREST::keggLink("reaction", "pathway")
df.rxpath <- data.frame(
  path = df.rxpath, 
  rx = gsub("rn:", "", df.rxpath), 
  stringsAsFactors = FALSE
)
# no pathway
subset(df.rxpath, rx %in% df.rx.aea$rx)

Answering your 2nd question, there are gene ensembl ids as node attributes for enzymes. For instance

## First generate a toy enrichment
library(igraph)
data(FELLA.sample)
data(input.sample)

## Enrich input
obj <- enrich(
  compounds = input.sample, 
  data = FELLA.sample)

g.res <- generateResultsGraph(
  method = "pagerank", 
  threshold = 0.1, 
  object = obj, 
  data = FELLA.sample)
g.res

# enzyme nodes have entrez ids
V(g.res)$entrez