b2slab/FELLA

I do not get all "hsa" compounds from KEGG

Francisco-madrid-gambin opened this issue · 3 comments

Hi,

I run:
graph <- buildGraphFromKEGGREST(organism = "hsa") buildDataFromGraph(keggdata.graph = graph, databaseDir = tmpdir, normality = "diffusion", niter = 50)
fella.data <- loadKEGGdata(databaseDir = tmpdir,)

But it does not find endocannabinoids in the diffusion matrix:
AEA <- c("C11695", "C13856") myAnalysis <- defineCompounds(compounds = AEA, data = fella.data)

Error in defineCompounds(compounds = AEA, data = fella.data) : None of the specified compounds appear in the available KEGG data.

However, they are here:
https://www.genome.jp/entry/C11695+C13856

Am I missing something?

Tnx

Hi Francisco, let me have a look. Not all the KEGG compounds appear in the FELLA graph, since there is a pre-filtering step to appear in the organism-specific pathways. But the ones you specify seem to map to human pathways.

That would be great!
In addition, is it possible to get also connected genes in the network? Or is it only about metabolites and pathways?

Hi Francisco, answer to your first question

# https://github.com/b2slab/FELLA/issues/20
# question: compounds that belong to a human pathway do not map to the FELLA network, why?
# short answer: they are not linked to a reaction that is found in a human pathway, and FELLA filters them out 
# long answer: see below. Also, some info at https://doi.org/10.1371/journal.pone.0189012.s004
library(igraph)
library(FELLA)
library(dplyr)

tmpdir <- tempdir()

graph <- buildGraphFromKEGGREST(organism = "hsa") 
buildDataFromGraph(keggdata.graph = graph, databaseDir = tmpdir, matrices = "none", normality = "none", niter = 50)
fella.data <- loadKEGGdata(databaseDir = tmpdir)

AEA <- c("C11695", "C13856") 
try(myAnalysis <- defineCompounds(compounds = AEA, data = fella.data))

# cpds not in FELAL graph
v.cpd.g <- getCom(fella.data, "compound")
any(AEA %in% v.cpd.g)

# pathways in the FELLA graph
v.path.g <- getCom(fella.data, "pathway")

# get KEGG pathways
df.path <- KEGGREST::keggLink("compound", "pathway")

# quick fix: map -> hsa (but be aware that not all maps exist as hsa pathways)
df.path <- data.frame(
  path = gsub("path:map", "hsa", names(df.path)), 
  cpd = gsub("cpd:", "", df.path), 
  stringsAsFactors = FALSE
)

v.path.kegg <- KEGGREST::keggList("pathway", "hsa")
names(v.path.kegg) <- gsub("path:", "", names(v.path.kegg))

# are the AEA compounds there? Yes
df.aea <- subset(df.path, cpd %in% AEA & path %in% v.path.g)
df.aea
v.path.aea <- unique(df.aea$path)

# so, why are they discarded by FELLA?
# see https://github.com/b2slab/FELLA/blob/master/R/buildGraphFromKEGGREST.R#L358
# # Keep only compounds that are reactants/products in these reactions
# # i.e. delete compounds that don't have any 1-weight edge
# 
# do they have any reaction?
df.rx <- KEGGREST::keggLink("compound", "reaction")
df.rx <- data.frame(
  rx = gsub("rn:", "", names(df.rx)), 
  cpd = gsub("cpd:", "", df.rx), 
  stringsAsFactors = FALSE
)

# C11695 has one: R09536
# but... it does not belong to a human pathway
# see https://github.com/b2slab/FELLA/blob/master/R/buildGraphFromKEGGREST.R#L351
# # Keep only reactions in a pathway
# # i.e. delete reactions that don't have any 3-weight edge
df.rx.aea <- subset(df.rx, cpd %in% AEA)
df.rx.aea

df.rxpath <- KEGGREST::keggLink("reaction", "pathway")
df.rxpath <- data.frame(
  path = df.rxpath, 
  rx = gsub("rn:", "", df.rxpath), 
  stringsAsFactors = FALSE
)
# no pathway
subset(df.rxpath, rx %in% df.rx.aea$rx)

Answering your 2nd question, there are gene ensembl ids as node attributes for enzymes. For instance

## First generate a toy enrichment
library(igraph)
data(FELLA.sample)
data(input.sample)

## Enrich input
obj <- enrich(
  compounds = input.sample, 
  data = FELLA.sample)

g.res <- generateResultsGraph(
  method = "pagerank", 
  threshold = 0.1, 
  object = obj, 
  data = FELLA.sample)
g.res

# enzyme nodes have entrez ids
V(g.res)$entrez