llrs/BioCor

Check that using lists works

Closed this issue · 3 comments

llrs commented

I got a strange error about a list not being a character vector. I was using mclusterGeneSim; perhaps it was using the function for GeneSetCollection.

The input was:

set.seed(456)
# info
library("reactome.db")
#> Loading required package: AnnotationDbi
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colMeans, colnames, colSums, dirname, do.call, duplicated,
#>     eval, evalq, Filter, Find, get, grep, grepl, intersect,
#>     is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
#>     paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
#>     Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which, which.max,
#>     which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
library("BioCor")
#> If you use BioCor in published research, please cite:
genes2Pathways <- as.list(reactomeEXTID2PATHID)
pathways <- unlist(genes2Pathways, use.names = FALSE)
genes <- rep(names(genes2Pathways), lengths(genes2Pathways))
paths2genes <- split(genes, pathways)
human <- grep("R-HSA-", names(paths2genes))
paths2genes <- paths2genes[human]
paths2genes <- lapply(paths2genes, unique)
paths2genes <- paths2genes[lengths(paths2genes) >= 2]
genes2paths <- GSEAdv:::inverseList(paths2genes)

# clusters
clusters <- list(a=sample(genes, 50), b = sample(genes, 25))
mclusterGeneSim(clusters, info = genes2paths, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = genes2paths, method =
#> c("max", : Some genes are not in the list provided.
#> Error in if (is.na(rowIds) || is.na(colIds)) {: missing value where TRUE/FALSE needed
mclusterGeneSim(clusters, info = paths2genes, method = c("max", "BMA"))
#> Warning in mclusterGeneSim(clusters, info = paths2genes, method =
#> c("max", : Some genes are not in the list provided.
#> Error in mpathSim(pathwaysl, info, NULL): The input pathways should be characters

Created on 2018-11-15 by the reprex package (v0.2.1)

llrs commented

In an unexpected turn of events, using GeneSetCollections doesn't work!

gsc <- GSEAdv::as.GeneSetCollection(genes2paths)
o3 <- mclusterSim(clusters, info = gsc, method = "max")
#> Error in keep[keep] : invalid subscript type 'list'
gsc2 <- GSEAdv::as.GeneSetCollection(paths2genes)
o3 <- mclusterSim(clusters, info = gsc2, method = "max")
#> Warning message:
#> In mclusterSim(clusters, info = gsc2, method = "max") :
#>   At least one gene should be in the GeneSetCollection provided

It is as if the expected gsc were the other way around, with the genes and the pathway names swapped!
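For readers following along, the two orientations (pathways to genes, and genes to pathways) can be converted into each other with base R alone. This is only a sketch on made-up toy data; `invert_list` is a hypothetical helper written for illustration and is not claimed to match `GSEAdv:::inverseList`.

```r
# Toy data, made up for illustration: pathway -> genes
paths2genes <- list(p1 = c("g1", "g2"), p2 = c("g2", "g3"))

# Hypothetical helper: swap names and elements of a named list of
# character vectors (same idea as the split() call in the reprex).
invert_list <- function(x) {
  split(rep(names(x), lengths(x)), unlist(x, use.names = FALSE))
}

genes2paths <- invert_list(paths2genes)  # gene -> pathways
back <- invert_list(genes2paths)         # pathway -> genes again
```

Checking `names()` of the object passed as `info` is a quick way to see which orientation it has.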

llrs commented

On the server the above code works, so it might be something specific to the installation on this machine.

llrs commented

The first bug, about a missing value, is related to getting NULL, so I added a check for NULL in the conditions (although I should explore this more).
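A minimal sketch of the kind of guard described here, assuming the IDs can come back as NULL when a lookup finds nothing. `is_missing_id` is a hypothetical helper, not the code actually added to BioCor; it only shows how to reduce NULL, empty, or NA input to a single TRUE/FALSE that is safe inside `if ()`.

```r
# Hypothetical helper: TRUE when x is NULL, empty, or contains NA.
# is.null() is tested first, so the later checks never see NULL.
is_missing_id <- function(x) {
  is.null(x) || length(x) == 0L || anyNA(x)
}

rowIds <- NULL      # e.g. a lookup that returned nothing
colIds <- "pathA"   # made-up ID for illustration

if (is_missing_id(rowIds) || is_missing_id(colIds)) {
  result <- NA_real_   # nothing to compare
} else {
  result <- 1          # placeholder for the real computation
}
```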

The second error was due to using unique(keepPaths[keepPaths]) instead of any(keepPaths), as it does now. It produced an incorrect subsetting of the lists.
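The difference between the two expressions can be seen on toy values. This is a sketch with made-up data, not the package's code: on a logical vector with no TRUE, the `unique(x[x])` form returns a zero-length result rather than FALSE, and subsetting with a list raises the "invalid subscript type" error seen above.

```r
keep_none <- c(a = FALSE, b = FALSE)   # made-up flags: nothing to keep

old_none <- unique(keep_none[keep_none])  # zero-length logical, unusable in if ()
new_none <- any(keep_none)                # a single FALSE

# Subsetting with a list instead of a logical vector fails outright;
# capture the message instead of stopping the script.
keep_list <- list(TRUE, FALSE)
msg <- tryCatch(keep_list[keep_list], error = function(e) conditionMessage(e))
```

Note that `any()` expects a logical vector, so a list of flags would still need `unlist()` first.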