caught segfault error on cluster
GossypiumH opened this issue · 2 comments
Hello,
I am trying to do a cross domain analysis on a relatively big microbial dataset.
On one hand I have a bacterial abundance matrix with 7000+ columns (aka species) and 38 samples
And on the other I have a much smaller matrixes with the count of general functions of genes expressed by the bacterial community with ~150 columns and the same amount of samples (for example acidic stress)
I used SPIEC-EASI in my previous studies with amplicon data (much smaller datasets) and I could run those on a laptop without any problem) However I had to use a cluster for this dataset.
However the cluster tell me this error after executing spiec.easi() in my script :
JobID: 3832403
Running on node: amr-204
/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses
/mnt/home/flochjea/miniconda3/envs/R_network/bin/R
Attaching package: ‘igraph’
The following object is masked from ‘package:SpiecEasi’:
make_graph
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
Applying data transformations...
Selecting model with pulsar using bstars...
*** caught segfault ***
address 0x2b34e98f5000, cause 'memory not mapped'
5: do.call(fun, c(fargs, list(data[ind.sample, ])))
6: FUN(X[[i]], ...)
7: eval(expr, env)
8: doTryCatch(return(expr), name, parentenv, handler)
9: tryCatchOne(expr, names, parentenv, handlers[[1L]])
10: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call, nlines = 1L) prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
12: try(eval(expr, env), silent = TRUE)
13: serialize(what, NULL, xdr = FALSE)
14: sendMaster(try(eval(expr, env), silent = TRUE), FALSE)
15: mcparallel(FUN(X[[i]], ...), name = names(X)[i], mc.set.seed = mc.set.seed, silent = mc.silent)
16: FUN(X[[i]], ...)
17: lapply(seq_along(X), function(i) mcparallel(FUN(X[[i]], ...), name = names(X)[i], mc.set.seed = mc.set.seed, silent = mc.silent))
18: mclapply(X, FUN, mc.cores = mc.cores, ...)
19: withCallingHandlers({ out <- mclapply(X, FUN, mc.cores = mc.cores, ...)}, warning = function(w) { assign("warn", c(warn, w$message), env) invokeRestart("muffleWarning")})
20: .try_mclapply(isamp, estFun, fargs = fargs, mc.cores = ncores, mc.preschedule = FALSE, pass.errors = !lb.stars)
21: pulsar(data = X, fun = match.fun(estFun), fargs = args, thresh = 0.05, ncores = 24, criterion = "stars", ub.stars = TRUE, lb.stars = TRUE)
22: eval(call, environment())
23: eval(call, environment())
24: spiec.easi.default(data, ...)
25: spiec.easi.list(list(otumat1, otumat2), method = "mb", nlambda = 40, lambda.min.ratio = 0.01, sel.criterion = "bstars", pulsar.params = list(thresh = 0.05, ncores = 24))
26: spiec.easi(list(otumat1, otumat2), method = "mb", nlambda = 40, lambda.min.ratio = 0.01, sel.criterion = "bstars", pulsar.params = list(thresh = 0.05, ncores = 24))
An irrecoverable exception occurred. R is aborting now ...
slurmstepd: error: *** JOB 3832403 ON amr-204 CANCELLED AT 2023-01-30T12:35:20 ***
Here is the script I use :
#!~/miniconda3/envs/R_network/bin/Rscript
library(SpiecEasi)
library(igraph)
setwd("/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses")
otumat1 <- read.csv(file="/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/N_B_Network_functions.csv", sep=",", header=T, row.names=1, check.names=F)
otumat2 <- read.csv(file="/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/N_B_organisms.csv", sep=",", header=T, row.names=1, check.names=F)
otumat1 = as.matrix(otumat1)
otumat1 = t(otumat1)
otumat2 = as.matrix(otumat2)
otumat2 = t(otumat2)
## Run SE
se.test <- spiec.easi(list(otumat1,otumat2), method='mb', nlambda=40, lambda.min.ratio=1e-2, sel.criterion='bstars', pulsar.params = list(thresh = 0.05, ncores=24))
## Export to igraph
ig.se <- adj2igraph(getRefit(se.test))
V(ig.se)$name <- c(colnames(otumat1),colnames(otumat2))
sebeta <- as.matrix(symBeta(getOptBeta(se.test), mode='maxabs'))
rownames(sebeta) <- colnames(sebeta) <- V(ig.se)$name
el <- get.edgelist(ig.se) ## Get the unweighted edgelist from the igraph object
sizes <- list(rep(1, length(el[,1]))) ## Create a vector of "1"s that will hold the weights, or default to one
for (i in 1:length(el[,1])){ ## Iterate through the edges indexed in the beta matrix to get the weights
first <- el[,1][i]
second <- el[,2][i]
sizes[i]<-sebeta[first,second]
}
E(ig.se)$weight <- unlist(sizes)
ig.el <- as_edgelist(ig.se,names=TRUE)
ig.el.weight <- cbind(ig.el,round(E(ig.se)$weight,5))
head(ig.el.weight)
write.csv(ig.el.weight,"/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/Crdom_N_B_network.tsv", sep=";")
I really don't know why I have this error, there should be largely enough memory attributed to the computation (96Gb)
Cheers,
At 7000 columns, a dense Precision matrix requires at most (7000^2) * 32 bytes (on average, this is really lower, since we have a sparse matrices for most values of lambda). But at 40*20 matrices for the combination of lambda
and rep.num
, we would need 1.2 TB of RAM in this worse case scenario.
I really can't say how much memory is enough, but if you're on a cluster I would recommend looking into the batch mode pulsar, if you can't filter out more OTUs: https://github.com/zdk123/SpiecEasi#batch-mode - the memory requirements will be ~1/20th of the serialized job.
Ho! Indeed I didn't realize how computing intense that would be. Thank you for the quick reply, it was really helpful.
EDIT: I must admit that I understand very little about how batchtools works and how I can use it with Spiec-easi. Do you have a script example that would make it easier for me to understand ?