zdk123/SpiecEasi

caught segfault error on cluster

GossypiumH opened this issue · 2 comments

Hello,

I am trying to do a cross-domain analysis on a relatively big microbial dataset.

On one hand, I have a bacterial abundance matrix with 7000+ columns (i.e., species) and 38 samples.

On the other, I have a much smaller matrix with counts of general functions of genes expressed by the bacterial community (for example, acidic stress), with ~150 columns and the same number of samples.

I used SPIEC-EASI in my previous studies with amplicon data (much smaller datasets) and could run those on a laptop without any problem. For this dataset, however, I had to use a cluster.

The cluster gives me the following error after executing spiec.easi() in my script:

JobID: 3832403
Running on node: amr-204
/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses
/mnt/home/flochjea/miniconda3/envs/R_network/bin/R

Attaching package: ‘igraph’

The following object is masked from ‘package:SpiecEasi’:

   make_graph

The following objects are masked from ‘package:stats’:

   decompose, spectrum

The following object is masked from ‘package:base’:

   union

Applying data transformations...
Selecting model with pulsar using bstars...
*** caught segfault ***
address 0x2b34e98f5000, cause 'memory not mapped'
5: do.call(fun, c(fargs, list(data[ind.sample, ])))
6: FUN(X[[i]], ...)
7: eval(expr, env)
8: doTryCatch(return(expr), name, parentenv, handler)
9: tryCatchOne(expr, names, parentenv, handlers[[1L]])
10: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call, nlines = 1L)        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        sm <- strsplit(conditionMessage(e), "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && isTRUE(getOption("show.error.messages"))) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
12: try(eval(expr, env), silent = TRUE)
13: serialize(what, NULL, xdr = FALSE)
14: sendMaster(try(eval(expr, env), silent = TRUE), FALSE)
15: mcparallel(FUN(X[[i]], ...), name = names(X)[i], mc.set.seed = mc.set.seed,     silent = mc.silent)
16: FUN(X[[i]], ...)
17: lapply(seq_along(X), function(i) mcparallel(FUN(X[[i]], ...),     name = names(X)[i], mc.set.seed = mc.set.seed, silent = mc.silent))
18: mclapply(X, FUN, mc.cores = mc.cores, ...)
19: withCallingHandlers({    out <- mclapply(X, FUN, mc.cores = mc.cores, ...)}, warning = function(w) {    assign("warn", c(warn, w$message), env)    invokeRestart("muffleWarning")})
20: .try_mclapply(isamp, estFun, fargs = fargs, mc.cores = ncores,     mc.preschedule = FALSE, pass.errors = !lb.stars)
21: pulsar(data = X, fun = match.fun(estFun), fargs = args, thresh = 0.05,     ncores = 24, criterion = "stars", ub.stars = TRUE, lb.stars = TRUE)
22: eval(call, environment())
23: eval(call, environment())
24: spiec.easi.default(data, ...)
25: spiec.easi.list(list(otumat1, otumat2), method = "mb", nlambda = 40,     lambda.min.ratio = 0.01, sel.criterion = "bstars", pulsar.params = list(thresh = 0.05,         ncores = 24))
26: spiec.easi(list(otumat1, otumat2), method = "mb", nlambda = 40,     lambda.min.ratio = 0.01, sel.criterion = "bstars", pulsar.params = list(thresh = 0.05,         ncores = 24))
An irrecoverable exception occurred. R is aborting now ...
slurmstepd: error: *** JOB 3832403 ON amr-204 CANCELLED AT 2023-01-30T12:35:20 ***

Here is the script I use:

#!~/miniconda3/envs/R_network/bin/Rscript

library(SpiecEasi)
library(igraph)

setwd("/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses")

otumat1 <- read.csv(file="/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/N_B_Network_functions.csv", sep=",", header=T, row.names=1, check.names=F)

otumat2 <- read.csv(file="/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/N_B_organisms.csv", sep=",", header=T, row.names=1, check.names=F)

## Transpose so that samples are rows and features (OTUs/functions) are columns
otumat1 <- t(as.matrix(otumat1))
otumat2 <- t(as.matrix(otumat2))

## Run SE
se.test <- spiec.easi(list(otumat1,otumat2), method='mb', nlambda=40, lambda.min.ratio=1e-2, sel.criterion='bstars', pulsar.params = list(thresh = 0.05, ncores=24))

## Export to igraph
ig.se <- adj2igraph(getRefit(se.test))

V(ig.se)$name <- c(colnames(otumat1),colnames(otumat2))

sebeta <- as.matrix(symBeta(getOptBeta(se.test), mode='maxabs'))

rownames(sebeta) <- colnames(sebeta) <- V(ig.se)$name

el <- get.edgelist(ig.se) ## Get the unweighted edgelist from the igraph object
sizes <- rep(1, nrow(el)) ## Vector of weights, defaulting to 1
for (i in seq_len(nrow(el))){ ## Look up each edge's weight in the beta matrix
  first <- el[i, 1]
  second <- el[i, 2]
  sizes[i] <- sebeta[first, second]
}

E(ig.se)$weight <- sizes

ig.el <- as_edgelist(ig.se,names=TRUE)

ig.el.weight <- cbind(ig.el,round(E(ig.se)$weight,5))

head(ig.el.weight)

write.table(ig.el.weight, "/mnt/research/gilbert_lab/jean-baptiste_wd/G820/Ranalyses/Crdom_N_B_network.tsv", sep="\t")

I really don't know why I get this error; there should be more than enough memory allocated to the computation (96 GB).

Cheers,

At 7000 columns, a dense precision matrix requires at most (7000^2) * 32 bytes (in practice it is usually much lower, since the matrices are sparse for most values of lambda). But with 40*20 matrices for the combination of lambda and rep.num, we would need about 1.2 TB of RAM in this worst-case scenario.
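
As a rough back-of-the-envelope version of that arithmetic (using the same 32 bytes per entry as above):

p       <- 7000                       # number of features (columns)
per_mat <- p^2 * 32                   # one dense precision matrix: ~1.57e9 bytes (~1.6 GB)
nlambda <- 40; rep.num <- 20          # lambda path length x StARS subsamples
nlambda * rep.num * per_mat           # ~1.25e12 bytes, i.e. ~1.2 TB worst case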

I really can't say how much memory is enough, but if you're on a cluster and can't filter out more OTUs, I would recommend looking into batch-mode pulsar: https://github.com/zdk123/SpiecEasi#batch-mode - the memory requirements will be ~1/20th of those of a single serial job.

Oh! Indeed, I didn't realize how computationally intensive that would be. Thank you for the quick reply, it was really helpful.

EDIT: I must admit that I understand very little about how batchtools works and how I can use it with SpiecEasi. Do you have a script example that would make it easier for me to understand?
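
For reference, the batch-mode section linked above runs the StARS subsamples as separate batchtools jobs by passing pulsar.select='batch' to spiec.easi(), with the remaining options forwarded to pulsar::batch.pulsar(). A minimal sketch for this dataset might look like the following; the conffile name, registry directory, and rep.num are placeholders that depend on how batchtools is configured on the cluster:

library(SpiecEasi)

## otumat1 / otumat2: the same transposed matrices built in the script above
## (samples as rows, features as columns)

## Arguments forwarded to pulsar::batch.pulsar instead of pulsar::pulsar.
## 'conffile' names or points to a batchtools configuration for the scheduler
## (placeholder -- adjust to the cluster); 'regdir' is where the per-subsample
## job registries are written (also a placeholder).
bargs <- list(thresh   = 0.05,
              rep.num  = 20,
              seed     = 10010,
              conffile = "slurm",
              regdir   = file.path(getwd(), "batch_reg"))

se.batch <- spiec.easi(list(otumat1, otumat2),
                       method           = "mb",
                       nlambda          = 40,
                       lambda.min.ratio = 1e-2,
                       sel.criterion    = "bstars",
                       pulsar.select    = "batch",  # use batch.pulsar for model selection
                       pulsar.params    = bargs)

Each subsample then runs as its own scheduler job, so a single job only needs memory for one set of precision matrices rather than all rep.num of them at once.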