drisso/zinbwave

Zinbwave parallelization issue

Opened this issue · 11 comments

giovp commented

Hi,

I have been trying to run zinbwave on my dataset and haven't managed to finish a run successfully. Specifically, I have not found a way to run it through with parallelization.
Even when subsetting the dataset to only a few hundred genes/cells (but it is the same with larger subsets), after ~30 minutes the run is still not finished.

> filteredVar_scset_zinb
class: SummarizedExperiment 
dim: 100 150 
metadata(0):
assays(1): counts
rownames(100): HLA-DRA FTL ... PLEK IL2RG
rowData names(0):
colnames(150): ES1160017 ES1160022 ... ES1160312 ES1160316
colData names(9): biosampleID biosampleType ... subjectAge 
> zinb <- zinbwave::zinbwave(filteredVar_scset_zinb, 
+                              K = 2, 
+                              BPPARAM=MulticoreParam(8)
+                              )

When I kill the command, this is what I get

Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
5: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
6: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
7: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
8: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading

This error is also thrown with zinbFit.

Instead, if I run:

> zinb <- zinbwave::zinbwave(filteredVar_scset_zinb, 
+                              K = 2, 
+                              BPPARAM=BiocParallel::SerialParam()
+   )
>

It finishes without problems.
I understand it is most likely a problem with the cluster/parallelization, but do you have any clue on where I should start looking?
This also happens if I register the cores beforehand:

cores=8
BiocParallel::register(BiocParallel::MulticoreParam(workers=cores))
zinb <- zinbwave::zinbwave(filteredVar_scset_zinb, 
                             K = 2
                             )

Thank you very much for any suggestion,

Giovanni

R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)

Matrix products: default
BLAS/LAPACK: /usr/prog/OpenBLAS/0.2.8-gompi-1.5.14-NX-LAPACK-3.5.0/lib/libopenblas_nehalemp-r0.2.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] bindrcpp_0.2.2              magrittr_1.5               
 [3] batchtools_0.9.11           data.table_1.11.8          
 [5] forcats_0.2.0               stringr_1.2.0              
 [7] dplyr_0.7.7                 purrr_0.2.5                
 [9] readr_1.1.1                 tidyr_0.8.2                
[11] tibble_1.4.2                ggplot2_2.2.1              
[13] tidyverse_1.2.1             biomaRt_2.38.0             
[15] zinbwave_1.3.4              SingleCellExperiment_1.4.0 
[17] SummarizedExperiment_1.12.0 DelayedArray_0.8.0         
[19] BiocParallel_1.16.0         matrixStats_0.54.0         
[21] Biobase_2.40.0              GenomicRanges_1.34.0       
[23] GenomeInfoDb_1.16.0         IRanges_2.16.0             
[25] S4Vectors_0.20.0            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] nlme_3.1-131           bitops_1.0-6           lubridate_1.7.4       
 [4] progress_1.2.0         httr_1.3.1             numDeriv_2016.8-1     
 [7] backports_1.1.2        tools_3.5.0            R6_2.3.0              
[10] DBI_1.0.0              lazyeval_0.2.0         colorspace_1.3-2      
[13] withr_2.1.2            tidyselect_0.2.5       prettyunits_1.0.2     
[16] mnormt_1.5-5           compiler_3.5.0         cli_1.0.1             
[19] glmnet_2.0-16          pspline_1.0-18         rvest_0.3.2           
[22] xml2_1.1.1             checkmate_1.8.5        scales_0.4.1          
[25] mvtnorm_1.0-8          psych_1.6.12           genefilter_1.64.0     
[28] rappdirs_0.3.1         digest_0.6.12          foreign_0.8-67        
[31] XVector_0.22.0         pkgconfig_2.0.2        stabledist_0.7-1      
[34] ADGofTest_0.3          limma_3.38.2           rlang_0.3.0.1         
[37] readxl_1.1.0           rstudioapi_0.8         RSQLite_1.1-2         
[40] bindr_0.1.1            jsonlite_1.5           RCurl_1.95-4.8        
[43] GenomeInfoDbData_1.1.0 Matrix_1.2-8           Rcpp_1.0.0            
[46] munsell_0.4.3          stringi_1.1.3          edgeR_3.24.0          
[49] zlibbioc_1.26.0        plyr_1.8.4             grid_3.5.0            
[52] crayon_1.3.4           lattice_0.20-34        haven_1.1.2           
[55] splines_3.5.0          annotate_1.60.0        hms_0.3               
[58] locfit_1.5-9.1         pillar_1.3.0           base64url_1.4         
[61] softImpute_1.4         reshape2_1.4.2         codetools_0.2-15      
[64] XML_3.98-1.5           glue_1.3.0             modelr_0.1.2          
[67] foreach_1.4.3          cellranger_1.1.0       gtable_0.2.0          
[70] assertthat_0.2.0       xtable_1.8-2           broom_0.4.2           
[73] survival_2.41-3        pcaPP_1.9-73           gsl_1.9-10.3          
[76] copula_0.999-18        iterators_1.0.8        AnnotationDbi_1.44.0  
[79] memoise_1.0.0          brew_1.0-6        

Hi,
I have the same problem with parallelization

I am trying to run zinbwave on a matrix of 2000 genes * 50K cells on an SGE cluster, testing different configurations (number of CPUs and workers used in the MulticoreParam), but it's taking more than 30 hours with no useful information on progress. (I guess I should've used verbose=TRUE.)

I'm trying to monitor the memory consumption by logging onto the compute node and running a simple top, but I'm not sure this is telling me what is actually going on.

lineprof(zinb <- zinbFit(se.vars, K=10, epsilon=10000, X="~Var1+Grp+Gen", BPPARAM=BiocParallel::MulticoreParam( N ))) -> proof

The size of se.vars (a SummarizedExperiment) is 2000 * 50K, i.e. 1.6 GB. I also tried reducing the number of genes to 1000, so the object is 806 Mb.

CPU   BPParam (N)   maxMem                    objsize (se.vars)
10    6             48.7 Gb                   1.6 Gb
8     2             48.7 Gb                   1.6 Gb
8     6             43.5 Gb                   805 Mb
24    10            48.7 Gb                   1.6 Gb
16    1             75.8 Gb (tops at 81 Gb)   805 Mb

I had previously managed to analyze a 1000 * 18K matrix in 6 hours with no parallelization and 220 Gb of memory on one node, and I can run the fluidigm data provided as an example on my MacBook with MulticoreParam(2) in less than 5 min.

  • Is the runtime somehow expected, and why?
  • Is there any error in my command?
  • What is the actual memory consumption of the fitting - do you have any other case scenarios?

thanks

zinbwave uses BiocParallel for parallel computations, and BiocParallel has some issues with MulticoreParam. I found that using DoparParam() instead improves things greatly.
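
For anyone who wants to try that, here is a minimal sketch (mine, not part of the original comment) of wiring up DoparParam(), assuming doParallel is used as the foreach backend; DoparParam() simply dispatches to whatever foreach backend has been registered:

library(BiocParallel)
library(doParallel)

cl <- parallel::makeCluster(4)   # 4 independent background R sessions
registerDoParallel(cl)           # register them as the foreach backend

zinb <- zinbwave::zinbwave(filteredVar_scset_zinb,
                           K = 2,
                           BPPARAM = DoparParam())

parallel::stopCluster(cl)        # release the workers when done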

Hi Davide,

In tradeSeq we're also using BiocParallel, e.g. https://github.com/statOmics/tradeSeq/blob/master/R/fitGAM.R#L227-L229.

Sometimes, it seems like the parallelization gets stuck at a particular point and you could be waiting forever for the output. Is this also what you are experiencing with zinbwave?
It seems like the runs of the users above also converge without parallelization, while no output is produced when parallelization is used.

Did you find that this issue got better when switching to DoparParam, or were you thinking about other issues in your reply above?

Thanks!

FYI, parallelization via forked processing, which BiocParallel::MulticoreParam, parallel::mclapply(), future::plan("multicore"), doMC::registerDoMC(), etc. use, is known to be unstable in various contexts. Specifically, RStudio advises against using multicore processing when running R via the RStudio environment (rstudio/rstudio#2597 (comment)).

You might have better success if you run R in the terminal. However, the author of mclapply() wrote the following in the R-devel thread 'mclapply returns NULLs on MacOS when running GAM' (https://stat.ethz.ch/pipermail/r-devel/2020-April/079384.html) on 2020-04-28:

Do NOT use mcparallel() in packages except as a non-default option that user can set for the reasons Henrik explained. Multicore is intended for HPC applications that need to use many cores for computing-heavy jobs, but it does not play well with RStudio and more importantly you don't know the resource available so only the user can tell you when it's safe to use. Multi-core machines are often shared so using all detected cores is a very bad idea. The user should be able to explicitly enable it, but it should not be enabled by default.

In summary, if you're not in full control of, and don't understand, everything(*) called while running forked processes, it is not safe to use forked parallel processing. (*) == the script that calls zinbwave and, in turn, all functions and package dependencies that zinbwave relies on.

I'd recommend using a "snow"-style cluster, which parallelizes using independent background R sessions.
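
As an illustration (my sketch, not part of the original comment), the zinbwave call above with a snow-style backend could look like this:

library(BiocParallel)

## SnowParam launches separate background R sessions (SOCK workers)
## rather than forking the current session
bp <- SnowParam(workers = 4, type = "SOCK")

zinb <- zinbwave::zinbwave(filteredVar_scset_zinb,
                           K = 2,
                           BPPARAM = bp)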

Thanks @HenrikBengtsson for the comment.

Note that zinbwave is kind of agnostic to the parallelization back-end and relies completely on BiocParallel.

All the zinbwave functions that support parallelization default to BPPARAM=BiocParallel::bpparam(), trusting the defaults of BiocParallel. As far as I can tell from the documentation of BiocParallel, this is the recommended way, but unfortunately it does default to MulticoreParam.

I can obviously change the default to BPPARAM=BiocParallel::SerialParam() in zinbwave to use one core by default, but I was wondering whether it makes more sense for BiocParallel's bpparam() to default to SerialParam. Any thoughts @mtmorgan?
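
In the meantime, a user can override the registered default for their own session, e.g. (a small sketch of the workaround, not an official recommendation):

## make bpparam() resolve to sequential execution for this session
BiocParallel::register(BiocParallel::SerialParam(), default = TRUE)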

Henrik's suggestion is to use SnowParam(), not SerialParam(). I think it's probably a good idea in package code, and I'll implement this.

For your own code, I think you should include

if (!bpisup(bpparam)) {
    bpstart(bpparam)
    on.exit(bpstop(bpparam))
}

at the start of each user-facing function that uses bpparam. This will start a single SNOW cluster for the duration of the calculation.
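
For concreteness, a hypothetical user-facing function following this pattern might look like the sketch below (myFit and the bplapply body are illustrative placeholders, not actual zinbwave internals):

myFit <- function(Y, BPPARAM = BiocParallel::SnowParam()) {
    ## start the cluster once and guarantee it is stopped on exit
    if (!BiocParallel::bpisup(BPPARAM)) {
        BiocParallel::bpstart(BPPARAM)
        on.exit(BiocParallel::bpstop(BPPARAM), add = TRUE)
    }
    ## every bplapply call reuses the same, already started cluster
    BiocParallel::bplapply(seq_len(nrow(Y)),
                           function(i) sum(Y[i, ]),
                           BPPARAM = BPPARAM)
}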

I also encourage you to look more carefully at how bplapply is used in your packages, and to pass minimal information to the parallel processes. Kind of analogous to

bad = function(df) {
    ## shift the column up by one, modifying the whole data frame row by row
    for (i in seq_len(nrow(df) - 1))
        df$col[i] = df$col[i + 1]
    df
}

which copies the entire data frame on each iteration and requires bad() to know about the structure of df, versus

good = function(x) {
    ## the same shift, operating on the extracted vector only
    for (i in head(seq_along(x), -1))
        x[i] = x[i + 1]
    x
}
df$col <- good(df$col)

with the resulting performance improvement

> df <- do.call(rbind, replicate(20, mtcars, simplify=FALSE))
> dim(df)
[1] 640  11
> library(microbenchmark)
> microbenchmark(bad(df), good(df$cyl))
Unit: microseconds
         expr      min       lq     mean    median        uq       max neval cld
      bad(df) 6736.519 7000.092 7816.577 7321.7570 8725.2040 12270.674   100   b
 good(df$cyl)  106.282  109.853  118.361  112.5225  120.6215   214.132   100  a
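
Translated to the parallel setting (my sketch, not from the comment above), the same principle means shipping each worker only the data it needs rather than the whole object:

## bad: the full data frame is serialized and sent to every worker
res_bad  <- BiocParallel::bplapply(seq_len(ncol(df)),
                                   function(j, df) mean(df[[j]]),
                                   df = df)

## better: each worker only receives the columns it actually works on
res_good <- BiocParallel::bplapply(df, mean)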

Sorry for not being clear. I "stumbled upon" this issue and saw that forked/multicore processing was used in combination with RStudio, so I posted my comment as an FYI regarding multicore. I wasn't aware of the default settings.

Hi @HenrikBengtsson, it is indeed useful to know about this bad behavior of the RStudio/multicore combination.

@mtmorgan I suspect that some of my code is in the bad() form, but I will have to look more carefully. My comment on making SerialParam() the default in zinbwave (not BiocParallel) was related to Henrik's comment on not letting users use all the cores by default.

For what it's worth, the default is not to use all cores:

> parallel::detectCores()
[1] 12
> BiocParallel::multicoreWorkers()
[1] 10

I guess the rationale for a relatively large number (rather than, e.g., 2, which would be appropriate for a build machine) is that a typical R analysis is interactive and the user is mostly focusing on R. Also, desktops are good at multitasking these days, and I doubt that the user notices over-subscription (other than their fan turning on!) as long as memory use remains reasonable.

FWIW2, scientific software tools that use multiple cores by default often wreak havoc on multi-tenant compute environments, and I think quite a few Bioconductor packages run in such environments. We see this on a regular basis on our UCSF clusters (one with 100 and another with 950 users). It results in sysadmins having to spend lots of time troubleshooting, narrowing down exactly which software causes the problem, and then writing emails to each user, who is often unaware of the problem. The number one cause of CPU oversubscription (and, indirectly, RAM oversubscription) is this type of software. It's much rarer to see this when the user has to explicitly enable parallel processing and/or specify the number of parallel workers to use.

Until we've settled on a common and robust standard for doing this in R, I don't think there is another solution to this problem than having all R packages run sequentially by default and asking the end-user to explicitly enable parallel processing. (I have plenty more thoughts on this, but maybe that should be moved to another issue.)

One thing that tripped me up is that, even with SerialParam() passed to zinbwave or registered beforehand, zinbwave still uses multiple cores when running. This is because, as is usual with R and Python numerical libraries, the underlying linear algebra libraries will use multiple threads by default unless you set the environment variables below. I didn't check which specific env var turned off the parallelism in zinbwave, but one of them did, and zinbwave then ran on a single core with SerialParam().

export BLIS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export NUMBA_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export VECLIB_MAXIMUM_THREADS=1
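
Equivalently (a sketch of the same idea, not from the comment above), the caps can be set from within R, provided this happens before the threaded BLAS/OpenMP library spins up its thread pool; if the RhpcBLASctl package is installed, the BLAS thread count can also be adjusted at runtime:

## set before the numerical libraries initialize their thread pools
Sys.setenv(
    OMP_NUM_THREADS      = "1",
    OPENBLAS_NUM_THREADS = "1",
    MKL_NUM_THREADS      = "1"
)

## or adjust at runtime (assumes RhpcBLASctl is available)
RhpcBLASctl::blas_set_num_threads(1)
RhpcBLASctl::omp_set_num_threads(1)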