gagneurlab/OUTRIDER

OUTRIDER does not run serially

Closed this issue · 2 comments

I have installed the OUTRIDER package (http://bioconductor.org/packages/release/bioc/html/OUTRIDER.html) on a CentOS 7.6 Linux system. When I use the core command OUTRIDER(ods) to process my gene expression count matrix, I noticed a strange issue: the function runs in parallel even though I explicitly set BPPARAM = SerialParam(). In fact, it always uses all CPU cores. I have attached my script and data.
Do you have any suggestions for making OUTRIDER run serially?

############### Rscript
library('OUTRIDER', quietly=TRUE)
library('dplyr', quietly=TRUE)
################# load data
ctsFile <- '/media/eys/xwj/RNAseq/public_normal/df_cts_HC1157_corrupt_fc2_ngene100_nrep3.txt'
ctsTable <- read.table(ctsFile, check.names = FALSE)
ctsTable <- ctsTable[, (ncol(ctsTable)-600+1):ncol(ctsTable)]

ods <- OutriderDataSet(countData=ctsTable)
ods <- filterExpression(ods, minCounts=TRUE, filterGenes=TRUE)
ods <- estimateSizeFactors(ods)

############### input q
args <- commandArgs(trailingOnly=TRUE)
q <- as.integer(args[1])
# NOTE: the hard-coded value below overrides the command-line argument
q <- 20
print(q)

start <- Sys.time()
ods <- OUTRIDER(ods, q=q, BPPARAM = SerialParam(), iterations=8)

end <- Sys.time()
print(end-start)

#################################
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /public/home/test1/soft/anaconda3/envs/R4.1_OUTRIDER/lib/libopenblasp-r0.3.18.so

locale:
[1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
[3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
[5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
[7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] dplyr_1.0.7 OUTRIDER_1.12.0
[3] data.table_1.14.2 SummarizedExperiment_1.24.0
[5] MatrixGenerics_1.6.0 matrixStats_0.61.0
[7] GenomicFeatures_1.46.1 AnnotationDbi_1.56.1
[9] Biobase_2.54.0 GenomicRanges_1.46.0
[11] GenomeInfoDb_1.30.0 IRanges_2.28.0
[13] S4Vectors_0.32.0 BiocGenerics_0.40.0
[15] BiocParallel_1.28.0

loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 webshot_0.5.2
[4] filelock_1.0.2 RColorBrewer_1.1-2 progress_1.2.2
[7] PRROC_1.3.1 httr_1.4.2 tools_4.1.1
[10] backports_1.3.0 utf8_1.2.2 R6_2.5.1
[13] lazyeval_0.2.2 DBI_1.1.1 colorspace_2.0-2
[16] tidyselect_1.1.1 gridExtra_2.3 prettyunits_1.1.1
[19] DESeq2_1.34.0 bit_4.0.4 curl_4.3.2
[22] compiler_4.1.1 TSP_1.1-11 xml2_1.3.2
[25] plotly_4.10.0 DelayedArray_0.20.0 rtracklayer_1.54.0
[28] scales_1.1.1 checkmate_2.0.0 genefilter_1.76.0
[31] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.28
[34] Rsamtools_2.10.0 XVector_0.34.0 htmltools_0.5.2
[37] pkgconfig_2.0.3 dbplyr_2.1.1 fastmap_1.1.0
[40] htmlwidgets_1.5.4 rlang_0.4.12 RSQLite_2.2.8
[43] BBmisc_1.11 BiocIO_1.4.0 generics_0.1.1
[46] jsonlite_1.7.2 dendextend_1.15.2 RCurl_1.98-1.5
[49] magrittr_2.0.1 GenomeInfoDbData_1.2.7 Matrix_1.3-4
[52] Rcpp_1.0.7 munsell_0.5.0 fansi_0.4.2
[55] viridis_0.6.2 lifecycle_1.0.1 stringi_1.7.5
[58] yaml_2.2.1 zlibbioc_1.40.0 plyr_1.8.6
[61] BiocFileCache_2.2.0 grid_4.1.1 blob_1.2.2
[64] parallel_4.1.1 crayon_1.4.2 lattice_0.20-45
[67] Biostrings_2.62.0 splines_4.1.1 annotate_1.72.0
[70] hms_1.1.1 KEGGREST_1.34.0 locfit_1.5-9.4
[73] pillar_1.6.4 rjson_0.2.20 reshape2_1.4.4
[76] codetools_0.2-18 geneplotter_1.72.0 biomaRt_2.50.0
[79] XML_3.99-0.8 glue_1.4.2 pcaMethods_1.86.0
[82] foreach_1.5.1 png_0.1-7 vctrs_0.3.8
[85] tidyr_1.1.4 gtable_0.3.0 purrr_0.3.4
[88] heatmaply_1.3.0 assertthat_0.2.1 cachem_1.0.6
[91] ggplot2_3.3.5 xtable_1.8-4 restfulr_0.0.13
[94] survival_3.2-13 viridisLite_0.4.0 pheatmap_1.0.12
[97] seriation_1.3.1 tibble_3.1.5 iterators_1.0.13
[100] registry_0.5-1 GenomicAlignments_1.30.0 memoise_2.0.0
[103] ellipsis_0.3.2

Dear @xuwenjian85, thanks for posting the issue here.
Even if no parallelization is requested through BPPARAM, data.table automatically parallelizes its table operations internally.

You can see and set how many cores are used by data.table with:

getDTthreads()
setDTthreads(threads)
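
As a minimal sketch of the two calls above, the snippet below limits data.table to a single thread before the OUTRIDER call. The variable name `old_threads` is only illustrative, and the commented OUTRIDER line assumes the `ods` and `q` objects from the script in the original post.

```r
library(data.table)

# Record the current setting so it can be restored later.
old_threads <- getDTthreads()

# Restrict data.table to one thread for the rest of the session.
setDTthreads(1)
getDTthreads()

# The OUTRIDER call from the script above would follow here, e.g.
# ods <- OUTRIDER(ods, q = q, BPPARAM = SerialParam(), iterations = 8)

# Optionally restore the previous setting afterwards.
# setDTthreads(old_threads)
```

This only affects threads started by data.table; threads created by other libraries are not covered by this setting.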

Let me know if this solved your problem.

I added the setDTthreads line, but it still runs in parallel.
Anyway, I worked around the issue by using the shell's "taskset" command (https://linuxhint.com/use-taskset-command/):

taskset -c 1,2 myscript.R
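
One possible explanation, not confirmed anywhere in this thread, is that the remaining threads come from the BLAS library: the sessionInfo() output above lists libopenblasp-r0.3.18.so, and OpenBLAS builds are often multithreaded. Assuming that is the cause, a sketch like the following could limit BLAS threads from within R. It relies on the RhpcBLASctl package, which is not part of the original script, and `limit_blas_threads` is a hypothetical helper name.

```r
# Assumption: the extra threads come from a multithreaded BLAS (OpenBLAS),
# not from BiocParallel or data.table.
limit_blas_threads <- function(n = 1L) {
  # RhpcBLASctl is an optional package; skip quietly if it is missing.
  if (!requireNamespace("RhpcBLASctl", quietly = TRUE)) {
    message("RhpcBLASctl is not installed; BLAS thread count left unchanged")
    return(invisible(FALSE))
  }
  RhpcBLASctl::blas_set_num_threads(n)  # threads used by BLAS routines
  RhpcBLASctl::omp_set_num_threads(n)   # threads used by OpenMP regions
  invisible(TRUE)
}

# Call once at the top of the script, before OUTRIDER(ods, ...).
ok <- limit_blas_threads(1L)
```

An alternative under the same assumption is to export OPENBLAS_NUM_THREADS=1 in the shell before starting R, since OpenBLAS reads that variable when it is loaded. Unlike taskset, which pins the process to given cores, these approaches reduce the number of threads that are started.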
