Freezing on DEGanalysis
wleoncio opened this issue · 1 comments
wleoncio commented
Describe the bug
There are some circumstances that cause the DEGanalysis
function to hang.
To Reproduce
- Download the pan_indrop_matrix_8000cells_18556genes dataset
- From the download directory, run R
- Run the following code in R (it may take a few minutes):
library(DIscBIO)
load("data/pan_indrop_matrix_8000cells_18556genes.rda")
# ==============================================================================
# Determining contants
# ==============================================================================
n_genes <- 500
K <- 3
# ==============================================================================
# Subsetting and formatting datasets
# ==============================================================================
sc_dataframe <- pan_indrop_matrix_8000cells_18556genes[, seq_len(n_genes)]
sc <- DISCBIO(sc_dataframe)
# ==============================================================================
# Performing operations
# ==============================================================================
MIinExp <- mean(rowMeans(sc_dataframe, na.rm=TRUE))
MinNumber <- round(length(sc_dataframe[1, ]) / 10)
sc <- Normalizedata(
sc, mintotal=1000, minexpr=MIinExp, minnumber=MinNumber, maxexpr=Inf,
downsample=FALSE, dsn=1, rseed=17000
)
sc <- FinalPreprocessing(sc, GeneFlitering="ExpF", export=FALSE, quiet=TRUE)
sc <- Clustexp(sc, cln=K, quiet=TRUE)
sc <- comptSNE(sc, rseed=15555, quiet=TRUE)
# This is the part that freezes
cdiff <- DEGanalysis(
sc, Clustering="K-means", K=K, fdr=0.10, name="all_clusters",
export=FALSE, quiet=FALSE, plot=FALSE, nresamp=5, nperms=10
)
- Observe that the code freezes with the output of
DEGanalysis
being:
The dataset is ready for differential expression analysis[1] "Cl2" "Cl1" "Cl3"
Number of comparisons: 6
Estimating sequencing depths...
Resampling to get new data matrices...
perm= 1
perm= 2
perm= 3
perm= 4
perm= 5
perm= 6
perm= 7
perm= 8
perm= 9
perm= 10
Number of thresholds chosen (all possible thresholds) = 1283
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Low-regulated genes in the Cl1 in Cl2 VS Cl1
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Estimating sequencing depths...
Resampling to get new data matrices...
Ctrl+C doesn't quit the function, only killing R does the trick.
Expected behavior
For a simpler set of parameters, for example n_genes <- 100
and K <- 2
, the code above ends with the following output:
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Comparisons Target cluster Gene number File name Gene number File name
1 Cl2 VS Cl1 Cl1 477 Up-regulated-all_clustersCl1inCl2VSCl1.csv 941 Low-regulated-all_clustersCl1inCl2VSCl1.csv
2 Cl2 VS Cl1 Cl2 477 Low-regulated-all_clustersCl2inCl2VSCl1.csv 941 Up-regulated-all_clustersCl2inCl2VSCl1.csv
Moreover, the structure of cdiff
is:
List of 2
$ : chr [1:1422, 1:2] "ENSG00000005022" "ENSG00000006327" "ENSG00000008394" "ENSG00000008517" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "DEGsE" "DEGsS"
$ :'data.frame': 2 obs. of 6 variables:
..$ Comparisons : chr [1:2] "Cl2 VS Cl1" "Cl2 VS Cl1"
..$ Target cluster: chr [1:2] "Cl1" "Cl2"
..$ Gene number : int [1:2] 477 477
..$ File name : chr [1:2] "Up-regulated-all_clustersCl1inCl2VSCl1.csv" "Low-regulated-all_clustersCl2inCl2VSCl1.csv"
..$ Gene number : int [1:2] 941 941
..$ File name : chr [1:2] "Low-regulated-all_clustersCl1inCl2VSCl1.csv" "Up-regulated-all_clustersCl2inCl2VSCl1.csv"
Software metainformation
- Operating System: Ubuntu 18.04
- R version: 4.0.0
- DIscBIO version or commit number: 0.99.6