immunogenomics/harmony

Simple harmony call fails on v1.1.0

const-ae opened this issue · 5 comments

Hi,

I noticed that the new release of harmony fails for small matrix inputs. The following used to work on version 1.0.3, but now throws an error in harmonyObj$cluster_cpp().

Y <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)
groups <-  rep(c("a", "b"), length.out = nrow(Y))
int <- harmony::RunHarmony(Y, groups, nclust = 2, verbose = FALSE)
#> Transposing data matrix
#> Error in eval(expr, envir, enclos): Mat::submat(): indices out of bounds or incorrectly used

Created on 2023-10-25 with reprex v2.0.2

Note that the following still works:

Y <- matrix(rnorm(100 * 2), nrow = 100, ncol = 2)
groups <-  rep(c("a", "b"), length.out = nrow(Y))
int <- harmony::RunHarmony(Y, groups, nclust = 2, verbose = FALSE)
#> Transposing data matrix

Created on 2023-10-25 with reprex v2.0.2


Also note that the latest version still ignores the verbose = FALSE here and prints "Transposing data matrix".

Best,
Constantin

Hi @const-ae ,

Looks like an integer underflow due to the block_size parameter. We will address this and test with your snippet, though many use cases would involve more than 20 cells.

As far as the Transposing data matrix, we will address that also, it is just we did not have a unit test that caught this scenario.

Thank you for your rigorous testing!

xxiann commented

Hi

A similar error came up for me too, not sure if the error message can assist with debugging this issue.

Error: Mat::submat(): indices out of bounds or incorrectly used
9. stop(structure(list(message = "Mat::submat(): indices out of bounds or incorrectly used",
call = NULL, cppstack = NULL), class = c("std::out_of_range",
"C++Error", "error", "condition")))
8. harmonyObj$cluster_cpp()
7. harmonize(harmonyObj, max.iter.harmony, verbose)
6. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatch({
check_legacy_args(...)
if (set.cores) {
prev.ncores.blas <- RhpcBLASctl::blas_get_num_procs() ...
4. RunHarmony.default(data_mat = embedding[, dims.use], meta_data = metavars_df,
vars_use = group.by.vars, return_object = FALSE, ...)
3. RunHarmony(data_mat = embedding[, dims.use], meta_data = metavars_df,
vars_use = group.by.vars, return_object = FALSE, ...)
2. RunHarmony.Seurat(oshu, verbose = F, group.by.vars = c("cohort"),
reduction = "pca", assay.use = "RNA", reduction.save = "harmony")
1. RunHarmony(oshu, verbose = F, group.by.vars = c("cohort"), reduction = "pca",
assay.use = "RNA", reduction.save = "harmony")

Can you give us a bit of information for the dataset? How big it is?

xxiann commented

The dataset is 341 samples with 21784 genes, initialised with the following

NFEATURES = 3000
rna.seurat <- CreateSeuratObject(counts = raw_counts, project = "Sample_ID", min.cells = 3, min.features = 200, meta.data = sampleinfo2, assay="RNA")

rna.seurat <- NormalizeData(rna.seurat)
rna.seurat <- FindVariableFeatures(rna.seurat, selection.method = "vst", mean.cutoff = c(0.2, 10), dispersion.cutoff = c(1, 40), nfeatures = NFEATURES)

all.genes <- rownames(rna.seurat)
rna.seurat <- ScaleData(rna.seurat, features = all.genes)

n=30
rna.seurat <- RunPCA(rna.seurat, features = VariableFeatures(object = rna.seurat),
               assay = "RNA",
               npcs = n, verbose = F)

rna.seurat <- RunHarmony(rna.seurat, verbose = F,
                   group.by.vars = c("cohort"), 
                   reduction = "pca", assay.use = "RNA", reduction.save = "harmony")

Thank you both for pointing to this bugs.

I would suggest to try out the latest version of the package from github.

Let us know here if there are more issues!