greenelab/miQC

`filterCells` fails with `enforce_left_cutoff = TRUE` when all cells are below threshold

Closed this issue · 1 comment

When using miQC::filterCells(), we ran into an edge case that caused an error. It is a bit of a strange case: the error actually surfaced a problem we would have missed had it not occurred, so I am somewhat glad for the error, but I also wonder whether it might come up in other circumstances.

The problem comes from this code chunk:

miQC/R/filterCells.R

Lines 86 to 92 in 0eab0b6

if (enforce_left_cutoff == TRUE) {
    min_discard <- min(metrics[!metrics$keep, ]$subsets_mito_percent)
    min_index <- which(metrics$subsets_mito_percent == min_discard)[1]
    lib_complexity <- metrics[min_index, ]$detected
    metrics[metrics$detected <= lib_complexity &
        metrics$subsets_mito_percent >= min_discard, ]$keep <- FALSE
}

If posterior_cutoff is large, it is possible that metrics$keep is TRUE for every cell. min() over the resulting empty vector sets min_discard to Inf and, more importantly, the final line of the chunk fails with the following error:

Error in `$<-.data.frame`(`*tmp*`, keep, value = FALSE) : 
  replacement has 1 row, data has 0
Calls: <Anonymous> ... is.data.frame -> <Anonymous> -> $<- -> $<-.data.frame 
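
For reference, the mechanics can be reproduced outside miQC with a toy data frame (a minimal sketch; the column names simply mirror the ones above):

## min() of an empty vector returns Inf (with a warning), and assigning
## into a zero-row subset of a data frame raises exactly this error.
metrics <- data.frame(subsets_mito_percent = c(1, 2),
                      detected = c(500, 600),
                      keep = c(TRUE, TRUE))          # every cell kept
min(metrics[!metrics$keep, ]$subsets_mito_percent)   # Inf, plus a warning
metrics[metrics$subsets_mito_percent >= Inf, ]$keep <- FALSE
## Error: replacement has 1 row, data has 0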

It would be nice if this error were a bit friendlier, which is my main suggestion at this point.

It might also make sense to handle this case more completely, as the root cause is that there is no left cutoff to enforce when no cells are being discarded. So skipping the step that flips some cells' $keep values to FALSE should not be a problem; see the sketch below.
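
One possible shape for that fix (a sketch against the chunk above, not a tested patch):

## Sketch: only enforce the left cutoff when at least one cell was discarded;
## otherwise there is no minimum discarded mito percentage to anchor it to.
if (enforce_left_cutoff == TRUE && any(!metrics$keep)) {
    min_discard <- min(metrics[!metrics$keep, ]$subsets_mito_percent)
    min_index <- which(metrics$subsets_mito_percent == min_discard)[1]
    lib_complexity <- metrics[min_index, ]$detected
    metrics[metrics$detected <= lib_complexity &
        metrics$subsets_mito_percent >= min_discard, ]$keep <- FALSE
}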

However, as I said at the start, this error helped catch an odd case: the reason all cells were passing the threshold was a stochastic model-fitting failure. Specifically, we had a data set where the model fit looked like this:

[Figure: fail_fit — the failed model fit, with two nearly indistinguishable components]

While it may not be obvious, flexmix did fit two components here; they are just extremely similar. The result is that all cells have prob_compromised values of ~0.5.

Refitting with a different seed gave us a much more reasonable fit:
[Figure: pass_fit — the model fit after rerunning with a different seed]

If you have a suggestion for how to catch this kind of model failure before filtering (aside from manual inspection), that would be helpful. We usually check that the model has two components, but that check doesn't catch the case where the two components are just extremely similar!
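
One heuristic we have been considering (a sketch only; fit_looks_degenerate and the 2-point intercept gap are hypothetical, not part of miQC) is to compare the fitted components' intercepts directly, since near-identical components are what drive every cell's prob_compromised toward 0.5:

library(flexmix)

## Hypothetical helper: flag a fit whose two components are nearly identical.
fit_looks_degenerate <- function(model, min_intercept_gap = 2) {
    if (length(model@components) < 2) {
        return(TRUE)                 # the usual single-component failure
    }
    intercept1 <- parameters(model, component = 1)[1]  # first coefficient is
    intercept2 <- parameters(model, component = 2)[1]  # the intercept
    abs(intercept1 - intercept2) < min_intercept_gap
}

model <- miQC::mixtureModel(sce)
if (fit_looks_degenerate(model)) {
    set.seed(2468)                   # arbitrary new seed; refit and recheck
    model <- miQC::mixtureModel(sce)
}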

Hi Joshua,

Great point. I've encountered the same thing, where the model sometimes settles on two extremely similar distributions but is fine if you rerun with a different random seed. I'm not sure of a good way to prevent this from happening, so I'll make the error messages more informative. I'll also make a note about this in the vignette.