wmacnair/SampleQC

Error for certain inputs in `calculate_sample_to_sample_MMDs`

Closed this issue · 1 comments

Hi Will

I was trying out your package and it looks very promising. Unfortunately, I encountered two small issues:

`calculate_sample_to_sample_MMDs` fails when fewer than 6 datasets are present. This seems to be because `.make_mmd_graph` has a hardcoded number of neighbors (`n_nhbrs = 5`), which introduces NAs when there are not enough datasets to supply that many neighbors, which in turn causes an error in `igraph::graph_from_edgelist`.
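To illustrate the suspected mechanism, here is a minimal sketch in base R. It assumes that neighbors are picked by indexing into a sorted distance vector, which I have not verified against the package source. The helper `cap_n_nhbrs` is hypothetical and not part of SampleQC; only the default of 5 comes from the observation above.

```r
# Assumed mechanism (not verified against SampleQC): pick the 5 nearest
# neighbours of one sample by indexing into the ordering of its distances.
dists <- c(0, 0.3, 0.1, 0.2)       # distances from sample 1 to 4 samples (incl. itself)
nhbrs <- order(dists)[2:6]         # positions 5 and 6 do not exist, so NAs appear
anyNA(nhbrs)                       # TRUE

# Hypothetical helper (not part of SampleQC): cap the neighbour count so it
# never exceeds the number of *other* samples available.
cap_n_nhbrs <- function(n_samples, n_nhbrs = 5L) {
  max(0L, min(as.integer(n_nhbrs), as.integer(n_samples) - 1L))
}

cap_n_nhbrs(4)    # 3
cap_n_nhbrs(20)   # 5
```

With a cap like this, the edge list passed on to the graph construction would contain no NA entries for small inputs.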

When `sample_id` is in `annots_disc`, I get the following error:
Error: names of annot_disc in colData(qc_obj) should match metadata(qc_obj)$annots$disc
If I rename it to `patient_id` instead, it works fine. Also:

```r
> colnames(colData(qc_obj)$annot_disc[[1]])
[1] "group_id"    "sample_id.1" "condition"   "N_cat"       "mito_cat"    "counts_cat"
> metadata(qc_obj)$annots$disc
[1] "group_id"   "sample_id"  "condition"  "N_cat"      "mito_cat"   "counts_cat"
```

So I assume a duplicate `sample_id` column is created somewhere in the code.
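For reference, this is how a `.1` suffix can arise in base R. Whether this is what actually happens inside the package is an assumption on my part; the snippet only shows the default renaming behavior of `data.frame()`.

```r
# Base R: data.frame() makes column names unique by default
# (check.names = TRUE), so a second `sample_id` column is
# silently renamed to `sample_id.1`.
df <- data.frame(sample_id = c("s1", "s2"), sample_id = c("s1", "s2"))
names(df)
# [1] "sample_id"   "sample_id.1"
```

If a step like this adds `sample_id` a second time, the resulting names would no longer match the ones stored in the metadata.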

Best,
Reto

Hi @retogerber

Thanks for reporting this.

The use case I've had in mind for SampleQC has generally been for large, complex experiments, so at present it doesn't work so well for smaller datasets (and in your case, it doesn't work at all). It still makes sense to be able to run the fitting part of SampleQC on smaller datasets, so I need to make some tweaks to allow that to work (most likely I will just remove the graph construction and clustering step and replace them with defaults).

The make_qc_dt function also needs to be made more robust... Thanks for adding a useful example of where it can fall over ;)

Cheers
Will