cole-trapnell-lab/garnett

Cannot find more nearest neighbours than there are points

mahermassoud opened this issue · 1 comment

When I run train_cell_classifier() on my dataset with my marker file, training finishes but the call then fails with the error below:

> v3.classifier <- train_cell_classifier(cds=qc.hp.mono,
+                                        marker_file=MARKER.FP,
+                                        db=org.Hs.eg.db,
+                                        cds_gene_id_type="ENSEMBL",
+                                        num_unknown=200,
+                                        marker_file_gene_id_type="SYMBOL",
+                                        cores=4)
There are 21 cell type definitions
training_sample
                      B cell                        CD34+ 
                         500                           77 
              Dendritic cell                  Endothelium 
                         287                           19 
  Fibroblasts/myofibroblasts           Gastric epithelium 
                         454                          500 
   Gastric epithelium Antrum Gastric neuroendocrine cells 
                           5                          115 
           Gastric stem cell                   Macrophage 
                           3                           93 
                    Monocyte                      NK cell 
                         390                          106 
                      T cell                      Unknown 
                         135                          200 
The following cell types do not have enough training cells and will be dropped:  Gastric epithelium Antrum Gastric stem cell
Loaded glmnet 4.0-2
Model training finished.
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,  :
  You're computing too large a percentage of total singular values, use a standard svd instead.
Error in RANN::nn2(pcs, pcs, k + 1, searchtype = "standard") : 
  Cannot find more nearest neighbours than there are points
In addition: Warning messages:
1: In make_name_map(parse_list, as.character(row.names(fData(norm_cds))),  :
  10 genes could not be converted from SYMBOL to ENSEMBL These genes are listed below: 
2: In make_name_map(parse_list, as.character(row.names(fData(norm_cds))),  :
  The following genes from the cell type definitionfile are not present in the cell dataset.  Pleasecheck these genes for errors. Cell typedetermination will continue, ignoring these genes.
PECAM1
CD3
CD11B
CD141
CD25
CD49D
TROY
CD16
CD15
CD25
GIF
3: In train_cell_classifier(cds = qc.hp.mono, marker_file = MARKER.FP,  :
  Cell type Neutrophil has no genes that are expressed and will be skipped
4: In train_cell_classifier(cds = qc.hp.mono, marker_file = MARKER.FP,  :
  Cell type Gastric parietal cells has no genes that are expressed and will be skipped

Fixed by changing num_unknown=500 to num_unknown=1000
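
For anyone else who lands here: the error text comes from RANN::nn2, which stops when the number of neighbours requested is larger than the number of rows it is given. The sketch below reproduces that condition outside Garnett and shows one way to cap the neighbour count. It assumes the RANN package is installed; `pts`, `k`, and `k_safe` are made-up names for illustration and are not Garnett internals. It does not explain why changing num_unknown resolved the problem for this particular dataset.

```r
library(RANN)

set.seed(1)
# Hypothetical stand-in for a small matrix of PCA coordinates: 5 points, 4 dimensions.
pts <- matrix(rnorm(20), nrow = 5)
k <- 10  # hypothetical neighbour count, larger than the number of points

# Asking for k + 1 = 11 neighbours among 5 points triggers the error.
msg <- tryCatch(
  {
    RANN::nn2(pts, pts, k + 1, searchtype = "standard")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Capping k so that k + 1 never exceeds the number of points avoids it.
k_safe <- min(k, nrow(pts) - 1)
nn <- RANN::nn2(pts, pts, k_safe + 1, searchtype = "standard")
print(dim(nn$nn.idx))
```

If the same message appears during training, it may be worth checking how many cells actually reach the nearest-neighbour step, for example by looking at the training_sample table for very small groups.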