cole-trapnell-lab/garnett

Getting problems with marker free classifier for garnett.

smk5g5 opened this issue · 4 comments

Hi, I am getting the following error when running the train_cell_classifier function to build a marker-free classifier with garnett.

Error: attempt to use zero-length variable name Execution halted

Below is the code I used. Am I missing something that is not covered in the manual? I would appreciate it if you could point out any mistake I am making.

Thanks!

`
library(SingleCellExperiment)
library(org.Hs.eg.db)
library(data.table)
library(stringr)
library(Seurat)
library(garnett)

cds <- readRDS('/scratch1/fs1/allegra.petti/khan.saad/Cellassignment_methods/hca_monocle3.rds')
cds <- preprocess_cds(cds, num_dim = 100)

cds <- reduce_dimension(cds)
cds <- reduce_dimension(cds, reduction_method="tSNE")
cds = cluster_cells(cds, resolution=1e-5)

marker_file_path = '/scratch1/fs1/allegra.petti/khan.saad/Cellassignment_methods/markerfree_garnettHCA.txt'
pData(cds)$known_type <- pData(cds)$Cell_types

pbmc_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = marker_file_path,
                                         db = org.Hs.eg.db,
                                         cds_gene_id_type = "SYMBOL",
                                         num_unknown = 50,
                                         marker_file_gene_id_type = "SYMBOL")
`
This is what my cds object looks like.

`
class: cell_data_set
dim: 22841 101935
metadata(2): cds_version citations
assays(1): counts
rownames(22841): TSPAN6 TNMD ... CENPVL2 MGC4859
rowData names(1): gene_short_name
colnames(101935): MantonBM7_HiSeq_3-AATCGGTGTAACGCGA-1 MantonBM8_HiSeq_6-AGGCCGTCACATCTTT-1 ...
MantonBM7_HiSeq_3-ATGCGATGTCATTAGC-1 MantonBM5_HiSeq_3-TAGCCGGGTTCCACTC-1
colData names(4): Cellnames Cell_types Size_Factor known_type
reducedDimNames(3): PCA UMAP tSNE
spikeNames(0):
altExpNames(0):

`

Can you post the output of traceback() after the error occurs, and also post a few examples of what's in your marker file?
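For anyone hitting this in a non-interactive run (the "Execution halted" suffix suggests the script was run with Rscript or similar), traceback() cannot be called after the process has exited. One possible workaround is to record the call stack at the moment the error is raised. The following is a minimal base-R sketch; `capture_trace` and `failing_step` are hypothetical names used only for illustration and are not part of garnett.

```r
# Hypothetical helper (not part of garnett): evaluate an expression and,
# if it fails, return the error message together with the call stack.
capture_trace <- function(expr) {
  trace <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # The handler runs before the stack unwinds, so sys.calls()
        # still contains the frames that led to the error.
        calls <- as.list(sys.calls())
        trace <<- vapply(calls, function(cl) deparse(cl)[1], character(1))
      }
    ),
    error = function(e) conditionMessage(e)
  )
  list(result = result, trace = trace)
}

# Stand-in for the failing call; in practice this would be the
# train_cell_classifier(...) call from the script above.
failing_step <- function() stop("attempt to use zero-length variable name")

out <- capture_trace(failing_step())
print(out$result)
print(out$trace)
```

In an interactive session, calling traceback() right after the error is simpler and gives the same information.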


Here are a few lines from my marker file.
`

MantonBM7_HiSeq_3-AATCGGTGTAACGCGA-1
known_type: CD34+ pre-B

MantonBM8_HiSeq_6-AGGCCGTCACATCTTT-1
known_type: CD34+ pre-B

MantonBM1_HiSeq_2-TACAGTGCACCAACCG-1
known_type: CD34+ pre-B

MantonBM5_HiSeq_6-TGCTGCTCAGCCTTGG-1
known_type: Plasma Cell

MantonBM7_HiSeq_3-ATGCGATGTCATTAGC-1
known_type: Plasma Cell
`

OK, so the marker-free format has one definition for each cell type rather than one for each cell. So you would want:

>CD34+ pre-B
known_type: CD34+ pre-B

>Plasma Cell
known_type: Plasma Cell

Then be sure that the pData column named "known_type" has labelled some cells as "Plasma Cell" and some as "CD34+ pre-B".
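The per-cell-type layout described above can also be generated from the labels themselves, which avoids mismatches between the file and the "known_type" column. This is a minimal base-R sketch, not a garnett function: `build_marker_free_entries` is a hypothetical helper, and the example labels stand in for `pData(cds)$known_type`.

```r
# Hypothetical helper (not part of garnett): build one marker-free entry
# per distinct cell type, in the ">name" / "column: name" layout above.
build_marker_free_entries <- function(labels, column = "known_type") {
  # Drop missing labels and keep each cell type once.
  types <- sort(unique(as.character(labels[!is.na(labels)])))
  unlist(lapply(types, function(ct) {
    c(paste0(">", ct),            # cell type header line
      paste0(column, ": ", ct),   # metadata column and the value to match
      "")                         # blank line between definitions
  }))
}

# Example labels; with a real object this would be pData(cds)$known_type.
cell_types <- c("CD34+ pre-B", "CD34+ pre-B", "Plasma Cell", NA)

entries <- build_marker_free_entries(cell_types)

# Write to a temporary file; replace with the real marker file path.
marker_file_example <- file.path(tempdir(), "marker_free_example.txt")
writeLines(entries, marker_file_example)
cat(readLines(marker_file_example), sep = "\n")
```

Because the entries are derived from the column values, every cell type named in the file is guaranteed to appear in "known_type" with identical spelling.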

Let me know if this solves it

Closing now, let me know if this continues to be a problem.