Problems using the marker-free classifier with Garnett
smk5g5 opened this issue · 4 comments
Hi, I am trying to use the marker-free classifier with Garnett, and I get the following error when running the `train_cell_classifier` function:
```
Error: attempt to use zero-length variable name
Execution halted
```
Below is the code I used. I am wondering whether I am missing something that is not covered in the manual. I would appreciate any guidance on what I am doing wrong.
Thanks!
```
library(SingleCellExperiment)
library(org.Hs.eg.db)
library(data.table)
library(stringr)
library(Seurat)
library(monocle3)
library(garnett)

# Load the Monocle 3 cell_data_set and run the standard preprocessing steps
cds <- readRDS('/scratch1/fs1/allegra.petti/khan.saad/Cellassignment_methods/hca_monocle3.rds')
cds <- preprocess_cds(cds, num_dim = 100)
cds <- reduce_dimension(cds)
cds <- reduce_dimension(cds, reduction_method = "tSNE")
cds <- cluster_cells(cds, resolution = 1e-5)

marker_file_path <- '/scratch1/fs1/allegra.petti/khan.saad/Cellassignment_methods/markerfree_garnettHCA.txt'

# Copy the existing labels into the pData column referenced by the marker file
pData(cds)$known_type <- pData(cds)$Cell_types

# Train the classifier from the marker file
pbmc_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = marker_file_path,
                                         db = org.Hs.eg.db,
                                         cds_gene_id_type = "SYMBOL",
                                         num_unknown = 50,
                                         marker_file_gene_id_type = "SYMBOL")
```
This is what my cds object looks like:
```
class: cell_data_set
dim: 22841 101935
metadata(2): cds_version citations
assays(1): counts
rownames(22841): TSPAN6 TNMD ... CENPVL2 MGC4859
rowData names(1): gene_short_name
colnames(101935): MantonBM7_HiSeq_3-AATCGGTGTAACGCGA-1 MantonBM8_HiSeq_6-AGGCCGTCACATCTTT-1 ...
MantonBM7_HiSeq_3-ATGCGATGTCATTAGC-1 MantonBM5_HiSeq_3-TAGCCGGGTTCCACTC-1
colData names(4): Cellnames Cell_types Size_Factor known_type
reducedDimNames(3): PCA UMAP tSNE
spikeNames(0):
altExpNames(0):
```
Can you post the output of traceback() after the error occurs, and also post a few examples of what's in your marker file?
Here are the first few lines of my marker file:
```
MantonBM7_HiSeq_3-AATCGGTGTAACGCGA-1
known_type: CD34+ pre-B
MantonBM8_HiSeq_6-AGGCCGTCACATCTTT-1
known_type: CD34+ pre-B
MantonBM1_HiSeq_2-TACAGTGCACCAACCG-1
known_type: CD34+ pre-B
MantonBM5_HiSeq_6-TGCTGCTCAGCCTTGG-1
known_type: Plasma Cell
MantonBM7_HiSeq_3-ATGCGATGTCATTAGC-1
known_type: Plasma Cell
```
OK, so for a marker-free classifier the marker file should contain one definition per cell type, not one per cell. So you would want:
```
>CD34+ pre-B
known_type: CD34+ pre-B
>Plasma Cell
known_type: Plasma Cell
```
Then make sure that the pData column named "known_type" labels some cells as "Plasma Cell" and some as "CD34+ pre-B".
Let me know if this solves it.
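Because the labels already live in `pData(cds)$known_type`, the per-type file can be generated from the unique labels instead of being written by hand. Below is a minimal sketch, assuming the `known_type: <label>` meta-data line format shown above; `write_marker_free_file` is a hypothetical helper, not a Garnett function, and I have not checked how the marker-file parser treats labels containing commas or other special characters.

```r
# Hypothetical helper (not part of Garnett): write one ">type" header plus one
# "<column>: type" line for every unique, non-empty label, i.e. one definition
# per cell type rather than one per cell.
write_marker_free_file <- function(labels, path, column = "known_type") {
  # sort() drops NA by default; then remove empty-string labels
  types <- sort(unique(as.character(labels)))
  types <- types[types != ""]
  # One definition block per type, followed by a blank separator line
  lines <- as.character(unlist(lapply(types, function(type) {
    c(paste0(">", type), paste0(column, ": ", type), "")
  })))
  # Drop the trailing blank separator after the last block
  lines <- head(lines, -1)
  writeLines(lines, path)
  invisible(lines)
}

# Usage with the objects from the original post (not run here):
# write_marker_free_file(pData(cds)$known_type, marker_file_path)
# table(pData(cds)$known_type)  # confirm each type has labelled cells
```

Generating the definitions from the same column that `train_cell_classifier` reads means every type in the file has at least one labelled cell by construction.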
Closing now; let me know if this continues to be a problem.