Winnie09/Lamian

Error in as.igraph.vs(graph, to) : Invalid vertex name(s) in infer_tree_structure

The error "Error in as.igraph.vs(graph, to) : Invalid vertex name(s)" appears when origin.celltype is not set in infer_tree_structure. I checked the code of infer_tree_structure() and findbranch(), and found a bug in lines 16-17 of findbranch.R:
if (!origin %in% vertex)
  vertex <- c(origin, vertex)

It looks like the '## find origin' part of infer_tree_structure has some issues, so TSCANorder() cannot work.
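A minimal base R sketch of one way the two quoted lines could introduce an invalid vertex name. It assumes that `origin` ends up as NA when no origin is supplied, which this thread does not confirm. The helper `add_origin_checked()` and its `valid_names` argument are hypothetical and not part of Lamian or TSCAN:

```r
# Assumed scenario: no origin was resolved, so `origin` is NA.
origin <- NA
vertex <- c("1", "2", "3")   # cluster names that exist in the graph
graph_names <- vertex        # names the graph actually knows about

# The two lines quoted from findbranch.R:
if (!origin %in% vertex)
  vertex <- c(origin, vertex)

# NA was prepended, and it is not a vertex name in the graph.
bad <- vertex[!vertex %in% graph_names]

# Hypothetical guard: fail early with a readable message instead of
# passing an unknown name on to the graph lookup.
add_origin_checked <- function(origin, vertex, valid_names) {
  if (length(origin) != 1 || is.na(origin) || !origin %in% valid_names) {
    stop("origin must be a single cluster name present in the graph")
  }
  if (!origin %in% vertex) vertex <- c(origin, vertex)
  vertex
}
```

With a valid origin the guard behaves like the original lines; with NA it stops before any path lookup is attempted.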

Hi @YiZh2019 Thanks for your interest in our work! The TSCAN package was enhanced when the Lamian package was built. Are you using the most up-to-date TSCAN package on Github?

The same happens to me when I run "infer_tree_structure". I am using TSCAN package 2.0.0.

Hi Dr. Hou,

Thanks for your reply! TSCANorder() should still work when startcluster is left at its default value of NULL, but infer_tree_structure() in Lamian cannot create pseudotime when origin.celltype is not set, because of the error mentioned in this post. Is there an issue in the code that finds the origin (lines 88-116 in infer_tree_structure.R) when Lamian constructs pseudotime?

I tried resetting startcluster to NULL in TSCANorder(), skipping the origin-finding part. It runs without warning, but I end up with different pseudotime values for the same cells. So we may still need to fix the origin-finding code in infer_tree_structure.R. What would you suggest to fix this? Thanks!

Best,
Yi

Hi @Winnie09, I am also getting the same message when trying to run infer_tree_structure. This is my code:

seu <- as.SingleCellExperiment(tropho2, assay = "RNA")
seu1 <- bind_cols(
  as.data.frame(reducedDims(seu)$UMAP),
  as.data.frame(colData(seu))
  ) %>%
  sample_frac(1)
assays(seu)$counts
rd<-seu1[,c("UMAP_1","UMAP_2")]
df <- data.frame(rd, "CellTypeManual.l3" = as.character(seu1$CellTypeManual.l3))
man_tree_data<-list()
man_tree_data[['umap']]<-as.matrix(reducedDims(seu)$UMAP)
man_tree_data[['pca']]<-as.matrix(reducedDims(seu)$PCA)
man_tree_data[['expression']]<-as.matrix(assays(seu)$counts)
man_tree_data[['sample']]<-seu$sample_id
cell_types<-tropho2@meta.data[,"CellTypeManual.l3", drop=FALSE]
cell_types<-data.frame('cell' = rownames(cell_types),cell_types[,"CellTypeManual.l3",
drop=FALSE])
man_tree_data[['cell_types']]<- cell_types
str(man_tree_data)

List of 4
 $ expression: num [1:36601, 1:27902] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:36601] "MIR1302-2HG" "FAM138A" "OR4F5" "AL627309.1" ...
  .. ..$ : chr [1:27902] "CCTACACAGAAACCGC-1" "TCAGGTAAGGCTCATT-1" "ACTTACTTCATAGCAC-1" "GTGGGTCTCGTCGTTC-1" ...
 $ umap      : num [1:27902, 1:2] 4.164 -1.081 -0.918 -0.476 1.852 ...
  ..- attr(*, "scaled:center")= num [1:2] 0.23024 -0.00967
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:27902] "CCTACACAGAAACCGC-1" "TCAGGTAAGGCTCATT-1" "ACTTACTTCATAGCAC-1" "GTGGGTCTCGTCGTTC-1" ...
  .. ..$ : chr [1:2] "UMAP_1" "UMAP_2"
 $ cell_types:'data.frame': 27902 obs. of 2 variables:
  ..$ cell             : chr [1:27902] "CCTACACAGAAACCGC-1" "TCAGGTAAGGCTCATT-1" "ACTTACTTCATAGCAC-1" "GTGGGTCTCGTCGTTC-1" ...
  ..$ CellTypeManual.l3: chr [1:27902] "STB" "STB" "STB" "STB" ...
 $ pca       : num [1:27902, 1:50] 7.72 6.58 5.56 8.44 8.47 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:27902] "CCTACACAGAAACCGC-1" "TCAGGTAAGGCTCATT-1" "ACTTACTTCATAGCAC-1" "GTGGGTCTCGTCGTTC-1" ...
  .. ..$ : chr [1:50] "PC_1" "PC_2" "PC_3" "PC_4" ...

res = infer_tree_structure(pca = man_tree_data[['pca']],
                           expression = man_tree_data[['expression']],
                           cellanno = man_tree_data[['cell_types']], origin.celltype = NA)

Error in as_igraph_vs(graph, to) : Invalid vertex name(s)

##############

Could you please advise how to deal with this error? Many thanks

Hi @YiZh2019 Thanks for the valuable feedback! Did you try setting the parameter "origin.marker" in the function "infer_tree_structure()"? It seems that if both "origin.celltype" and "origin.marker" are NA, we get errors. I will double-check the lines that call the TSCAN functions in an unsupervised way; but for now, is it possible to pass some marker genes to "origin.marker" to identify the origin cluster?
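Until the unsupervised path is checked, a small pre-flight test on the caller's side can catch the case where both parameters are NA. This is only a sketch: `require_origin()` is a hypothetical helper, not part of Lamian, and the values used below are placeholders:

```r
# Hypothetical helper: stop early if neither origin parameter is usable.
require_origin <- function(origin.celltype = NA, origin.marker = NA) {
  # An argument counts as "set" if it is non-empty and not entirely NA.
  is_set <- function(x) length(x) > 0 && !all(is.na(x))
  if (!is_set(origin.celltype) && !is_set(origin.marker)) {
    stop("Specify origin.celltype or origin.marker before calling infer_tree_structure()")
  }
  invisible(TRUE)
}

# Placeholder values, for illustration only.
ok_celltype <- require_origin(origin.celltype = "placeholder_celltype")
ok_marker   <- require_origin(origin.marker = c("GENE_A", "GENE_B"))
both_na     <- try(require_origin(NA, NA), silent = TRUE)
```

Calling `require_origin()` with the same two arguments that will be passed to infer_tree_structure() gives a readable message instead of the igraph error.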

Hi @josemovi I guess we communicated in emails previously - let me know if the following helps:
"The function infer_tree_structure() has two parameters: origin.celltype and origin.marker. Either of them should be specified. From your code, it seems that both are NA, which leads to the error. Please see the function's documentation for details about these two parameters, or refer to this page: https://github.com/Winnie09/Lamian/blob/master/R/infer_tree_structure.R
As documented in the function, "expression only useful when users want to use highly expressed marker genes to determine the starting point of pseudotime. It is a gene by cell expression matrix. The values should be library-size-normalized and log-transformed expression values. They can either be imputed or non-imputed.", so please do not use the count values directly."
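For readers starting from raw counts, the following base R sketch shows one common way to get library-size-normalized, log-transformed values. The scale factor of 1e4, the log2 transform, and the pseudocount of 1 are assumptions, since the documentation quoted above does not specify them; `lognorm()` is a hypothetical helper, not a Lamian function:

```r
# Hypothetical helper: library-size normalization followed by log2(x + 1).
# `counts` is a gene-by-cell matrix of raw counts.
lognorm <- function(counts, scale = 1e4) {
  lib_size <- colSums(counts)
  if (any(lib_size == 0)) stop("Remove cells with zero total counts first")
  # Divide each column (cell) by its library size, then rescale.
  normed <- sweep(counts, 2, lib_size, "/") * scale
  log2(normed + 1)
}

# Toy gene-by-cell count matrix, for illustration only.
counts <- matrix(c(1, 3, 0, 4, 2, 2), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))
expr <- lognorm(counts)
```

The result keeps the gene and cell names of the input, so it can be used wherever a gene by cell expression matrix is expected. For a dense matrix as large as the one in this thread, a sparse-aware or package-provided normalization may be more practical.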