Spot composition visualization
WesDe opened this issue · 7 comments
Hi,
I am trying to compare your deconvolution method with the Seurat integration method. To do so, I am trying to reproduce your vignette with a public Visium dataset (mouse kidney). The single-cell data used for the deconvolution come from an RDS file storing a merged dataset (4 scRNA-seq datasets) that I have annotated (a column carries the cell type for each cell).
I didn't get any warnings from the code I was running (the same as in your vignette), but my visualization showed almost every spot with the unknown label:
I thought the deconvolution went fine judging by the topic profiles, but I may be wrong:
I was able to annotate these data with the Seurat integration method, which is similar to deconvolution:
Best,
Hi @WesDe ,
Thanks so much for giving SPOTlight a shot!
This is indeed very odd. As a quick fix, I would suggest removing the unknown cell type from the training set, since it seems to be capturing all the signal...
Could you share with me the trained NMF model so I can take a look at what might be happening?
NMFmod <- decon_mtrx_ls[[1]][[1]]
Thanks for reaching out and sorry for the inconvenience,
Marc
Hi @MarcElosua,
Thanks for the quick reply. I noticed that I had not removed cells with high mitochondrial gene expression from my scRNA-seq dataset. This produced a cluster called unknown, but I don't think it can be the reason behind this kind of issue. I annotated my scRNA-seq dataset again after removing the cells with high mitochondrial gene expression, but it did not change the result.
I can share the RDS files if that helps (sorry, I had to add the .gz extension for GitHub).
From the first run : spotlight_ls.rds.gz
From the second run : spotlight_ls_2.rds.gz
Wes
Hi @WesDe
So sorry for the late reply. I put together a brief R Markdown document where I took a more in-depth look at what may be happening in your situation. Since it is your data, do you mind me posting it here? If you would rather not, could you please share your email address so I can send it to you directly?
Thanks again for giving SPOTlight a shot and for your patience!
Marc
Hi @MarcElosua,
The data come from a 10x genomics public dataset so I don't mind if you post it here. It may even help people if they encounter the same issue.
Best,
Wes
Hi @WesDe ,
I added the output PDF file as well as the R Markdown file (as .txt, since GitHub doesn't allow me to upload the .Rmd).
Briefly, I believe the issue is due to the Regulatory T cells in your 1st run and the Unknown cells in the 2nd run. These cell types don't get a clear topic profile after running NMF, because the cells grouped into them have very heterogeneous profiles. Therefore, when trying to find a consensus topic profile, SPOTlight is unable to find one and ends up with a very low contribution of each topic for that cell type. Then, when fitting the single-cell profiles to the Visium spots, these cell types capture all the signal.
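To make the explanation above concrete, here is a toy sketch in base R. It assumes the consensus profile of a cell type is the per-topic median over its cells; that is an assumption made for illustration, not a statement about SPOTlight internals. The matrices and the `consensus` helper are hypothetical and do not come from the data in this thread.

```r
# Toy illustration only: rows are topics, columns are cells.
# ASSUMPTION: the consensus profile is the per-topic median across cells.

# Homogeneous cell type: all 6 cells load mainly on topic 1.
homogeneous <- matrix(rep(c(0.90, 0.05, 0.05), times = 6), nrow = 3)

# Heterogeneous cell type: each cell loads mainly on a different topic.
# matrix() fills column by column, so each triplet below is one cell.
heterogeneous <- matrix(
  c(0.90, 0.05, 0.05,
    0.05, 0.90, 0.05,
    0.05, 0.05, 0.90,
    0.90, 0.05, 0.05,
    0.05, 0.90, 0.05,
    0.05, 0.05, 0.90),
  nrow = 3
)

# Hypothetical helper: per-topic median over the cells of one cell type.
consensus <- function(h) apply(h, 1, median)

consensus_homogeneous <- consensus(homogeneous)
consensus_heterogeneous <- consensus(heterogeneous)

print(consensus_homogeneous)    # one dominant topic
print(consensus_heterogeneous)  # every topic gets a low contribution
```

In the homogeneous case one topic clearly dominates, while in the heterogeneous case every topic ends up with a low value, which matches the flat profile described above.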
Personally I would recommend the following strategies to try to solve this issue:
- Remove the clusters of cells that are returning these mixed topic profiles.
- Try changing the gene set so that you remove noisy genes (ribosomal, mitochondrial, non-cell-type-specific...) and make sure the canonical ones defining each cell type are present. Sometimes a smaller, more specific gene set using only the marker genes, and not the highly variable genes, works better. As far as I could tell, the Regulatory T cell cluster in the 1st run does not express T cell genes, FOXP3, or HAVCR2.
- In your second run I see that some of the defined cell types have fewer than 5 cells. I would recommend having clusters with more than ~20 cells, since that allows the model to correctly capture the biological signal. If there are clusters with a low number of cells, the biology learned will be specific to those ~5 cells. Due to the sparsity of scRNA-seq data, the model cannot fully learn the shared biology of that cell type and will instead learn a signature specific to those few cells.
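The filtering steps suggested in the list above could be sketched roughly as follows, using only base R on toy data. The `cell_type` column name, the cell type labels, the gene list, and the `min_cells` threshold are all hypothetical; the `^mt-` and `^Rp[sl]` patterns assume mouse gene symbol conventions and would need adjusting for other species.

```r
# Toy metadata: one row per cell, with a hypothetical `cell_type` column.
meta <- data.frame(
  cell_id = paste0("cell", 1:50),
  cell_type = c(rep("Proximal tubule", 25),
                rep("Endothelial", 22),
                rep("Unknown", 3)),
  stringsAsFactors = FALSE
)

min_cells <- 20            # rough threshold taken from the advice above
drop_types <- c("Unknown") # labels with mixed topic profiles to exclude

# Keep cell types that are large enough and not explicitly dropped.
type_counts <- table(meta$cell_type)
keep_types <- setdiff(names(type_counts)[type_counts >= min_cells], drop_types)
meta_filtered <- meta[meta$cell_type %in% keep_types, ]

# Toy gene list: remove mitochondrial and ribosomal genes.
# ASSUMPTION: mouse symbols, where "mt-" marks mitochondrial genes and
# "Rps"/"Rpl" mark ribosomal protein genes.
genes <- c("Lrp2", "Pecam1", "mt-Co1", "Rps27", "Rpl13", "Nphs1")
noisy <- grepl("^mt-|^Rp[sl]", genes)
genes_filtered <- genes[!noisy]

print(table(meta_filtered$cell_type))
print(genes_filtered)
```

The filtered cell IDs and gene names would then be used to subset the single-cell object before training, however that subsetting is done in your own pipeline.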
Hope this helps!
Please let me know if you have any further questions, I am more than happy to help!
Hey, I am trying to use the SPOTlight R package with Seurat data. You changed all the names used in your early tutorial and did not update it. It takes a long time to understand all of it.
Please refer to the Bioconductor tutorial here.