Questions concerning the workflow of SPOTlight

Question

JJBio opened this issue 2 years ago · 2 comments

Thank you for your very interesting tool!
I am following the updated vignette and have a few questions:

1. I have used Seurat to analyse my single-cell data and ran FindAllMarkers with only.pos = TRUE. I then filtered the markers based on pct, adjusted p-value and log2FC. Can I use these markers as input for the mgs parameter? If so, I do not have the mean.AUC. Can I use the log2FC, the p-value or a combination thereof for the weight_id parameter?
2. I use WhichCells with the downsample parameter on the Seurat object for downsampling. Since I have a large dataset with many cells, and a few subpopulations are quite similar, I was thinking of using more cells than the suggested 100. How many would be sensible?
3. Am I correct in assuming that hvg = 2000 extracts the 2000 variable features of the Seurat object? Or should I extract them myself and use them as input?
I hope this command looks ok:

res <- SPOTlight(
  x = sc_down,
  y = spatial@assays$Spatial@counts,
  groups = as.character(sc_down$ann),
  mgs = cluster_markers_all, # markers from FindAllMarkers
  hvg = 2000,
  weight_id = "avg_log2FC",
  group_id = "cluster",
  gene_id = "gene"
)

Thank you so much in advance!

Answer 1 · 2022-06-07T14:14:16.000Z

Hi @JJBio

Thank you so much for using SPOTlight! Here are some comments on your questions:

1. We specifically designed SPOTlight so it could take marker genes from different tools/methods. You can absolutely use log2FC, p-val or a combination of both!
2. When running SPOTlight with populations that are very similar, I recommend first running it with the major cell types (CD4 T cell instead of CD4 Tfh, CD4 Th1, CD4 Th2...) and then running it with the deeper annotation to make sure things make sense. In terms of how many cells to use, 100 should be enough, but bringing it up to 200 can help at the expense of increased computational time. One of the most important factors here is to keep the cells for each cell type from as few batches as possible. Something you can play around with is running FindAllMarkers as you do, but only between the populations that are highly similar, and then adding those genes to the mgs data frame. That way the genes that specifically distinguish these subpopulations are also taken into account.
3. hvg needs a character vector of highly variable gene names, not a number; 2000-3000 genes should be enough.
4. In the command, I would only change the hvg parameter to a character vector!
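
The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy `markers` table stands in for real FindAllMarkers output (its values are made up), the filtering thresholds are arbitrary, and the combined `weight` column (log2FC multiplied by -log10 of the adjusted p-value) is just one possible way to combine the two, not something SPOTlight prescribes. The commented-out call at the end reuses the object names from the question and is not run here.

```r
# Toy stand-in for the data frame returned by FindAllMarkers(only.pos = TRUE).
# All values below are invented for illustration.
markers <- data.frame(
  gene       = c("CD3E", "CD4", "MS4A1", "CD79A", "ACTB"),
  cluster    = c("T", "T", "B", "B", "B"),
  avg_log2FC = c(2.0, 1.5, 3.0, 2.5, 0.1),
  p_val_adj  = c(1e-50, 1e-20, 0, 1e-30, 0.5),
  pct.1      = c(0.9, 0.8, 0.95, 0.9, 0.99),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds; adjust them to your own data.
keep <- markers$p_val_adj < 0.05 &
  markers$avg_log2FC > 0.5 &
  markers$pct.1 > 0.25
mgs <- markers[keep, ]

# Option A: use avg_log2FC directly as the weight (weight_id = "avg_log2FC").
# Option B (assumption): combine effect size and significance in one column.
# Adjusted p-values of exactly 0 are floored so that -log10() stays finite.
p_floor <- pmax(mgs$p_val_adj, 1e-300)
mgs$weight <- mgs$avg_log2FC * -log10(p_floor)

# With Seurat and SPOTlight loaded, the call from the question would then
# pass a character vector of gene names to hvg (not run here):
# hvg <- VariableFeatures(sc_down)
# res <- SPOTlight(
#   x = sc_down,
#   y = spatial@assays$Spatial@counts,
#   groups = as.character(sc_down$ann),
#   mgs = mgs,
#   hvg = hvg,
#   weight_id = "weight",   # or "avg_log2FC" for option A
#   group_id = "cluster",
#   gene_id = "gene"
# )
```

Since only.pos = TRUE keeps positive fold changes only, both weight options stay positive, with larger values marking stronger markers.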

Hope this helps, feel free to reach out again if you have any other questions!

Answer 2 · 2022-06-14T10:40:12.000Z

Thank you for your response!