LieberInstitute/spatialDLPFC

Make SPOTlight results reproducible + use all input snRNA-seq data

Closed · 1 comment

See https://jhu-genomics.slack.com/archives/C01EA7VDJNT/p1671588769748179 for the full description.

Basically, over the winter break it'd be nice to have SPOTlight re-run using:

  • all the input snRNA-seq data if possible (for example, try running it on caracol if it needs lots of memory)
  • if we can't use all the input snRNA-seq data, then up to 1000 nuclei per cell type instead of the up to 100 that you are using now, plus a set.seed() call to make the results reproducible.

That is, add a set.seed() call before the following line if subsampling is still needed. Or, better, avoid the subsampling altogether.

# This was slightly changed from the tutorial for simplicity
cs_keep <- lapply(idx, function(i) sample(i, min(length(i), n_cells_per_type)))
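
A minimal sketch of the seeded subsampling requested above, in case it is still needed. The toy `cell_type` vector, its labels, and the seed value are hypothetical placeholders, not taken from the actual spatialDLPFC scripts; only `idx`, `cs_keep`, and `n_cells_per_type` come from the snippet in this issue.

```r
# Hypothetical toy annotation standing in for the snRNA-seq cell type labels
cell_type <- rep(c("Excit_L2_3", "Excit_L4", "Oligo"), times = c(2500, 800, 40))

# Indices of the nuclei that belong to each cell type
idx <- split(seq_along(cell_type), cell_type)

# Upper bound on nuclei kept per cell type (1000 instead of 100)
n_cells_per_type <- 1000

# Fix the RNG state so the same nuclei are drawn on every run
# (the seed value itself is arbitrary)
set.seed(20221222)

cs_keep <- lapply(idx, function(i) {
    # Indexing with sample.int() draws the same nuclei as sample(i, ...),
    # but stays correct if a cell type has a single nucleus, where
    # sample(i, 1) would instead draw from 1:i
    i[sample.int(length(i), min(length(i), n_cells_per_type))]
})

# Number of nuclei retained per cell type
lengths(cs_keep)
```

Cell types with fewer than `n_cells_per_type` nuclei are kept in full. Setting `n_cells_per_type <- Inf` keeps every nucleus (in shuffled order), which matches the "avoid altogether" option.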

(Applies for both IF and non-IF data)

Thanks!

After reading the docs for SPOTlight, @Nick-Eagles, @lahuuki, and I agree that we should use all the data, since the docs say you need more cells if your cell types are related, which is the case in our layer-level analysis.

https://bioconductor.org/packages/release/bioc/vignettes/SPOTlight/inst/doc/SPOTlight_kidney.html

[Three screenshots attached, taken 2022-12-22 at 2:05:00 PM, 2:05:16 PM, and 2:06:33 PM]