scfind - Fast searches of large collections of single cell data

Single cell technologies have made it possible to profile millions of cells, but for these resources to be useful they must be easy to query and access. To facilitate interactive and intuitive access to single cell data we have developed scfind (source available at https://github.com/hemberg-lab/scfind), a search engine for cell atlases. Scfind can be used to evaluate marker genes, to perform in silico gating, and to identify both cell-type specific and housekeeping genes. An interactive web interface with 9 single cell datasets is available at https://scfind.sanger.ac.uk.

Q: What is this?

A: scfind is a search engine that makes single cell data accessible to a wide range of users by enabling sophisticated queries for large datasets through an interface which is both very fast and familiar to users from any background.

Q: How to install/run scfind?

A: To install the latest development version of scfind, use the GitHub repository:

# Linux and Mac users, run this in your R session:
install.packages("devtools")
devtools::install_github("hemberg-lab/scfind")

library("scfind")

# For Windows users:
# Please install the latest version of Rtools at https://cran.r-project.org/bin/windows/Rtools/ prior to installation of scfind

Update: The latest version (3.7.0) of scfind was released on 26 November 2020. It provides 2 datasets and 3 pre-processed scfind indexes as examples, and the stability of the scfind interactive session has been improved. To update to the latest version:

install.packages("devtools")
devtools::install_github("hemberg-lab/scfind", force = TRUE)

Q: Where can I find the scfind example datasets and indexes?

A: The latest version of the package provides a list of example SingleCellExperiment objects and scfind indexes created from the Tabula Muris Consortium data for your first scfind experience:

library("scfind")

# List of `Tabula Muris (FACS)` `SingleCellExperiment` objects
data(tmfacs)

# List of `Tabula Muris (10X)` `SingleCellExperiment` objects
data(tm10x)

Details on building a scfind index from a SingleCellExperiment object are described on this page.

library("scfind")
library("SingleCellExperiment")

# Load the list of `Tabula Muris (FACS)` objects, then build the `Bladder` index
data(tmfacs)
sce.bladder <- readRDS(url(tmfacs["Bladder"]))
scfind.index <- buildCellTypeIndex(sce = sce.bladder,
                                   cell.type.label = "cell_type1",
                                   dataset.name = "Bladder",
                                   assay.name = "counts")

You can use the mergeDataset function to combine more than one dataset into one super index. The function saveObject allows you to save your index for future use.
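As a minimal sketch of that workflow, the snippet below builds two small indexes, merges them, and saves the result. Several details are assumptions rather than confirmed usage: that `"Kidney"` is a valid name in `tmfacs`, that `mergeDataset` takes `object` and `new.object`, and that `saveObject` takes `object` and `file`. The variable names and the output file name are hypothetical, so check `?mergeDataset` and `?saveObject` before relying on it.

```r
library("scfind")
library("SingleCellExperiment")

data(tmfacs)

# Build one index per tissue ("Kidney" is assumed to be a valid name in `tmfacs`)
sce.bladder <- readRDS(url(tmfacs["Bladder"]))
index.bladder <- buildCellTypeIndex(sce = sce.bladder,
                                    cell.type.label = "cell_type1",
                                    dataset.name = "Bladder",
                                    assay.name = "counts")

sce.kidney <- readRDS(url(tmfacs["Kidney"]))
index.kidney <- buildCellTypeIndex(sce = sce.kidney,
                                   cell.type.label = "cell_type1",
                                   dataset.name = "Kidney",
                                   assay.name = "counts")

# Combine both datasets into one super index (argument names are assumed)
super.index <- mergeDataset(object = index.bladder, new.object = index.kidney)

# Save the merged index for future sessions (hypothetical file name)
index.file <- "bladder_kidney_index.rds"
saveObject(object = super.index, file = index.file)
```

The saved file can later be reloaded with `loadObject(file = index.file)`.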

To get started quickly with pre-computed scfind indexes:

# `scfind` index of the `Tabula Muris (FACS)` dataset
data(ExampleIndex)

scfind.index.tmfacs <- loadObject(file = url(ExampleIndex["TabulaMurisFACS"]))

# `scfind` index of the `Tabula Muris (10X)` dataset
scfind.index.tm10x <- loadObject(file = url(ExampleIndex["TabulaMuris10X"]))

# `scfind` super index that contains both the `Tabula Muris (FACS)` and `Tabula Muris (10X)` datasets
scfind.index.tm <- loadObject(file = url(ExampleIndex["TabulaMurisSuperIndex"]))

Q: How to start the interactive interface?

A: To use the interactive interface of the scfind search engine, you can explore one of our collections or load your own scfind index in an R session:

library("scfind")
scfind.index <- loadObject(file = "/path/to/your/index.rds")
scfindShiny(object = scfind.index)

Q: How can I use the scfind "free text" search engine mentioned in the manuscript?

A: The prototype of scfind that features Natural Language Processing (NLP) can be found at this website. To keep the standard version (3.7.0) of scfind streamlined and lightweight, we decided not to include the NLP feature because it requires additional dependencies.

Q: Where can I report bugs, comments, issues or suggestions?

A: Please use this page.

Q: Is scfind published?

A: Yes. The preprint of scfind is available on bioRxiv, and the final version of the article is published in Nature Methods. The press release can be found on the Wellcome Sanger Institute website. To cite the article:

Lee, J.T.H., Patikas, N., Kiselev, V.Y. et al. Fast searches of large collections of single-cell data using scfind. Nat Methods 18, 262–271 (2021). https://doi.org/10.1038/s41592-021-01076-9

Q: What is the scfind licence?

A: GPL-3