SC-Elephant utilizes RamData, a novel single-cell data storage format, to support a wide range of single-cell bioinformatics applications in a highly scalable manner, while providing a convenient interface to export any subset of the single-cell data in SCANPY's AnnData format, enabling efficient downstream analysis of the cells of interest. The analysis result can then be made available to other researchers by updating the original RamData, which can be stored in cloud storage like AWS (or any AWS-like object storage).
SC-Elephant and RamData enable real-time sharing of extremely large single-cell data through a browser-based analysis platform, even while the data is being modified on the cloud by multiple other researchers; convenient integration of a local single-cell dataset with multiple large remote datasets (RamData objects uploaded by other researchers); and remote (private) collaboration on an extremely large-scale single-cell genomics dataset.
Tutorials can be found at doc/jn/
Tutorial 1) Processing and analysis of the 3k PBMCs dataset using SC-Elephant
Tutorial 3) Combine 10x MEX count matrices memory-efficiently using SC-Elephant
Tutorial 4) Convert existing AnnData into RamData for collaborative data sharing
Briefly, a RamData object is composed of two RamDataAxis (Axis) objects and multiple RamDataLayer (Layer) objects.
The two RamDataAxis objects, 'Barcode' and 'Feature', each use a 'filter' to select cells (barcodes) and genes (features), respectively, before data is retrieved from the RamData object.
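The relationship between the two axes, their filters, and the layers can be illustrated with a small toy model. This is only a conceptual sketch in plain Python, not the SC-Elephant API: the class names `ToyAxis` and `ToyRamData`, the `mask` attribute, and the `select` and `fetch` methods are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAxis:
    """Hypothetical stand-in for an axis: entry names plus a boolean filter."""
    names: list
    mask: list = field(default_factory=list)

    def __post_init__(self):
        # With no filter given, every entry on the axis is active.
        if not self.mask:
            self.mask = [True] * len(self.names)

    def select(self, wanted):
        """Set the filter so that only the named entries stay active."""
        wanted = set(wanted)
        self.mask = [name in wanted for name in self.names]

    def active(self):
        """Return the integer positions of the entries that pass the filter."""
        return [i for i, keep in enumerate(self.mask) if keep]


@dataclass
class ToyRamData:
    """Hypothetical container: two axes and any number of named layers."""
    barcodes: ToyAxis
    features: ToyAxis
    layers: dict  # layer name -> {(barcode_index, feature_index): count}

    def fetch(self, layer):
        """Return the filtered submatrix of one layer as a list of rows."""
        counts = self.layers[layer]
        rows = self.barcodes.active()
        cols = self.features.active()
        return [[counts.get((r, c), 0) for c in cols] for r in rows]


ram = ToyRamData(
    barcodes=ToyAxis(["cell_a", "cell_b", "cell_c"]),
    features=ToyAxis(["CD3E", "MS4A1", "LYZ"]),
    layers={"raw": {(0, 0): 5, (1, 1): 2, (2, 0): 1, (2, 2): 7}},
)

# Filters are set on the axes first; data retrieval then honors both filters.
ram.barcodes.select(["cell_a", "cell_c"])
ram.features.select(["CD3E", "LYZ"])
print(ram.fetch("raw"))  # [[5, 0], [1, 7]]
```

The point of the sketch is the order of operations: the filters live on the axes, so every layer retrieved afterwards is restricted to the same cells and genes.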
RamData employs RAMtx (Random-accessible matrix) objects to store count matrices in sparse or dense formats.
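To give an intuition for what random access to a sparse count matrix means, the sketch below uses a generic compressed sparse row (CSR) layout in plain Python. It is an assumption made for illustration and does not describe the actual RAMtx storage layout; `ToyRandomAccessMatrix` and its attributes are hypothetical names.

```python
class ToyRandomAccessMatrix:
    """Hypothetical CSR-style store: one row can be read without scanning the rest."""

    def __init__(self, dense_rows):
        self.n_cols = len(dense_rows[0]) if dense_rows else 0
        self.values = []         # non-zero counts, stored row after row
        self.col_indices = []    # column index of each stored count
        self.row_offsets = [0]   # row i occupies values[row_offsets[i]:row_offsets[i + 1]]
        for row in dense_rows:
            for col, value in enumerate(row):
                if value != 0:
                    self.values.append(value)
                    self.col_indices.append(col)
            self.row_offsets.append(len(self.values))

    def row(self, i):
        """Rebuild row i by reading only its own slice of the stored arrays."""
        start, end = self.row_offsets[i], self.row_offsets[i + 1]
        dense = [0] * self.n_cols
        for col, value in zip(self.col_indices[start:end], self.values[start:end]):
            dense[col] = value
        return dense


matrix = ToyRandomAccessMatrix([[5, 0, 0], [0, 2, 0], [1, 0, 7]])
print(matrix.row(2))  # [1, 0, 7]
```

Because the offsets index says exactly where each row starts and ends, fetching one cell's counts touches only that slice, which is the property that makes subset retrieval from a very large matrix cheap.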
RamData greatly simplifies sharing of very large single-cell datasets on the Web. Once processed by SC-Elephant, RamData can be uploaded to GitHub, Amazon S3 Cloud, or any static file server to share your single-cell datasets publicly with the research community or privately with your collaborators. The machine learning models, kNN graphs, cell-type annotations, and random-accessible expression count matrices (to name a few) of your single-cell datasets on the Web can be easily explored in Python environments and web browsers using SC-Elephant and SC-Elephant.js, respectively.
To explore RamData objects publicly available on the Web using a web browser, please visit our SC-Elephant DB Viewer.