- Visualization at scale: built with WebGL, React & Redux to handle visualization of at least 1 million cells.
- Interactive exploration: select, cross-filter, and compare subsets of your data with performant indexing and data handling.
- Flexible API: the cellxgene client-server model is designed to support a range of existing analysis packages for backend computational tasks (e.g. scanpy), integrated with client-side visualization via a REST API.
Requirements
- OS: OSX, Windows, Linux -- the developers are currently testing on OSX and Windows (via WSL using Ubuntu). It should work on other platforms but if you are using something different and need help, please let us know.
- python 3.6
- python3 tkinter
- npm
- Google Chrome
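The list above can be checked before installing with a small preflight script. The sketch below is not part of cellxgene: the function name `check_requirements` is hypothetical, and it assumes "python 3.6" means 3.6 or newer. It does not look for Google Chrome, since the install location varies by platform.

```python
import shutil
import sys


def check_requirements():
    """Return a dict mapping each requirement name to True or False."""
    try:
        import tkinter  # noqa: F401 (imported only to see whether it is available)
        has_tkinter = True
    except ImportError:
        has_tkinter = False

    return {
        # Assumption: "python 3.6" is read as "3.6 or newer".
        "python>=3.6": sys.version_info >= (3, 6),
        "tkinter": has_tkinter,
        # npm is listed as a requirement; look for it on PATH.
        "npm": shutil.which("npm") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{:<12} {}".format(name, "ok" if ok else "MISSING"))
```

Run it with the same `python3` you intend to use for the virtual env, so the check reflects the interpreter cellxgene will be installed into.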
Clone project
git clone https://github.com/chanzuckerberg/cellxgene.git
Install client
cd cellxgene
./bin/build-client
To use with virtual env for python (optional, but recommended)
ENV_NAME=cellxgene
python3 -m venv ${ENV_NAME}
source ${ENV_NAME}/bin/activate
Install server
pip install -e .
Run (with demo data)
cellxgene --title PBMC3K scanpy example-dataset/
In Google Chrome, navigate to the viewer via the web address printed in your console.
E.g., Running on http://0.0.0.0:5005/
Help
cellxgene --help
For help with the scanpy engine
cellxgene scanpy --help
To prepare your data, load it into AnnData format using scanpy, calculate PCA and nearest neighbors, and save the result in h5ad format.
- Ensure that obs's index is the cell names: print(data.obs_names) should show your cell indices. If it shows gene names, you may need to call data.transpose().
- Calculate PCA
sc.pp.pca(data)  # sc is scanpy.api
- Calculate nearest neighbors (depending on layout algorithm)
# For umap layout algorithm, you need to use the "umap" method for neighbors
sc.pp.neighbors(data, method="umap", metric="euclidean", use_rep="X_pca")
# For tsne layout algorithm, you can use either "umap" or "gauss"; we recommend "gauss"
sc.pp.neighbors(data, method="gauss", metric="euclidean", use_rep="X_pca")
- Save file
# cellxgene requires file to be named data.h5ad
data.write("data.h5ad")
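The preparation steps up to this point can be collected into one helper. This is a minimal sketch, not code from cellxgene: the names `neighbors_method` and `prepare_for_cellxgene` are hypothetical, and the layout-to-method mapping simply encodes the recommendation above ("umap" for the umap layout, "gauss" for tsne). scanpy is imported inside the function, so the mapping helper can be used without scanpy installed.

```python
import os


def neighbors_method(layout):
    """Map a layout algorithm to the neighbors method suggested above."""
    methods = {"umap": "umap", "tsne": "gauss"}
    if layout not in methods:
        raise ValueError("unknown layout: {!r}".format(layout))
    return methods[layout]


def prepare_for_cellxgene(data, out_dir, layout="umap"):
    """Run PCA and neighbors on an AnnData object, then save it as data.h5ad."""
    # The text above uses scanpy.api; newer scanpy releases expose the same
    # functions from the top-level package, so fall back to that.
    try:
        import scanpy.api as sc
    except ImportError:
        import scanpy as sc

    sc.pp.pca(data)
    sc.pp.neighbors(
        data,
        method=neighbors_method(layout),
        metric="euclidean",
        use_rep="X_pca",
    )
    # cellxgene requires the file to be named data.h5ad.
    out_path = os.path.join(out_dir, "data.h5ad")
    data.write(out_path)
    return out_path
```

A call might look like `prepare_for_cellxgene(data, "my-dataset/", layout="tsne")`, where `my-dataset/` is a placeholder directory. If the data needs transposing first, note that `data.transpose()` returns a new object rather than modifying `data` in place, so assign the result: `data = data.transpose()`.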
- Create config file (optional)
If you do not have a config file, the schema (metadata names, types, and whether each is categorical or continuous) will be inferred from the observations in the data file. The config file must be named data_schema.json and be located in the same directory as the data file.
- The config file is a JSON file describing the metadata associated with the cells. Each key is a column name in obs. Each value is an object with these fields:
  - type: string, int, or float (the type of the values)
  - variabletype: categorical or continuous (categorical values are displayed as checkboxes, continuous values as a histogram)
  - displayname: the heading to display
  - include: true or false (whether to display the values in the web interface)
Example
{
  "CellName": {
    "type": "string",
    "variabletype": "categorical",
    "displayname": "Name",
    "include": true
  },
  "clusters": {
    "type": "string",
    "variabletype": "categorical",
    "displayname": "Clusters",
    "include": true
  },
  "num_genes": {
    "type": "int",
    "variabletype": "continuous",
    "displayname": "Number Genes",
    "include": true
  }
}
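Writing this file by hand gets tedious with many columns, so a first draft can be generated and then edited. The sketch below uses only the Python standard library and is an illustration, not cellxgene's own inference logic. The helpers `value_type`, `draft_schema`, and `write_schema` are hypothetical names, the cell values are made up, and the rule "numeric means continuous, everything else means categorical" is an assumption you may want to override per column.

```python
import json
import os


def value_type(values):
    """Guess "int", "float", or "string" from a list of Python values."""
    if not values:
        return "string"
    if all(type(v) is int for v in values):
        return "int"
    if all(type(v) in (int, float) for v in values):
        return "float"
    return "string"


def draft_schema(columns, display_names=None):
    """Build a data_schema.json-style dict from {column name: list of values}."""
    display_names = display_names or {}
    schema = {}
    for name, values in columns.items():
        kind = value_type(values)
        schema[name] = {
            "type": kind,
            # Assumption: numeric columns are continuous, the rest categorical.
            "variabletype": "categorical" if kind == "string" else "continuous",
            "displayname": display_names.get(name, name),
            "include": True,
        }
    return schema


def write_schema(schema, data_dir):
    """Write the schema next to the data file, under the required name."""
    path = os.path.join(data_dir, "data_schema.json")
    with open(path, "w") as handle:
        json.dump(schema, handle, indent=2)
    return path


# Made-up values standing in for columns of data.obs.
example_columns = {
    "CellName": ["cell_a", "cell_b"],
    "clusters": ["0", "1"],
    "num_genes": [781, 1352],
}
example_schema = draft_schema(
    example_columns,
    display_names={"CellName": "Name", "clusters": "Clusters", "num_genes": "Number Genes"},
)
print(json.dumps(example_schema, indent=2))
```

Python's `True` is serialized as JSON `true` by `json.dump`, so the written file matches the format shown in the example above.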
We warmly welcome contributions from the community. Please submit any bug reports and feature requests through GitHub issues. Please submit any direct contributions via a branch + pull request.
We’ve been inspired by several other related efforts in this space, including the UCSC Cell Browser, Cytoscape, Xena, ASAP, GenePattern, & many others; we hope to explore collaborations where useful.
Have questions, suggestions, or comments? You can contact us by joining CZI Science Slack and posting in the #cellxgene channel. Please submit any feature requests or bugs as an issue on GitHub. We'd love to hear from you!
This project was started with the sole goal of empowering the scientific community to explore and understand their data. As such, we whole-heartedly encourage other scientific tool builders to adopt the patterns, tools, and code from this project, and reach out to us with ideas or questions using GitHub Issues or Pull Requests. All code is freely available for reuse under the MIT license.
We thank Alex Wolf for the demo dataset.