cellxgene-gateway

Overview

Cellxgene Gateway lets you use the Cellxgene Server provided by the Chan Zuckerberg Initiative (https://github.com/chanzuckerberg/cellxgene) with multiple datasets. It displays an index of available h5ad (AnnData) files. When a user clicks a file name, the gateway launches a Cellxgene Server instance that loads that data file and, once the instance is available, proxies requests to it.
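
As a rough illustration of the index step described above, the sketch below collects .h5ad files under a data directory, including subdirectories. The function name list_h5ad_files is hypothetical and this is not the gateway's own implementation, only a minimal standard-library approximation.

```python
from pathlib import Path


def list_h5ad_files(data_dir):
    """Return sorted .h5ad paths under data_dir, relative to data_dir.

    Hypothetical helper for illustration; the gateway's real indexing
    code may work differently.
    """
    root = Path(data_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.h5ad")
        if path.is_file()
    )
```

Calling list_h5ad_files("../cellxgene_data") after downloading the example dataset from the "Running cellxgene gateway" section would be expected to include pbmc3k.h5ad.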

Running locally

Prerequisites

  1. This project requires Python 3.6 or higher. Check your version with
python --version
  2. It is also a good idea to set up a venv
python -m venv .cellxgene-gateway
source .cellxgene-gateway/bin/activate # type `deactivate` to deactivate the venv
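
If you prefer to check the interpreter version from Python itself, for example inside the activated venv, a minimal sketch could look like the following. The helper name python_is_supported is hypothetical; the only fact taken from this README is the 3.6 minimum.

```python
import sys

MINIMUM_PYTHON = (3, 6)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0], "supported:", python_is_supported())
```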

Install cellxgene-gateway

Option 1: Pip Install from Github

pip install git+https://github.com/Novartis/cellxgene-gateway

Note: you may need to downgrade h5py with pip install h5py==2.9.0 due to an issue in a dependency.

Option 2: Install from PyPI

pip install cellxgene-gateway

Running cellxgene gateway

  1. Prepare a folder with .h5ad files, for example
mkdir ../cellxgene_data
wget https://raw.githubusercontent.com/chanzuckerberg/cellxgene/master/example-dataset/pbmc3k.h5ad -O ../cellxgene_data/pbmc3k.h5ad
  2. Set your environment variables:
export CELLXGENE_DATA=../cellxgene_data  # change this directory if you put data in a different place.
export CELLXGENE_LOCATION=$(which cellxgene)
  3. Now, run the gateway:
cellxgene-gateway

Here's what the environment variables mean:

  • CELLXGENE_LOCATION - the location of the cellxgene executable, e.g. ~/anaconda2/envs/cellxgene/bin/cellxgene

At least one of the following is required:

  • CELLXGENE_DATA - a directory that can contain subdirectories with .h5ad data files, without trailing slash, e.g. /mnt/cellxgene_data
  • CELLXGENE_BUCKET - an S3 bucket that can contain keys with .h5ad data files, e.g. my-cellxgene-data-bucket

Cellxgene Gateway is designed to make it easy to add additional data sources. Please see the source code for gateway.py and the ItemSource interface in items/item_source.py.
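
Before starting the gateway, it can help to confirm that the required variables are present. The sketch below encodes only the rules stated above (CELLXGENE_LOCATION is required, at least one of CELLXGENE_DATA or CELLXGENE_BUCKET is required, and CELLXGENE_DATA has no trailing slash). The function check_required_settings is a hypothetical helper, not part of the gateway.

```python
import os


def check_required_settings(environ=os.environ):
    """Return a list of problems; an empty list means the settings look usable.

    Hypothetical pre-flight check based on the rules listed in this README.
    """
    problems = []
    if not environ.get("CELLXGENE_LOCATION"):
        problems.append("CELLXGENE_LOCATION is not set")
    if not (environ.get("CELLXGENE_DATA") or environ.get("CELLXGENE_BUCKET")):
        problems.append("set at least one of CELLXGENE_DATA or CELLXGENE_BUCKET")
    if environ.get("CELLXGENE_DATA", "").endswith("/"):
        problems.append("CELLXGENE_DATA should not end with a trailing slash")
    return problems
```

For example, printing check_required_settings() in the shell where you exported the variables lists anything that still needs attention.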

Optional environment variables:

  • CELLXGENE_ARGS - catch-all variable that can be used to pass additional command line args to cellxgene server
  • EXTERNAL_HOST - the hostname and port from the perspective of the web browser, typically localhost:5005 if running locally. Defaults to "localhost:{GATEWAY_PORT}"
  • EXTERNAL_PROTOCOL - typically http when running locally, can be https when deployed if the gateway is behind a load balancer or reverse proxy that performs https termination. Default value "http"
  • GATEWAY_IP - IP address of the instance the gateway is running on, mostly used to display SSH instructions. Defaults to socket.gethostbyname(socket.gethostname())
  • GATEWAY_PORT - local port that the gateway should bind to, defaults to 5005
  • GATEWAY_EXPIRE_SECONDS - time in seconds that a cellxgene process will remain idle before being terminated. Defaults to 3600 (one hour)
  • GATEWAY_EXTRA_SCRIPTS - JSON array of script paths that will be embedded into each page and forwarded with --scripts to cellxgene server
  • GATEWAY_ENABLE_ANNOTATIONS - Set to true or to 1 to enable cellxgene annotations and gene sets.
  • GATEWAY_ENABLE_BACKED_MODE - Set to true or to 1 to load AnnData in file-backed mode. This saves memory and speeds up launch time but may reduce overall performance.
  • GATEWAY_LOG_LEVEL - Defaults to INFO. Set to DEBUG to increase logging or to WARNING to decrease it.
  • S3_ENABLE_LISTINGS_CACHE - Set to true or to 1 to cache listings of S3 folders for performance. If the cache becomes stale, request filecrawl.html with the query parameter refresh=true (filecrawl.html?refresh=true) to refresh it.
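
To make the documented defaults concrete, here is a minimal sketch of how these optional variables could be read. It is not the gateway's actual configuration code: the names env_flag and gateway_settings are hypothetical, and treating true as case-insensitive is an assumption.

```python
import json
import os


def env_flag(environ, name):
    """Treat 'true' or '1' as enabled (case-insensitivity is an assumption)."""
    return environ.get(name, "").strip().lower() in ("true", "1")


def gateway_settings(environ=os.environ):
    """Collect optional settings, applying the defaults documented above."""
    port = int(environ.get("GATEWAY_PORT", "5005"))
    return {
        "port": port,
        # EXTERNAL_HOST defaults to localhost:{GATEWAY_PORT}
        "external_host": environ.get("EXTERNAL_HOST", f"localhost:{port}"),
        "external_protocol": environ.get("EXTERNAL_PROTOCOL", "http"),
        "expire_seconds": int(environ.get("GATEWAY_EXPIRE_SECONDS", "3600")),
        # GATEWAY_EXTRA_SCRIPTS is a JSON array of script paths
        "extra_scripts": json.loads(environ.get("GATEWAY_EXTRA_SCRIPTS", "[]")),
        "enable_annotations": env_flag(environ, "GATEWAY_ENABLE_ANNOTATIONS"),
        "enable_backed_mode": env_flag(environ, "GATEWAY_ENABLE_BACKED_MODE"),
        "log_level": environ.get("GATEWAY_LOG_LEVEL", "INFO"),
    }
```

Passing a plain dictionary instead of os.environ, for example gateway_settings({"GATEWAY_PORT": "8080"}), makes it easy to see how one variable affects the derived EXTERNAL_HOST default.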

If any of the following optional variables are set, ProxyFix will be used.

  • PROXY_FIX_FOR - Number of upstream proxies setting X-Forwarded-For
  • PROXY_FIX_PROTO - Number of upstream proxies setting X-Forwarded-Proto
  • PROXY_FIX_HOST - Number of upstream proxies setting X-Forwarded-Host
  • PROXY_FIX_PORT - Number of upstream proxies setting X-Forwarded-Port
  • PROXY_FIX_PREFIX - Number of upstream proxies setting X-Forwarded-Prefix
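
Assuming ProxyFix here refers to Werkzeug's werkzeug.middleware.proxy_fix.ProxyFix, whose constructor accepts x_for, x_proto, x_host, x_port and x_prefix counts, the variables above could be mapped to keyword arguments roughly as follows. The helper proxy_fix_kwargs is hypothetical and only illustrates the mapping; the gateway's own wiring may differ.

```python
import os

# Environment variable -> ProxyFix keyword argument (assumed mapping)
PROXY_FIX_VARIABLES = {
    "PROXY_FIX_FOR": "x_for",
    "PROXY_FIX_PROTO": "x_proto",
    "PROXY_FIX_HOST": "x_host",
    "PROXY_FIX_PORT": "x_port",
    "PROXY_FIX_PREFIX": "x_prefix",
}


def proxy_fix_kwargs(environ=os.environ):
    """Return ProxyFix keyword arguments for the variables that are set."""
    return {
        keyword: int(environ[variable])
        for variable, keyword in PROXY_FIX_VARIABLES.items()
        if variable in environ
    }


# With Werkzeug installed, the result could be applied to a WSGI app like this:
#   from werkzeug.middleware.proxy_fix import ProxyFix
#   kwargs = proxy_fix_kwargs()
#   if kwargs:
#       app.wsgi_app = ProxyFix(app.wsgi_app, **kwargs)
```

Each value is the number of trusted proxies in front of the gateway that set the corresponding header, so a single reverse proxy would typically use 1.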

The defaults should be fine if you set up a venv and cellxgene_data folder as above.

Running cellxgene-gateway with Docker

First, build the Docker image:

docker build -t cellxgene-gateway .

Then launch cellxgene-gateway:

docker run -it --rm \
-v <local_data_dir>:/cellxgene-data \
-p 5005:5005 \
cellxgene-gateway

Additional environment variables can be provided with the -e parameter:

docker run -it --rm \
-v "$(pwd)/../cellxgene_data":/cellxgene-data \
-e GATEWAY_PORT=8080 \
-p 8080:8080 \
cellxgene-gateway

Customization

The current paradigm for customization is to modify files during a build or deployment phase:

  • To modify CSS or JS on particular gateway pages, overwrite or append to the templates
  • To add script tags such as for user analytics to all pages, set GATEWAY_EXTRA_SCRIPTS
    • these scripts will also be run on the pages served by cellxgene server via the --scripts parameter
    • See chanzuckerberg/cellxgene#680 for details on --scripts parameter
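
As a sketch of what GATEWAY_EXTRA_SCRIPTS implies, the snippet below turns the JSON array into script tags for the gateway pages and into --scripts arguments for cellxgene server. Both function names are hypothetical, and repeating --scripts once per script is an assumption about how the flag is passed.

```python
import json
from html import escape


def script_tags(extra_scripts_json):
    """Build one <script> tag per path in the JSON array."""
    scripts = json.loads(extra_scripts_json or "[]")
    return [
        '<script src="{}"></script>'.format(escape(src, quote=True))
        for src in scripts
    ]


def cellxgene_script_args(extra_scripts_json):
    """Build command line arguments, assuming --scripts is repeated per script."""
    args = []
    for src in json.loads(extra_scripts_json or "[]"):
        args.extend(["--scripts", src])
    return args
```

For example, with GATEWAY_EXTRA_SCRIPTS set to ["/static/analytics.js"], the first helper yields a single script tag and the second yields the two arguments --scripts and /static/analytics.js.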

Currently we use a bash script that copies the gateway to a "build" directory before modifying templates with sed and the like. There is probably a better way.

Development

We’re actively developing. Please see the "future work" section of the wiki. If you’re interested in being a contributor please reach out to @alokito.

Developer Install

If you want to develop the code, you will need to clone the repo. Make sure you have the prerequisites listed above, then:

  1. Clone the repo
git clone https://github.com/Novartis/cellxgene-gateway.git
cd cellxgene-gateway
  2. Install requirements with
pip install -r requirements.txt
  3. Install the gateway in developer mode
python setup.py develop
  4. Install pre-commit hooks
conda install -c conda-forge pre-commit
pre-commit install

For convenience, the code repo includes a run.sh.example shell script to run the gateway.

Running Tests

    python -m unittest discover tests

Code Coverage

    coverage run -m unittest discover tests
    coverage html

Running Linters

pip install isort flake8 black

isort -rc . # -rc means recursive; isort 5 and later recurse by default and deprecate this flag
black .

Getting Help

If you need help for any reason, please open a GitHub issue. One of the contributors should help you out.

Releasing New Versions

How to prepare for release

  • Update Changelog.md and the version number in __init__.py
  • Cut a release on GitHub
    • Go to your project homepage on GitHub
    • On the right side, you will see the Releases link. Click on it.
    • Click on Draft a new release
    • Fill in all the details
      • Tag version should be the version number of your package release
      • Release Title can be anything you want, but we use v0.3.11 (the same as the tag to be created on publish)
      • Description should be changelog
    • Click Publish release at the bottom of the page
    • Now under Releases you can view all of your releases.
    • Copy the download link (tar.gz) and save it somewhere
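
Since the tag should match the package version, a small check before publishing can catch a forgotten version bump. The sketch below assumes the version is stored as a __version__ string assignment in __init__.py, which this README does not confirm; read_version and tag_matches_version are hypothetical helpers.

```python
import re


def read_version(init_source):
    """Extract the value of a __version__ = "..." assignment from source text."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag, version):
    """Accept a tag equal to the version, with or without a leading 'v'."""
    return tag in (version, "v" + version)
```

With the example title used above, tag_matches_version("v0.3.11", "0.3.11") returns True, while a tag left at an older number returns False.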

How to publish to PyPI

Make sure your .pypirc is set up for testpypi and pypi index servers.

rm -rf dist
python setup.py sdist bdist_wheel
python -m twine upload --repository testpypi dist/*
python -m twine upload dist/*
