Utils to pre-process data for Vitessce.
Sample datasets come from:
- Codeluppi et al.: Spatial organization of the somatosensory cortex revealed by cyclic smFISH
- Dries et al.: Giotto, a pipeline for integrative analysis and visualization of single-cell spatial transcriptomic data
- Wang et al.: Multiplexed imaging of high density libraries of RNAs with MERFISH and expansion microscopy
- Cao et al.: The single-cell transcriptional landscape of mammalian organogenesis
JSON is our target format for now because it is easily read by JavaScript and is not so inefficient as to cause problems with storage or processing. For example, the mRNA HDF5 file is 30M, and as JSON it is still only 37M.
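As a rough illustration of the size trade-off, here is a minimal sketch that serializes a small table with Python's standard json module. The data and variable names are hypothetical and are not taken from the scripts in this repository.

```python
import json

# Hypothetical toy data: gene name -> per-cell counts (not from the real datasets).
toy_counts = {
    "gene_a": [0, 3, 1],
    "gene_b": [2, 0, 5],
}

# Compact separators drop the whitespace that json.dumps adds by default,
# which reduces the size of the serialized text.
compact = json.dumps(toy_counts, separators=(",", ":"))
pretty = json.dumps(toy_counts, indent=2)

# The compact form is shorter, and both decode to the same data.
print(len(compact), len(pretty))
```

Either form can be parsed directly in the browser with JSON.parse, which is the main reason for preferring JSON here.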
vitessce-data requires Python 3. First, set up a clean environment. If you are using conda:
conda create python=3.6 -n vitessce-data
# Confirm install, then:
source activate vitessce-data
Then install dependencies with pip:
pip install -r requirements.txt
pip install -r requirements-dev.txt
test.sh exercises all the scripts, using the fixtures in fake-files/, and errors if the output is not what is expected.
process.sh downloads full data from the internet, caches these input files in big-files/input, processes them, caches the output in big-files/output, and pushes to S3.
process.sh only performs the work necessary. To regenerate just a portion of the data, delete the files in big-files/output that need to be replaced.
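The "only performs the work necessary" behavior can be pictured as a skip-if-cached check. The sketch below is an assumption about the general pattern, not the actual logic of process.sh; the function name process_if_missing and the file name cells.json are hypothetical.

```python
import tempfile
from pathlib import Path


def process_if_missing(output_path, make_output):
    """Run make_output only when output_path does not exist yet.

    Returns True if the output was (re)generated, False if the cached
    file was reused. Hypothetical helper, for illustration only.
    """
    output_path = Path(output_path)
    if output_path.exists():
        return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(make_output())
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output" / "cells.json"
    first = process_if_missing(out, lambda: "{}")   # file absent: work is done
    second = process_if_missing(out, lambda: "{}")  # file cached: work is skipped
    out.unlink()                                    # delete the file to be replaced
    third = process_if_missing(out, lambda: "{}")   # file absent again: regenerated

print(first, second, third)
```

Deleting a single output file forces only that file to be rebuilt on the next run, which mirrors the instruction above.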