CHAMP: Chip-Hybridized Association Mapping Platform

CHAMP: Chip-Hybridized Affinity Mapping Platform

This software was used for analyses described in the manuscript:

Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips

Cheulhee Jung, John Hawkins, Stephen K. Jones Jr, Yibei Xiao, James Rybarski, Kaylee Dillard, Jeffrey Hussmann, Mashelfatema A Saifuddin, Cagri A. Savran, Andrew Ellington, Ailong Ke, William H. Press, and Ilya J. Finkelstein
Cell 170, 35–47, June 29, 2017
http://dx.doi.org/10.1016/j.cell.2017.05.044

Installation

CHAMP has only been run on Ubuntu. You'll need a few dependencies first:

sudo apt install -y build-essential git sextractor samtools bowtie2 virtualenv python-dev zlib1g-dev
git clone https://github.com/hawkjo/champ.git

Optionally, you can install into a virtual environment (recommended):

cd champ
virtualenv env
. env/bin/activate

Now install Python packages and CHAMP:

pip install numpy==1.11.1 && pip install scipy==0.18.0 && pip install -r requirements.txt && python setup.py install

Typical Pipeline

Mapping Reads

When a new chip is received, it needs to be analyzed once to determine which reads are fiducial markers (that is, clusters of phiX genomic DNA) and which are to be used in the experiment. Target sequences are kept in a file in YAML format. Short read alignment files produced by Bowtie2 are used to classify genomic DNA. In the example below, the phiX files are in a directory called phix_bowtie, and the prefix phix is given because every file name begins with it (a prefix, rather than a single file name, is what Bowtie2 requires). --min-len and --max-len set the minimum and maximum length of the sequences of interest (for CRISPR systems, this length includes the PAM).

champ map SA16032/all_fastqs SA16032/read_names --target-sequence-file targets.yml --phix-bowtie phix_bowtie/phix --min-len 24 --max-len 46
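Before running the command above, it can help to confirm that every target falls inside the length window. The sketch below is not part of CHAMP. It assumes the target file is a flat YAML mapping of name to sequence (an assumption based on the description of --perfect-target-name), and the target name and sequence are hypothetical placeholders rather than real data.

```python
# Hypothetical target: the name and the sequence are placeholders, not real data.
targets = {
    "example_target": "ACGT" * 6 + "TGG",  # 27 nt in total
}


def targets_outside_window(targets, min_len, max_len):
    """Return the names of targets whose length is not within [min_len, max_len]."""
    return sorted(
        name for name, seq in targets.items()
        if not min_len <= len(seq) <= max_len
    )


def to_flat_yaml(targets):
    """Render a name -> sequence mapping as simple 'key: value' YAML lines."""
    return "".join(
        "{}: {}\n".format(name, seq) for name, seq in sorted(targets.items())
    )


# Same window as the champ map example: --min-len 24 --max-len 46
out_of_range = targets_outside_window(targets, min_len=24, max_len=46)
yaml_text = to_flat_yaml(targets)

print(yaml_text.rstrip())
print("targets outside the window: {}".format(out_of_range))
```

An empty list means every target fits the window. Any name that is listed would need either a wider window or a corrected sequence.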

Setting Up a New Analysis

When a new experiment is run and the image files are uploaded to the server, you'll need to run champ init to associate some metadata about the experiment with the image files. Some of the arguments below are mandatory and others are optional. The command creates a file, champ.yml, that holds this metadata and is also used during alignment to checkpoint progress.

IMAGE_DIRECTORY the directory that contains all of the TIF files (and will contain the HDF5 files)

READ_NAMES_DIRECTORY the path to the directory that contains the text files produced by the champ map command.

ALIGNMENT_CHANNEL the name of the color channel in which phiX is visible. This refers specifically to the channel where 100% of phiX clusters are visible. We also recommend (and possibly require) that a subset of phiX be labeled in the channels where proteins are visible, which helps with alignment at low concentrations.

--perfect-target-name the key used in the dictionary in the target YAML file that identifies your target sequence

--alternate-perfect-reads the path to a text file of read names that should be treated as the perfect target reads. This is mainly useful for experimenting when you're not sure what the protein might bind to.

--alternate-good-reads like --alternate-perfect-reads, except that the file is assumed to contain some reads that will not bind as well

--alternate-fiducial-reads use read names in a given text file instead of phiX for the rough alignment step

--microns-per-pixel the size of a side of one pixel, in microns

--chip the type of chip, either miseq or hiseq

--ports-on-right the stage adapters we use orient the two input ports on the chip to the left or right, depending on which microscope we use. The alignment needs to know which side they are on. They are assumed to be on the left unless this flag is passed.

--flipud flip all images about the horizontal axis (i.e. run numpy.flipud() on all images). This was added to handle a quirk in the way MicroManager saves images.

--fliplr flip all images about the vertical axis (i.e. run numpy.fliplr() on all images). If your images don't align, try passing this flag.

-v -vv -vvv set the verbosity level (-vvv is debug mode).
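To make the two flip directions behind --flipud and --fliplr concrete, here is a minimal sketch on a tiny nested-list "image". It uses plain Python so that it runs without NumPy. The helper names are hypothetical, and CHAMP itself operates on real image arrays.

```python
# A tiny 2 x 3 "image" as nested lists; real images would be NumPy arrays.
image = [
    [1, 2, 3],
    [4, 5, 6],
]


def flip_up_down(rows):
    """Reverse the order of the rows, as numpy.flipud does for a 2-D array."""
    return [list(row) for row in rows[::-1]]


def flip_left_right(rows):
    """Reverse each row, as numpy.fliplr does for a 2-D array."""
    return [list(row[::-1]) for row in rows]


print(flip_up_down(image))     # [[4, 5, 6], [1, 2, 3]]
print(flip_left_right(image))  # [[3, 2, 1], [6, 5, 4]]
```

Applying both flips is the same as rotating the image by 180 degrees, which is worth keeping in mind when testing flag combinations on images that refuse to align.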

Generating HDF5 files

CHAMP uses HDF5 files with a specific format. If you're using MicroManager and generating OME-TIFF files, CHAMP's built-in conversion tool (champ h5) will work out of the box. If your raw image files aren't formatted and named exactly as necessary, you'll need to generate the HDF5s yourself.

Aligning Images

CHAMP will attempt to align as many images as possible. The output will be the coordinates of each FASTQ read within an image, saved in text files in the results directory, along with a file containing the alignment parameters.

IMAGE_DIRECTORY the directory that contains all of the HDF5 image files

--rotation-adjustment rotational adjustment, in degrees, to apply to read coordinates before attempting alignment. The value can be negative. Even a one-degree misalignment can prevent the rough alignment from working. If your alignments don't work, try a range of values from -5 to 5 degrees in 0.5 degree increments.

--min-hits the minimum number of exclusive hits required for a precision alignment to be considered valid

--snr the minimum signal-to-noise ratio (relative to random alignments) to consider a rough alignment valid. We have found 1.4 to be ideal under most scenarios.

--make-pdfs produce some diagnostic PDFs to examine the quality of the alignment

--fiducial-only only align the channel with the fiducial markers.

-v -vv -vvv set the verbosity level (-vvv is debug mode).
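The advice on --rotation-adjustment can be illustrated with plain trigonometry. The sketch below lists the suggested sweep of values and estimates how far a point moves under a one-degree rotation. It is not CHAMP's code: the helper names are hypothetical, and the 1000-pixel distance from the rotation center is an illustrative assumption.

```python
import math


def rotation_sweep(start=-5.0, stop=5.0, step=0.5):
    """List candidate rotation adjustments from start to stop inclusive, in degrees."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]


def displacement(radius_px, degrees):
    """Distance a point moves when rotated by `degrees` about a center `radius_px` away.

    This is the chord length of the arc: 2 * r * sin(theta / 2).
    """
    return 2.0 * radius_px * math.sin(math.radians(abs(degrees)) / 2.0)


angles = rotation_sweep()
shift = displacement(1000.0, 1.0)  # assumed distance of 1000 px from the center

print(angles)
print("A 1 degree error moves a point 1000 px from the center by about %.1f px" % shift)
```

Under these assumptions the shift is roughly 17 pixels, which gives a sense of why a small angular error can defeat the rough alignment and why a sweep in 0.5 degree steps is a reasonable first thing to try.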

Analyzing Results

Analyses of sequence specificity are performed using the Jupyter notebooks provided in the notebooks directory. The intended workflow is to copy them from this repo into each new experiment directory, edit the few variables as needed at the top of each notebook, and run them.
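The copy step of that workflow can be scripted. The following is a minimal sketch using only the Python standard library. The function name and the example paths are hypothetical, and skipping files that already exist is a design choice of this sketch (so that edited notebooks are never overwritten), not something CHAMP prescribes.

```python
import glob
import os
import shutil


def copy_notebooks(repo_notebooks_dir, experiment_dir):
    """Copy every .ipynb file from repo_notebooks_dir into experiment_dir.

    Files that already exist in experiment_dir are left untouched.
    Returns the sorted list of file names that were copied.
    """
    if not os.path.isdir(experiment_dir):
        os.makedirs(experiment_dir)
    copied = []
    pattern = os.path.join(repo_notebooks_dir, "*.ipynb")
    for source in sorted(glob.glob(pattern)):
        destination = os.path.join(experiment_dir, os.path.basename(source))
        if not os.path.exists(destination):
            shutil.copy(source, destination)
            copied.append(os.path.basename(source))
    return copied


# Example call with hypothetical paths:
# copy_notebooks("champ/notebooks", "SA16032/analysis")
```

After copying, open each notebook in the experiment directory and edit the variables at the top before running it.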