CRISPyR is a command-line tool based on CRISPy (https://doi.org/10.1002/bit.25233).
Its a RUST rewrite (and extension) of the original CRISPy, hence the 'R' in the name.
You can use it for finding and scoring CRISPR target sites in FASTA sequences.
CRISPyR supports the Cas9 (the default) and the Mad7 (experimental) endonucleases.
CRISPyR finds candidate target sequences and gives an off-target score.
Target sequences are found either by searching a genome (or any fasta file) for PAM sites (NGG-'3 for Cas9, 5'-YTTN for Mad7) or alternatively, the user may supply a list of target sequences.
Each target sequence is scored for off-targets based on the number of sequences matching the 13bp adjacent to the PAM (upstream of the PAM for Cas9 and downstream for Mad7). The resulting score represents the number putative off-targets, with a higher score representing more and/or closely matching off-target sequences.
If you use CRISPyR, then please cite the paper:
Ronda et al (2014). Accelerating genome editing in CHO cells using CRISPR
Cas9 and CRISPy, a web-based target finding tool.Biotechnol. Bioeng. 111:
1604–1616. doi: 10.1002/bit.25233
https://doi.org/10.1002/bit.25233
CRISPyR is built using the Rust programming language. To compile CRISPyR, install Rust as described on the Rust website and run 'cargo build --release' in the source directory:
$ git clone https://github.com/laeblab/crispy.git
$ cd crispy
$ cargo build --release
The resulting executable is located at './target/release/crispyr'.
The following examples use the files included in the 'examples' folder.
Note that CRISPyR outputs results as tab separated tables. The results shown below have been piped through 'column -t' for the sake of readability.
To use CRISPyR, you must first index the genome against which you wish to score potential guide RNAs. To index a genome, use the CRISPyR 'index' command on a FASTA file:
$ crispyr index examples/genome.fasta
This will create the index file 'examples/genome.fasta.crispyr_cas9'.
By default CRISPyR will search for PAM sites for CAS9 (NGG), but MAD7 (YTTN) is also supported, and may be selected using the --enzyme option:
$ crispyr index --enzyme mad7 examples/genome.fasta
This will create the index file 'examples/genome.fasta.crispyr_mad7'. Note that the cut-site for Mad7 is not documented, and results will therefore not show the expected cut-site for target sequences.
The index command can also be run with '--positions' to record the positions of all PAM sites in the genome. This takes longer and significantly increases the size of the index, but is required to run the 'offtargets' command.
The 'find' command takes a CRISPyR index and a FASTA file as input, and prints a table of target sites in the FASTA file along with the CRISPy score representing the matching 13bp kmers found in the index (see above):
$ crispyr find examples/genome.fasta.crispyr_cas9 examples/genome.fasta
Sequence Name Start End Cutsite Strand Score
GCTAGCTAGCTAGCTATAAAagg test 14 36 31 + 1000
CTAGCTAGCTAGCTATAAAAggg test 15 37 32 + 1000
CTACGTAGCTACTAGCTGACtgg test 49 71 66 + 1000
[...]
Positions are given using base-1 values on the forward strand (regardless of which strand the target was found). The PAM is indicated in the sequence using lower-case letters.
A higher Score indicates more/better matching off-targets, meaning that target sequences with lower scores should be selected when possible.
The 'score' command can be used to score a set of gRNAs that have been obtained without using CRISPyR. It needs an index and a tab separated table and it will append the CRISPy score for each row in which the first column contains a target sequence matching the PAM used to create the index:
$ crispyr score examples/genome.fasta.crispyr_cas9 examples/targets.tsv
AATAGCTAGCTAGCTATAAAAGG 2 copy target 1000
CAGCTACTAGCTAGTCGATGNGG 2 copy target 1000
CCCCCCCCCCCCCCCCCCCCCAA not a target at all. should get 0 0
CRISPyR will only attempt to score target sequences that
- contains a PAM,
- contains at least 13 bp in addition to the PAM
- the 13 bp only consist of nucleotides A, C, G, or T. Invalid target sequences will be assigned the score 'NA'.
The input table may (optional) contain a header, in which case CRISPyR will appends a column named 'Score'.
The 'offtargets' command takes a CIRPSyR index and a tab separated table and returns a list of sites for each (unique) target sequence that CRISPyR considers to be off-targets for that sequence:
$ crispyr offtargets examples/genome.fasta.crispyr_cas9 examples/targets.tsv
Query Offtarget Name Start End Cutsite Strand Score
aatagctAGCTAGCTATAAAagg gctagctAGCTAGCTATAAAagg test 14 36 31 + 500
aatagctAGCTAGCTATAAAagg gctagctAGCTAGCTATAAAagg test1 14 36 31 + 500
cagctacTAGCTAGTCGATGngg cagctacTAGCTAGTCGATGcgg tesss 120 142 137 + 500
cagctacTAGCTAGTCGATGngg cagctacTAGCTAGTCGATGcgg tesss 162 184 179 + 500
The 13bp kmer used to identify off-targets (see above) is written in uppercase.
The Score column represents the weight given to each individual off-target, with the score of the query being the sum of those scores (see src/score.rs). It is also possible to filter off-targets based on this score, with a score of 500 representing perfect matches between the 13bp kmer in the query and off-target:
$ crispyr offtargets examples/genome.fasta.crispyr_cas9 examples/targets.tsv --min-score 500
For the 'offtargets' command to work, the genome must have been indexed with the '--positions' option. Note that this greatly increases the size of the index and isn't recommended unless you specifically need the 'offtargets' functionality:
$ crispyr index --positions examples/genome.fasta
Target sequences must meet the requirements described for the 'score' command; any value that does not meet these requirements trigger a warning.
The Offtarget column requires the availability of a faidx indexed FASTA file:
$ samtools faidx /path/to/genome.fasta
By default CRISPyR will attempt to the filename produced by removing the '.crispyr_*' extension from the index filename, but an alternative path may be specified using the '--fasta' option. If no such FASTA file is available, then this column will contain the value 'NA'.