SpliceMIT is a tool that analyzes candidate gRNAs and produces the most effective gRNA sequences for an RNA-splicing setting. It takes the following user inputs: an intron sequence, the upstream and downstream 20 nt sequences, the gRNA size, and the tracrRNA sequence. It applies cut-off values to filter the gRNAs and ultimately provides a list of effective gRNA sequences.
- Nupack_data.csv:
sample data for the secondary structure of the crRNA. The results were obtained from NUPACK, developed at Caltech.
- Off_target_data.csv:
sample data for off-target binding in a human cell (analyzed using the Homo sapiens cDNA sequence).
- Pre_crRNA_structure_Nupack_data.csv:
sample data for the secondary structure of the pre-crRNA, also obtained from NUPACK.
- RNA_binding_affinity_data.csv:
contains the dissociation constants for over 90 RNA-binding proteins found in human and mouse. (THIS IS NOT A SAMPLE FILE)
- human-part1.rar & human-part2:
compressed FASTA files of the human cDNA sequence (txt format). The size is approximately 300 million bases. The original FASTA file is from the UCSC Genome Browser.
- Download model.py, FASTA.py, RNA_binding_affinity_data.csv, and both human-part1.txt.rar and human-part2.txt.rar into the same directory.
- Unzip the rar files and combine them into a single txt file named "human.txt".
- Run FASTA.py to convert the FASTA format into a plain "ATCG" sequence.
- Open model.py and read ALL annotations before running.
- Modify the variables in the “Global Constant” region, which is located right below all the functions.
- Run the program and wait for the result. (The sample run used an 850 nt intron with 708 gRNAs and took approximately 36 hours in total.)
- You will see the top 10 crRNA sequences.
- You may want to use PhantomJS as a headless webdriver if you are running the program directly on your work computer or laptop. See http://phantomjs.org/
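The FASTA conversion step in the list above can be pictured with a short example. The contents of FASTA.py are not shown in this README, so the following is only a minimal sketch of what such a conversion typically looks like. It assumes that header lines start with `>` and that all records are joined into one uppercase string; the function name `fasta_to_plain` and the example headers are hypothetical.

```python
def fasta_to_plain(fasta_text: str) -> str:
    """Drop FASTA header lines and join the sequence lines into one string.

    Assumption: every record is concatenated into a single uppercase
    sequence. The real FASTA.py may handle record boundaries differently.
    """
    parts = []
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and header lines such as ">transcript_1 ..."
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)


# Small in-memory example with two hypothetical records.
example = ">transcript_1 hypothetical header\nATGCa\nGGT\n>transcript_2\nccat\n"
plain = fasta_to_plain(example)
print(plain)
```

For a file the size of human.txt, reading line by line and writing the output incrementally would likely be preferable to holding everything in memory.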
- tracrRNA_seq
- direct_repeat_sequence
- (All weight values)
- Off-target binding of gRNA
- GC content of the crRNA sequence
- Secondary structure of the crRNA
- Secondary structure of the pre-gRNA
- Competition of different RNA-binding proteins at the site
- The locations of ISEs and ISSs in the given intron are not included, but could easily be implemented by adding a list and applying bonus/penalty scores.
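Two of the items above lend themselves to a short illustration: the GC content of a crRNA sequence, and the suggested ISE/ISS extension based on a list with bonus/penalty scores. The sketch below is not taken from model.py. The function names, the region tuple layout, and the example score values are all hypothetical, and the sign of each adjustment is left to the user.

```python
from typing import List, Tuple

# Hypothetical layout: (start, end, score_adjustment), with end exclusive.
# A positive adjustment is a bonus, a negative one is a penalty.
Region = Tuple[int, int, float]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def region_adjustment(start: int, length: int, regions: List[Region]) -> float:
    """Sum the adjustments of all regions that overlap the gRNA target site.

    The target site covers positions start .. start + length - 1 in the intron.
    """
    end = start + length
    return sum(adj for (r_start, r_end, adj) in regions
               if start < r_end and r_start < end)


# Example values are made up for illustration only.
regions = [(10, 16, -1.0), (40, 46, 0.5)]
print(gc_content("GGCCAATT"))
print(region_adjustment(5, 20, regions))
print(region_adjustment(30, 20, regions))
```

The adjustment returned by `region_adjustment` could then be added to a gRNA's overall score alongside the other weighted factors.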
a is the weight value;
Kd is the dissociation constant for the RBP;
Lb is the length of binding;
Lm is the length of RBP-binding motif;
[RBP] is the concentration of an RNA-binding protein (set to a default value).
Tessa G. Montague, José M. Cruz, James A. Gagnon, George M. Church, Eivind Valen; CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res 2014; 42 (W1): W401-W407. doi: 10.1093/nar/gku410
Wang, Tim et al. “Genetic Screens in Human Cells Using the CRISPR/Cas9 System.” Science (New York, N.Y.) 343.6166 (2014): 80–84. PMC. Web. 5 July 2017.
Tsai, Shengdar Q. et al. “GUIDE-Seq Enables Genome-Wide Profiling of off-Target Cleavage by CRISPR-Cas Nucleases.” Nature biotechnology 33.2 (2015): 187–197. PMC. Web. 5 July 2017.
Inbal Paz, Idit Kosti, Manuel Ares, Jr, Melissa Cline, Yael Mandel-Gutfreund; RBPmap: a web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res 2014; 42 (W1): W361-W367. doi: 10.1093/nar/gku406
J. N. Zadeh, C. D. Steenberg, J. S. Bois, B. R. Wolfe, M. B. Pierce, A. R. Khan, R. M. Dirks, N. A. Pierce. NUPACK: analysis and design of nucleic acid systems. J Comput Chem, 32:170–173, 2011.
Citations are included in the file: RNA_binding_affinity_data.csv
Code and algorithms by: Qianchang Dennis Wang
In collaboration with: Ben Kaplan '19, Molly Stephens '18, Adil Yusuf '20, Ronit Langer '20, and our amazing leader, Brian Teague
Special thanks also to the MIT Burge Lab, the Harvard Schier Lab, and the Valen Lab at the University of Bergen.