SpliceMIT 2.0 - Splice Modelling (Intronic) Technology 2.0

SpliceMIT is a tool that analyzes candidate gRNA sequences for an RNA splicing application and reports the most effective ones. It takes as input an intron sequence, the 20-nt sequences upstream and downstream of it, the gRNA size, and the tracrRNA sequence. It applies cut-off values to filter the gRNAs and finally produces a list of effective gRNA sequences.
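A minimal sketch of the candidate-generation step. It assumes that every window of the chosen gRNA size across the upstream flank, intron, and downstream flank is a candidate, and that the spacer is the reverse complement of its target window. The function names and the "score must meet or exceed the cut-off" rule are hypothetical and are not taken from model.py.

```python
# Hypothetical sketch: enumerate candidate gRNA target windows and filter them.
# Assumes sequences use the DNA alphabet (A, T, C, G), as produced by FASTA.py.

COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_grnas(upstream: str, intron: str, downstream: str, grna_size: int):
    """Yield (start, target, spacer) for every window of length grna_size.

    start is the 0-based offset into upstream + intron + downstream.
    The spacer is assumed to be the reverse complement of the target window.
    """
    region = (upstream + intron + downstream).upper()
    for start in range(len(region) - grna_size + 1):
        target = region[start:start + grna_size]
        yield start, target, reverse_complement(target)


def passes_cutoffs(scores: dict, cutoffs: dict) -> bool:
    """Keep a candidate only if every named score meets its (hypothetical) cut-off."""
    return all(scores[name] >= limit for name, limit in cutoffs.items())
```

For example, `list(candidate_grnas(up, intron, down, 20))` yields one tuple per 20-nt window; each candidate's factor scores would then be passed to a filter such as `passes_cutoffs`.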

Example RBPs: dCas13a, MS2, L7Ae, RCas9

Language: Python

Packages required: Selenium, tqdm

Read the instructions below and ALL ANNOTATIONS in model.py before running!

File descriptions:

Nupack_data.csv:
Sample data for the secondary structure of the crRNA, obtained from NUPACK (developed at Caltech).

Off_target_data.csv:
Sample data for off-target binding in a human cell (analyzed using the Homo sapiens cDNA sequence).

Pre_crRNA_structure_Nupack_data.csv:
Sample data for the secondary structure of the pre-crRNA, also obtained from NUPACK.

RNA_binding_affinity_data.csv:
Dissociation constants for over 90 RNA-binding proteins found in human and mouse. (THIS IS NOT A SAMPLE FILE)

human-part1.txt.rar & human-part2.txt.rar:
Compressed FASTA file (txt format) of the human cDNA sequence, split into two parts and approximately 300 million bases in total. The original FASTA file is from the UCSC Genome Browser.

How to use SpliceMIT:

Download model.py, FASTA.py, RNA_binding_affinity_data.csv, human-part1.txt.rar, and human-part2.txt.rar into the same directory.

Extract both rar archives and combine them into a single txt file named "human.txt".
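One way to do the combining step in Python. It assumes the archives extract to human-part1.txt and human-part2.txt; adjust the names to whatever the archives actually contain. Part 1 must come first so the original file is reassembled in order. The function name combine_parts is hypothetical.

```python
# Sketch: concatenate the extracted parts, in order, into human.txt.
import shutil
from pathlib import Path


def combine_parts(parts, output="human.txt"):
    """Append each part file, byte for byte and in the given order, to one output file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


# Example (file names are assumptions):
# combine_parts(["human-part1.txt", "human-part2.txt"])
```

Copying in binary mode reproduces the parts exactly, so nothing is added or lost at the join.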

Run FASTA.py to convert the FASTA format into a plain "ATCG" sequence.
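FASTA.py's own logic is not shown here. The sketch below only illustrates the conversion, assuming it means dropping the ">" header lines, joining each record's wrapped sequence lines, and upper-casing the bases (UCSC files can contain lower-case, soft-masked bases). The function names and the output file name human_plain.txt are hypothetical.

```python
# Hypothetical illustration of a FASTA-to-plain-sequence conversion (not FASTA.py itself).


def fasta_records(lines):
    """Return the sequences of a FASTA file, one upper-case string per record."""
    records, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; flush the previous one.
            if current:
                records.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        records.append("".join(current))
    return records


def convert(fasta_path="human.txt", out_path="human_plain.txt"):
    """Write one plain sequence per line (file names are assumptions)."""
    with open(fasta_path) as src:
        records = fasta_records(src)
    with open(out_path, "w") as out:
        out.write("\n".join(records) + "\n")
```

Keeping one record per line, rather than gluing every transcript into one long string, avoids spurious off-target matches that span the boundary between two unrelated cDNAs.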

Open model.py and read ALL annotations before running.

Modify the variables in the "Global Constant" region, located right below all the function definitions.

Run the program and wait for the results. (The sample run used an 850-nt intron with 708 gRNAs and took approximately 36 hours in total.)
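As a rough planning aid, the sample run works out to about 183 seconds (roughly 3 minutes) per gRNA. The sketch below extrapolates from that figure, assuming runtime scales linearly with the number of gRNAs, which is only an approximation. The constant and function names are hypothetical.

```python
# Back-of-the-envelope runtime estimate from the sample run (708 gRNAs in ~36 h).
# Assumes runtime grows linearly with the number of gRNAs.
SAMPLE_HOURS = 36
SAMPLE_GRNAS = 708


def seconds_per_grna() -> float:
    """Average time spent per gRNA in the sample run, in seconds."""
    return SAMPLE_HOURS * 3600 / SAMPLE_GRNAS


def estimated_hours(n_grnas: int) -> float:
    """Extrapolated total runtime, in hours, for n_grnas candidates."""
    return n_grnas * seconds_per_grna() / 3600
```

For instance, `estimated_hours(1000)` gives a little under 51 hours.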

You will then see the top 10 crRNA sequences.
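If the per-crRNA scores are available as (sequence, score) pairs, picking the top 10 is a partial sort, and heapq.nlargest does this without sorting the whole list. The pair layout, the function name, and the assumption that a higher score is better are hypothetical.

```python
import heapq


def top_crnas(scored, k=10):
    """Return the k (sequence, score) pairs with the highest scores, best first.

    scored: iterable of (sequence, score) tuples; a higher score is assumed to be better.
    """
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```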

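If you are running the program directly on your work computer or laptop, you may want to use PhantomJS as a virtual (headless) webdriver. See http://phantomjs.org/ (Note that PhantomJS is no longer maintained and Selenium 4 has dropped support for it; with recent Selenium releases, a headless Chrome or Firefox driver is the usual replacement.)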

Variables you PROBABLY want to change:

tracrRNA_seq

direct_repeat_sequence

(All weight values)

Factors taken into account:

Off-target binding of the gRNA

GC content of the crRNA sequence

Secondary structure of the crRNA

Secondary structure of the pre-gRNA

Competition from other RNA-binding proteins at the target site
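GC content is the fraction of G and C bases in the crRNA. Given the "weight values" mentioned above, one plausible way to merge the factors into a single number is a weighted sum. The sketch below assumes that form; the factor names are placeholders, not model.py's variables.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def combined_score(factor_scores: dict, weights: dict) -> float:
    """Weighted sum of per-factor scores; a factor with no weight contributes 0.

    Both dicts are keyed by hypothetical factor names such as "gc" or "off_target".
    """
    return sum(weights.get(name, 0.0) * value for name, value in factor_scores.items())
```

A weighted sum keeps each factor's influence explicit and tunable, which fits the suggestion to adjust all weight values.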

The locations of intronic splicing enhancers (ISEs) and silencers (ISSs) in the given intron are not considered, but they could easily be added by supplying a list of their positions and applying bonus/penalty scores.
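A minimal sketch of that extension. It assumes ISE and ISS locations are supplied as 0-based, end-exclusive intervals, each carrying its own score adjustment (positive for a bonus, negative for a penalty). Whether overlapping an ISE or an ISS should be rewarded depends on the splicing outcome you want, so the signs are left to the user. The function name is hypothetical.

```python
def element_adjustment(start: int, length: int, elements) -> float:
    """Sum the adjustments of all annotated elements that the target window overlaps.

    start, length: the gRNA target window (0-based start).
    elements: list of (elem_start, elem_end, adjustment) with 0-based, end-exclusive
    coordinates; a positive adjustment is a bonus, a negative one a penalty.
    """
    end = start + length
    # Two half-open intervals overlap when each one starts before the other ends.
    return sum(adj for s, e, adj in elements if start < e and s < end)
```

The result can simply be added to a candidate's score before ranking.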

RBP Interference Score Algorithm:

a is the weight value;
Kd is the dissociation constant of the RBP;
Lb is the length of binding;
Lm is the length of the RBP-binding motif;
[RBP] is the concentration of the RNA-binding protein (set to a default value).
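The formula that combines these quantities is not reproduced in this text. Purely as an illustration, and explicitly not the SpliceMIT formula, the sketch below combines the same symbols using the standard single-site fractional occupancy [RBP] / ([RBP] + Kd), scaled by the fraction of the motif covered by the binding (Lb / Lm, capped at 1) and by the weight a. Treat the functional form and the function names as assumptions.

```python
def fractional_occupancy(rbp_conc: float, kd: float) -> float:
    """Fraction of sites bound at equilibrium for simple 1:1 binding: [RBP] / ([RBP] + Kd).

    rbp_conc and kd must be in the same units (e.g. nM).
    """
    return rbp_conc / (rbp_conc + kd)


def illustrative_interference(a: float, kd: float, lb: int, lm: int, rbp_conc: float) -> float:
    """Hypothetical score: a * occupancy * (Lb / Lm), with Lb / Lm capped at 1."""
    overlap = min(lb / lm, 1.0)
    return a * fractional_occupancy(rbp_conc, kd) * overlap
```

With this form, a tighter binder (smaller Kd) or a larger overlap with its motif raises the interference score, which matches the intuition behind including Kd, Lb, and Lm.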

Citations:

GC content:

Tessa G. Montague, José M. Cruz, James A. Gagnon, George M. Church, Eivind Valen; CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res 2014; 42 (W1): W401-W407. doi: 10.1093/nar/gku410
Tim Wang et al.; Genetic screens in human cells using the CRISPR/Cas9 system. Science 2014; 343 (6166): 80–84.
Shengdar Q. Tsai et al.; GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol 2015; 33 (2): 187–197.

RBPmap:

Inbal Paz, Idit Kosti, Manuel Ares, Jr, Melissa Cline, Yael Mandel-Gutfreund; RBPmap: a web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res 2014; 42 (W1): W361-W367. doi: 10.1093/nar/gku406

Nupack:

J. N. Zadeh, C. D. Steenberg, J. S. Bois, B. R. Wolfe, M. B. Pierce, A. R. Khan, R. M. Dirks, N. A. Pierce. NUPACK: analysis and design of nucleic acid systems. J Comput Chem, 32:170–173, 2011.

RNA binding affinity:

Citations are included in the file: RNA_binding_affinity_data.csv

Codes and algorithms by: Qianchang Dennis Wang
In collaboration with: Ben Kaplan '19, Molly Stephens '18, Adil Yusuf '20, Ronit Langer '20, and our amazing leader, Brian Teague
Special thanks also to the MIT Burge Lab, the Harvard Schier Lab, and the Valen Lab at the University of Bergen.

MITiGEM2017/SpliceMIT