Using a dataset of non-coding SNPs, predict RNA secondary structure with a variation of Nussinov's algorithm, energy_min.py. RN_Analyze.py is run first and uses energy_min.py to produce a .tsv file, SEQ_DB.tsv, containing sequences and their corresponding dot-bracket structures. Base pair distance is calculated using bp_distance from the RNA package of ViennaRNA. Visuals are created with RNAvisual.py.
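To illustrate what a base pair distance measures, here is a minimal pure-Python sketch. It is a stand-in for illustration only, not ViennaRNA's implementation; the function names `pair_set` and `base_pair_distance` are hypothetical. It counts the base pairs that appear in one dot-bracket structure but not the other.

```python
def pair_set(structure):
    """Return the set of (open, close) index pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def base_pair_distance(a, b):
    """Count base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    # Symmetric difference: pairs in a or b, but not in both.
    return len(pair_set(a) ^ pair_set(b))


print(base_pair_distance("((..))", "(....)"))
```

A distance of 0 means a SNP left the predicted structure unchanged; larger values mean more base pairs were gained or lost.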
First run: python RN_Analyze.py SNP.tsv sequences.fa
To run visualization: python RNAvisual.py SEQ_DB.tsv
Reading FASTA files: pip install biopython
Base pair distance (ViennaRNA): Miniconda or Anaconda can be used
conda create -n viennarna -c bioconda viennarna
conda activate viennarna
Alternatively, follow the installation steps provided by ViennaRNA or use Homebrew: https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/install.html
pip install dash
pip install dash_bio
pip install pandas
Filtering all clinically significant SNPs
Biotypes
- mitochondrial tRNA
- microRNA
- small nucleolar RNA
Clinical significance
- benign, likely benign, uncertain, likely pathogenic, pathogenic
SNP location and substitution
Compare the remaining SNPs to all ncRNA to select the appropriate genes
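The filtering step could be sketched roughly as follows. This is only an illustration: the column names (`biotype`, `clinical_significance`, `snp_id`), the label spellings, and the toy rows are all assumptions, not the actual layout of SNP.tsv.

```python
import csv
import io

# Hypothetical column names and labels; adjust them to match the real SNP.tsv.
BIOTYPE_COL = "biotype"
CLINSIG_COL = "clinical_significance"
KEEP_BIOTYPES = {"mt_trna", "mirna", "snorna"}
KEEP_CLINSIG = {
    "benign", "likely benign", "uncertain",
    "likely pathogenic", "pathogenic",
}


def filter_snps(handle):
    """Keep rows whose biotype and clinical significance are both of interest."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        biotype = row[BIOTYPE_COL].strip().lower()
        clinsig = row[CLINSIG_COL].strip().lower()
        if biotype in KEEP_BIOTYPES and clinsig in KEEP_CLINSIG:
            kept.append(row)
    return kept


# Toy data for demonstration only.
sample = (
    "snp_id\tbiotype\tclinical_significance\n"
    "rs1\tmiRNA\tpathogenic\n"
    "rs2\tlncRNA\tbenign\n"
    "rs3\tsnoRNA\tnot provided\n"
    "rs4\tMt_tRNA\tLikely benign\n"
)
kept = filter_snps(io.StringIO(sample))
print([row["snp_id"] for row in kept])
```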
Follows Nussinov's algorithm with the following additions:
- Minimum loop: paired bases must have some distance between them. For example, a minimum loop of 2 would reject ..(.).. and accept .(..)..
- Stacked base pairs: ((.......)) is preferred over (.(...)...)
- Score pairing: GC is preferred over AU and GU
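The three additions can be combined in one dynamic program. The sketch below is not the code in energy_min.py; the score values, the stacking bonus, the default `min_loop`, and the function name `fold` are all assumptions chosen for illustration. It keeps two tables: the best score on each subsequence, and the best score when the two end bases are paired, which is what lets a stacked pair earn its bonus.

```python
# Assumed pair scores: GC is rewarded more than AU and GU.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
STACK_BONUS = 1  # assumed reward for a pair stacked directly on another pair
NEG = float("-inf")


def fold(seq, min_loop=3):
    """Return a dot-bracket structure maximizing the illustrative score."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    # best[i][j]: best score on seq[i..j]; closed[i][j]: same, with i paired to j.
    best = [[0] * n for _ in range(n)]
    closed = [[NEG] * n for _ in range(n)]

    def get(i, j):
        return best[i][j] if i <= j else 0  # an empty range scores 0

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = PAIR_SCORE.get((seq[i], seq[j]))
            # Minimum loop: at least min_loop bases between i and j.
            if score is not None and j - i - 1 >= min_loop:
                stacked = closed[i + 1][j - 1] + STACK_BONUS
                closed[i][j] = score + max(get(i + 1, j - 1), stacked)
            value = get(i, j - 1)  # case: j is unpaired
            for k in range(i, j):  # case: j pairs with some k
                if closed[k][j] > NEG:
                    value = max(value, get(i, k - 1) + closed[k][j])
            best[i][j] = value

    # Traceback: replay the choices that produced the optimal score.
    structure = ["."] * n
    stack = [(0, n - 1, False)]
    while stack:
        i, j, is_closed = stack.pop()
        if i >= j:
            continue
        if is_closed:
            structure[i], structure[j] = "(", ")"
            rest = closed[i][j] - PAIR_SCORE[(seq[i], seq[j])]
            is_stacked = closed[i + 1][j - 1] + STACK_BONUS == rest
            stack.append((i + 1, j - 1, is_stacked))
        elif best[i][j] == get(i, j - 1):
            stack.append((i, j - 1, False))
        else:
            for k in range(i, j):
                if closed[k][j] > NEG and best[i][j] == get(i, k - 1) + closed[k][j]:
                    stack.append((i, k - 1, False))
                    stack.append((k, j, True))
                    break
    return "".join(structure)


print(fold("GGGAAAACCC"))
```

With `min_loop=2`, `fold("GAC", min_loop=2)` leaves every base unpaired, while `fold("GAAC", min_loop=2)` pairs the two ends, mirroring the minimum loop rule above.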
Run this first with
python RN_Analyze.py SNP.tsv sequences.fa
This outputs SEQ_DB.tsv
RN_Analyze.py uses energy_min.py to create dot-bracket structures for RNA sequences. This program takes 1-2 minutes to run.
Visual representations using Dash
To run this, you must first run python RN_Analyze.py SNP.tsv sequences.fa
After this, run
python RNAvisual.py SEQ_DB.tsv
Dash will run on http://127.0.0.1:8050/