Python-RNA-Secondary-Structure

A variation of Nussinov's algorithm in Python for RNA secondary structure prediction, with visualizations.

Primary language: Python

RNA Secondary Structure Prediction

Using a dataset of non-coding SNPs, this project predicts RNA secondary structure with a variation of Nussinov's algorithm, implemented in energy_min.py. RN_Analyze.py is run first; it uses energy_min.py to produce a .tsv file, SEQ_DB.tsv, containing the sequences and their corresponding dot-bracket structures. Base pair distance is calculated with bp_distance from the RNA Python package provided by ViennaRNA. Visuals are created with RNAvisual.py.
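ViennaRNA's bp_distance counts the base pairs that are present in one structure but not in the other. The project calls the real function (RNA.bp_distance(s1, s2) in the ViennaRNA Python bindings); the pure-Python sketch below only illustrates what that number means for two dot-bracket strings of equal length. The helper names pair_set and base_pair_distance are hypothetical, and only plain ( and ) brackets are handled.

```python
from typing import List, Set, Tuple


def pair_set(structure: str) -> Set[Tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs (0-based)."""
    stack: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    # Symmetric difference: pairs in s1 but not s2, plus pairs in s2 but not s1.
    return len(pair_set(s1) ^ pair_set(s2))
```

For example, "((..))" and "(....)" share the outer pair and differ by one inner pair, so their distance is 1.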

To Run

First run: python RN_Analyze.py SNP.tsv sequences.fa

To run the visualization: python RNAvisual.py SEQ_DB.tsv

Installations Required

RN_Analyze

Reading the FASTA file: pip install biopython

Base pair distance (ViennaRNA): Miniconda or Anaconda can be used:

  1. conda create -n viennarna -c bioconda viennarna
  2. conda activate viennarna

Alternatively, follow the installation steps provided by ViennaRNA, or use Homebrew: https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/install.html

RNAvisual

pip install dash dash_bio pandas

Dataset

Filtering all clinically significant SNPs by the following criteria:

Biotypes

  • Mitochondrial tRNA (mt-tRNA)
  • MicroRNA (miRNA)
  • Small nucleolar RNA (snoRNA)

Clinical significance

  • Benign, likely benign, uncertain, likely pathogenic, pathogenic
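A minimal sketch of the biotype and clinical-significance filter, assuming the SNP table is read as one dict per row (for example with csv.DictReader(handle, delimiter="\t")). The column names biotype and clinical_significance, the Ensembl-style biotype labels (Mt_tRNA, miRNA, snoRNA) and the exact significance strings are hypothetical; the real SNP.tsv may spell them differently.

```python
from typing import Dict, Iterable, List

# Hypothetical labels: adjust to whatever SNP.tsv actually uses.
KEEP_BIOTYPES = {"Mt_tRNA", "miRNA", "snoRNA"}
KEEP_SIGNIFICANCE = {
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
}


def filter_snps(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose biotype and clinical significance are both in the allowed sets."""
    kept = []
    for row in rows:
        biotype = row.get("biotype", "").strip()
        # Normalise case and whitespace so "Pathogenic" and "pathogenic" both match.
        significance = row.get("clinical_significance", "").strip().lower()
        if biotype in KEEP_BIOTYPES and significance in KEEP_SIGNIFICANCE:
            kept.append(row)
    return kept
```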

SNP location and substitution

The remaining SNPs are compared against all ncRNAs to select the appropriate genes.
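One way to sketch the location and substitution step: check whether a SNP falls inside a gene, convert its coordinate to a 1-based offset within the gene sequence, and swap in the alternate base. This assumes inclusive 1-based coordinates on the plus strand only; minus-strand genes would additionally need reverse-complementing. The helper names local_position and apply_snp are hypothetical and not taken from the project.

```python
from typing import Optional


def local_position(gene_start: int, gene_end: int, snp_pos: int) -> Optional[int]:
    """Return the SNP's 1-based offset within the gene, or None if it lies outside it.

    Assumes inclusive 1-based coordinates and a plus-strand gene.
    """
    if gene_start <= snp_pos <= gene_end:
        return snp_pos - gene_start + 1
    return None


def apply_snp(seq: str, position: int, ref: str, alt: str) -> str:
    """Return seq with the base at 1-based `position` changed from ref to alt."""
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside a sequence of length {len(seq)}")
    # Guard against coordinate mix-ups: the reference allele must match the sequence.
    if seq[index].upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {position}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]
```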

energy_min.py

Follows Nussinov's algorithm with the following additions:

  1. Minimum loop

    Paired bases must be separated by a minimum number of unpaired bases. For example, a minimum loop of 2 would reject ..(.).. and accept .(..)..

  2. Stacked base pairs

    Stacked (directly adjacent) base pairs are favored: ((.......)) is preferred over (.(...)...)

  3. Pair scoring

    GC pairs are preferred over AU and GU pairs.
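The three additions above can be combined in a small dynamic program. The sketch below is not the code in energy_min.py; it is an illustrative Nussinov-style fold under assumed values. The pair weights (GC = 3, AU = 2, GU = 1), the stacking bonus of 1, the default minimum loop of 3 and the function name fold are all hypothetical choices.

```python
from typing import Tuple

NEG = float("-inf")

# Hypothetical pair weights: GC > AU > GU (the real energy_min.py may use other values).
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
STACK_BONUS = 1  # hypothetical reward when pair (i, j) directly encloses pair (i+1, j-1)


def fold(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Return (score, dot-bracket) for a Nussinov-style fold with min loop, pair weights and stacking."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    n = len(seq)
    if n == 0:
        return 0, ""
    # W[i][j]: best score for seq[i..j]; single bases and empty ranges score 0.
    W = [[0] * n for _ in range(n)]
    # V[i][j]: best score for seq[i..j] given that i pairs with j (NEG if impossible).
    V = [[NEG] * n for _ in range(n)]

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            pair = PAIR_SCORE.get((seq[i], seq[j]))
            # Minimum loop: at least min_loop unpaired bases between i and j.
            if pair is not None and j - i - 1 >= min_loop:
                inner = W[i + 1][j - 1] if i + 1 <= j - 1 else 0
                if V[i + 1][j - 1] > NEG:  # inner pair possible: allow a stacked continuation
                    inner = max(inner, V[i + 1][j - 1] + STACK_BONUS)
                V[i][j] = pair + inner
            best = max(W[i + 1][j], W[i][j - 1], V[i][j])  # i unpaired, j unpaired, or i-j paired
            for k in range(i + 1, j):  # bifurcation into two independent halves
                best = max(best, W[i][k] + W[k + 1][j])
            W[i][j] = best

    structure = ["."] * n

    def trace_w(i: int, j: int) -> None:
        if i >= j:
            return
        if W[i][j] == W[i + 1][j]:
            trace_w(i + 1, j)
        elif W[i][j] == W[i][j - 1]:
            trace_w(i, j - 1)
        elif W[i][j] == V[i][j]:
            trace_v(i, j)
        else:
            for k in range(i + 1, j):
                if W[i][j] == W[i][k] + W[k + 1][j]:
                    trace_w(i, k)
                    trace_w(k + 1, j)
                    return

    def trace_v(i: int, j: int) -> None:
        structure[i], structure[j] = "(", ")"
        pair = PAIR_SCORE[(seq[i], seq[j])]
        if V[i + 1][j - 1] > NEG and V[i][j] == pair + V[i + 1][j - 1] + STACK_BONUS:
            trace_v(i + 1, j - 1)  # the optimum used the stacked inner pair
        else:
            trace_w(i + 1, j - 1)

    # Recursive traceback: fine for tRNA/miRNA/snoRNA lengths, not for very long sequences.
    trace_w(0, n - 1)
    return W[0][n - 1], "".join(structure)
```

For "GGAAAACC" the stacking bonus makes the nested pairs ((....)) score 3 + 3 + 1 = 7, and raising min_loop above the available loop size leaves a sequence unpaired. Filling the tables is O(n^3) in time and O(n^2) in memory, the same as plain Nussinov.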

RN_Analyze

Run this first with:

python RN_Analyze.py SNP.tsv sequences.fa

It outputs SEQ_DB.tsv.

RN_Analyze uses energy_min.py to create dot-bracket structures for the RNA sequences. The program takes 1-2 minutes to run.
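The overall flow (read sequences, fold each one, write a tab-separated table) can be sketched as below. This is not RN_Analyze.py itself: the project reads FASTA with Biopython (Bio.SeqIO.parse(handle, "fasta"), using record.id and str(record.seq)), while this sketch uses a small hypothetical stand-in reader so it runs without dependencies. The column names id, sequence and structure, and the helpers read_fasta, seq_db_rows and write_seq_db, are assumptions; fold_fn stands for whatever folding function produces the dot-bracket string.

```python
import csv
from typing import Callable, Iterable, Iterator, List, Optional, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; the id is the first word of each '>' header."""
    record_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].strip()
            record_id = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


def seq_db_rows(records: Iterable[Tuple[str, str]],
                fold_fn: Callable[[str], str]) -> List[List[str]]:
    """Build table rows (header first) pairing each sequence with its dot-bracket structure."""
    rows = [["id", "sequence", "structure"]]
    for record_id, seq in records:
        rows.append([record_id, seq, fold_fn(seq)])
    return rows


def write_seq_db(rows: List[List[str]], out: TextIO) -> None:
    """Write rows as tab-separated values."""
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
```

A typical call would open sequences.fa for reading and SEQ_DB.tsv for writing (with newline=""), then run write_seq_db(seq_db_rows(read_fasta(fa), fold_fn), out).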

RNAvisual

Visual representations are created with Dash. To run this, you must first run python RN_Analyze.py SNP.tsv sequences.fa

After this, run python RNAvisual.py SEQ_DB.tsv

Dash will serve the app at http://127.0.0.1:8050/
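If the visualization uses the Dash Bio Forna component (an assumption; RNAvisual.py may render structures differently), each SEQ_DB.tsv row has to become a dict with sequence, structure and options keys before being passed to dash_bio.FornaContainer(sequences=...). The sketch below shows that conversion without importing Dash; the column names id, sequence and structure and the helpers load_seq_db and forna_sequences are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def load_seq_db(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file into one dict per row, keyed by the header line."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def forna_sequences(rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Convert rows into the list-of-dicts shape accepted by FornaContainer's `sequences` prop."""
    return [
        {
            "sequence": row["sequence"],
            "structure": row["structure"],
            "options": {"name": row["id"]},  # label shown next to the structure
        }
        for row in rows
    ]
```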