Superfold

SuperFold is a pipeline that uses output data from ShapeMapper to model RNA secondary structures, including pseudoknots; identify de novo regions with well-defined and stable structures; and visualize most probable and alternative helices.

###################################################################################
Superfold installation, execution, and troubleshooting.
Gregg Rice 2014
gmr@unc.edu

###################################################################################
Requirements:

===================================================================================
python 2.7

===================================================================================
RNAstructure - https://rna.urmc.rochester.edu/RNAstructure.html
The Fold and partition executables are required to predict secondary structure and base-pairing probabilities.

Download the command-line applications for your platform.
Extract them to your home directory.

Build the binaries using the 'make all' command in the RNAstructure directory.

Add the following 2 lines to ~/.bash_profile:

    export PATH=$PATH:$HOME/RNAstructure/exe
    export DATAPATH=$HOME/RNAstructure/data_tables
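
After opening a new shell, a quick check can confirm that the two settings above took effect.
The following is a minimal sketch and not part of SuperFold: the function names are hypothetical,
and the only assumption is that the executables are named Fold and partition as described above.

```python
import os


def find_executable(name, path):
    """Return the full path of `name` in the os.pathsep-separated `path`, or None."""
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_rnastructure(environ=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    environ = os.environ if environ is None else environ
    problems = []
    path = environ.get("PATH", "")
    # Both executables must be reachable through PATH.
    for exe in ("Fold", "partition"):
        if find_executable(exe, path) is None:
            problems.append("%s not found on PATH" % exe)
    # DATAPATH must point at the RNAstructure data_tables directory.
    datapath = environ.get("DATAPATH")
    if not datapath:
        problems.append("DATAPATH is not set")
    elif not os.path.isdir(datapath):
        problems.append("DATAPATH is not a directory: %s" % datapath)
    return problems
```

Calling check_rnastructure() with no arguments inspects the current environment and returns
one message per problem found.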
    
===================================================================================
matplotlib (python module required for .pdf figure rendering) - 

    Download source
    Extract to any directory
    cd to the extracted directory
    run the command "python setup.py install --user"

===================================================================================
httplib2 (python module only required if rendering structures) - 

    Download httplib2-0.7.6.tar.gz (or later version)
    Extract to any directory
    cd to httplib2 directory
    run the command "python setup.py install --user"

===================================================================================
###################################################################################

###################################################################################    
Execution instructions:

SuperFold can be run using one command:
python SuperFold.py RNA.map

All other flags are optional. Use the --help flag for explanations of the command line options:
python SuperFold.py --help


File Setup:

The only required file is a .map file, which is generated automatically
by the ShapeMapper pipeline. The .map file consists of the nucleotide
number, SHAPE reactivity, error, and nucleotide sequence.
SuperFold automatically converts T nucleotides to U.

---myFavoriteRNA.map---
1	0.002512	0.053798	G
2	-0.034906	0.143529	T
3	-0.077852	0.257623	T
4	-0.068123	0.122385	T
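
To inspect a .map file outside of SuperFold, the four columns shown above can be read with a few
lines of Python. This is a minimal sketch rather than SuperFold's own parser; the function names
are hypothetical, and it assumes whitespace-separated columns as in the example.

```python
def parse_map_line(line):
    """Parse one .map line into (position, reactivity, error, base)."""
    fields = line.split()
    position = int(fields[0])
    reactivity = float(fields[1])
    error = float(fields[2])
    # Mirror the T -> U conversion that SuperFold applies to the sequence.
    base = fields[3].replace("T", "U")
    return position, reactivity, error, base


def read_map(path):
    """Read a whole .map file, skipping blank lines."""
    records = []
    with open(path) as handle:
        for line in handle:
            if line.strip():
                records.append(parse_map_line(line))
    return records
```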


Differential SHAPEMap file:

The differential file consists of the nucleotide number, differential SHAPE
reactivity, standard error, nucleotide sequence, and the Z-factor of the
difference, calculated as 1 - 3*(1m6_err + nmia_err)/abs(shape1 - shape2).

---myRNAnmia-1m6.mapd---
1	-999.0	-999.0	G	-999.0
2	-0.0124	0.2673	U	-74.2440186566
3	0.0951	0.0833	U	-2.34887508212
4	0.0409	0.0929	U	-7.96984706503
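
The Z-factor formula can be written out directly. The sketch below is an illustration of the
formula only, not the code used by the utility script; the function name is hypothetical, and
returning negative infinity when the two reactivities are identical is a choice made for this
sketch.

```python
def z_factor(shape1, shape2, err1, err2):
    """Z-factor of a reactivity difference: 1 - 3*(err1 + err2)/abs(shape1 - shape2)."""
    diff = abs(shape1 - shape2)
    if diff == 0:
        # The formula is undefined for a zero difference; this sketch
        # reports negative infinity (an assumption, not SuperFold behavior).
        return float("-inf")
    return 1.0 - 3.0 * (err1 + err2) / diff
```

A Z-factor above 0 means the difference is larger than three times the summed errors. For example,
z_factor(0.5, 0.1, 0.05, 0.05) evaluates 1 - 3*0.1/0.4.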

A differential SHAPEMap file is created by running the utility
differenceByWindowSHAPEMAP.py. This program has the following usage:

Usage: <nmia.txt.map> <1m6.txt.map> <difference.dif.mapd> <i>

Create your .mapd file using the following command:
python differenceByWindowSHAPEMAP.py nmia.map 1m6.map nmia-1m6.mapd 25

where nmia.map and 1m6.map are the names of the NMIA and 1M6 map files. The new file
"nmia-1m6.mapd" will contain the differential map file suitable to be given to the 
--differentialFile flag of SuperFold.


Single Strand Constraints:

List here any nucleotides that, based on other evidence, should be kept
single stranded and excluded from folding, one nucleotide number per line. ex:

---ssConstraints.txt--- < this part is just the name, not in the file
34
35
36
78
77
76


PK constraints:

In a second file, list the pseudoknot (PK) nucleotides in pairs, one pair per line.
SuperFold uses this paired PK file to reassemble the pseudoknotted nucleotides in the final step. ex:

---ListofPKs_ds.txt---
34 78
35 77
36 76
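
In the two examples above, the single-strand constraint file lists exactly the nucleotides that
appear in the PK pair file. Assuming that is the relationship you want, both files can be generated
from one list of pairs. This is a minimal sketch; the function names are hypothetical and the
output format simply mirrors the examples.

```python
def format_constraints(pk_pairs):
    """Return (ss_text, pk_text) for a list of (i, j) pseudoknot pairs."""
    five_prime = [i for i, j in pk_pairs]
    three_prime = [j for i, j in pk_pairs]
    # Single-strand file: one nucleotide number per line.
    ss_text = "".join("%d\n" % n for n in five_prime + three_prime)
    # PK file: one space-separated pair per line.
    pk_text = "".join("%d %d\n" % (i, j) for i, j in pk_pairs)
    return ss_text, pk_text


def write_constraints(pk_pairs, ss_path, pk_path):
    """Write both constraint files to the given paths."""
    ss_text, pk_text = format_constraints(pk_pairs)
    with open(ss_path, "w") as handle:
        handle.write(ss_text)
    with open(pk_path, "w") as handle:
        handle.write(pk_text)
```

For example, write_constraints([(34, 78), (35, 77), (36, 76)], "ssConstraints.txt", "ListofPKs_ds.txt")
writes the two example files shown above.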

###################################################################################
Output description and troubleshooting:

Occasionally (depending on the RNA and SHAPE constraints), a smaller window size for partition and for
Fold may be required in order to obtain base pairs in the output. This can be set with the following flags:
--partitionWindowSize
--foldWindowSize

1000 is a good size for the partition window. For window sizes below 1000, set --trimInterior
to 200 nucleotides in order to obtain output for interior windows. Smaller window sizes bias
the result toward shorter-range interactions.
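
The window-size advice above can be captured in a small helper that assembles the command line.
This is only a sketch: the function name is hypothetical, passing each value as a separate
argument after its flag is an assumption (check --help for the exact syntax), and applying the
--trimInterior rule to the partition window is one reading of the text above.

```python
def build_superfold_command(map_file, partition_window=None, fold_window=None):
    """Build an argument list for SuperFold.py with optional window sizes."""
    command = ["python", "SuperFold.py", map_file]
    if partition_window is not None:
        command += ["--partitionWindowSize", str(partition_window)]
        # Below 1000 nt, trim interior windows by 200 nt as advised above.
        if partition_window < 1000:
            command += ["--trimInterior", "200"]
    if fold_window is not None:
        command += ["--foldWindowSize", str(fold_window)]
    return command
```

The resulting list can be joined with spaces for display or passed to subprocess.call to run it.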


Outputs are listed in the order of execution:
SuperFold automatically creates folders to store the output. To prevent file name collisions,
a cryptographic hash of the input values is appended to the folder and file names. A log file detailing the run is
in the results folder.

Intermediate partition function calculations are in the partition folder. Intermediate fold calculations are in the
fold folder.

Merged partition function and minimum free energy structures are in the results folder; their file names begin with
"merged".

Likely base pairs from the partition function are plotted as arcs in the arcs file. The following is the key:
green > 80%
blue > 30%
yellow > 10%
gray > 3%
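
The key above amounts to a threshold lookup on the pairing probability. The sketch below restates
it in Python for use when post-processing probabilities yourself; it is not taken from SuperFold's
plotting code, and the names are hypothetical. Probabilities are expressed as fractions (0 to 1).

```python
# (threshold, color) pairs from the key above, highest threshold first.
ARC_COLORS = [(0.80, "green"), (0.30, "blue"), (0.10, "yellow"), (0.03, "gray")]


def arc_color(probability):
    """Return the arc color for a pairing probability, or None if it is not plotted."""
    for threshold, color in ARC_COLORS:
        if probability > threshold:
            return color
    return None
```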

The Shannon entropy and SHAPE analysis are plotted in the ShannonSHAPE pdf file. Region cut sites are written to the log file.

Individual region structure files and plots are written to the regions folder with the region range in the filename.