This is an implementation of a pipeline for sequence design given a target structure with simple pseudoknots, without explicitly modelling pseudoknots in the design process.
Besides packages installable via pip, the following are required:
- ViennaRNA (Python bindings and RNAPKplex)
- pKiss
- RNAblueprint, the proof-of-concept implementation
- libbpdist
Although the command-line options of the scripts are documented and accessible via -h, usage is not always ergonomic.
The design pipeline is implemented in azoarcus_design.py (and pipeline.py).
By default, its output is in FASTA format and is printed to stdout.
To compute properties of designed sequences, pass them via -i and specify reference data via -S and -T.
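Since the designs are written as FASTA to stdout, downstream scripts usually need to read them back. The following is a minimal sketch of a FASTA reader, not part of the pipeline; the function name `read_fasta` and the header names in the example are assumptions, as the header format used by azoarcus_design.py is not documented here.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records; real headers depend on the pipeline's output.
example = io.StringIO(">design_0\nGUGCC\nUUGCG\n>design_1\nACGU\n")
records = list(read_fasta(example))
```

For a file produced by redirecting stdout, the same function can be called on `open("sequencedesigns.fasta")`.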
This is the default design approach. Example usage:
python3 azoarcus_design.py -n 10 -j 4 -s 0.05 -T <target structure> -C <sequence constraints> > sequencedesigns.fasta
python3 azoarcus_design.py -j 4 -s -i sequencedesigns.fasta -T <target structure> -S <reference sequence> > sequencedesigns.csv
-s 0.05 adjusts the stopping threshold for the objective function.
Structural constraints are extracted from pseudoknots in the target structure; those should be reflected in sequence constraints specified via -C.
Example input data:
GUGCCUUGCGCCGGGAAACCACGCAAGGGAUGGUGUCAAAUUCGGCGAAACCUAAGCGCCCGCCCGGGCGUAUGGCAACGCCGAGCCAAGCUUCGGCGCCUGCGCCGAUGAAGGUGUAGAGACUAGACGGCACCCACCUAAGGCAAACGCUAUGGUGAAGGCAUAGUCCAGGGAGUGGCGAAAGUCACACAAACCGG # reference sequence
...(((((((..((....)).)))))))...((((((....((((((...((...((((((....))))))..))...))))))(((...(.((((((....)))))).)..)))...[.[[[[[...))))))((((...(((....)))..))))......]]]]]]..((.(((((....))))).....)).. # target structure
GUGNCNNNNNNNNNGAAANNNNNNNNGNNANNNNNNCNAAUNCGNCNNNNNCUAAGNNNNNNNNNNNNNNUAUGNNNNNGNCGNNCCANNNNNNNNNNNNNNNNNNNNNNNNGGNGUAGAGACUANNNGNNNNNNNNCUAAGNNNNNNNNUAUGNNNNNNNCAUAGUCCNNNNNNNNNNGAAANNNNNNNNNNNNNG # complete sequence constraints
GUGNNNNNNNNNNNGAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAGACUANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNUAGUCCNNNNNNNNNNNNNNNNNNNNNNNNNNNG # minimal sequence constraints
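One plausible sanity check before running the design is to confirm that the sequence constraints still allow every base pair of the target structure, including the pseudoknot pairs written with square brackets. The sketch below is not taken from the pipeline; the function names are hypothetical, and treating G-U wobble pairs as allowed is an assumption.

```python
from typing import List, Tuple

BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
# IUPAC nucleotide codes mapped to the RNA bases they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
# Assumption: canonical pairs plus G-U wobble pairs are permitted.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted 0-based (i, j) pairs, one stack per bracket type."""
    closers = {c: o for o, c in BRACKETS.items()}
    stacks = {o: [] for o in BRACKETS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in closers:
            if not stacks[closers[ch]]:
                raise ValueError(f"unbalanced '{ch}' at position {j}")
            pairs.append((stacks[closers[ch]].pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {j}")
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return sorted(pairs)


def incompatible_pairs(structure: str, constraints: str) -> List[Tuple[int, int]]:
    """List base pairs that no choice of nucleotides under the constraints can form."""
    if len(structure) != len(constraints):
        raise ValueError("structure and constraints differ in length")
    bad = []
    for i, j in base_pairs(structure):
        options_i, options_j = IUPAC[constraints[i]], IUPAC[constraints[j]]
        if not any(a + b in CANONICAL for a in options_i for b in options_j):
            bad.append((i, j))
    return bad


ok = incompatible_pairs("((..[[..))..]]", "GNNNAGNNNCNNCU")
broken = incompatible_pairs("((..[[..))..]]", "GNNNAGNNNCNNAU")
```

An empty result means every pair can still be formed; any reported pair points to constraint positions that need revising.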
The alternative approach does not require structural constraints; instead, it takes two target structures obtained by splitting a pseudoknotted structure.
This variant of the pipeline is toggled using -P.
Example usage:
python3 azoarcus_design.py -P -n 10 -j 4 -s 0.15 -T <first split structure> -T <second split structure> -C <sequence constraints> > sequencedesigns.fasta
python3 azoarcus_design.py -P -j 4 -s -i sequencedesigns.fasta -T <original target structure> -T <first split structure> -T <second split structure> -S <reference sequence> > sequencedesigns.csv
-s 0.15 is a higher threshold than before (the reason is illustrated in maxned_threshold.py).
Structural constraints are extracted from pseudoknots in the target structure; those should be reflected in sequence constraints specified via -C.
Example input data:
GUGCCUUGCGCCGGGAAACCACGCAAGGGAUGGUGUCAAAUUCGGCGAAACCUAAGCGCCCGCCCGGGCGUAUGGCAACGCCGAGCCAAGCUUCGGCGCCUGCGCCGAUGAAGGUGUAGAGACUAGACGGCACCCACCUAAGGCAAACGCUAUGGUGAAGGCAUAGUCCAGGGAGUGGCGAAAGUCACACAAACCGG # reference sequence
...(((((((..((....)).)))))))...((((((....((((((...((...((((((....))))))..))...))))))(((...(.((((((....)))))).)..)))...[.[[[[[...))))))((((...(((....)))..))))......]]]]]]..((.(((((....))))).....)).. # target structure
...(((((((..((....)).)))))))...((((((....((((((...((...((((((....))))))..))...))))))(((...(.((((((....)))))).)..))).............))))))((((...(((....)))..))))..............((.(((((....))))).....)).. # first split structure
...(((((((..((....)).))))))).............((((((...((...((((((....))))))..))...))))))(((...(.((((((....)))))).)..)))...(.(((((.........((((...(((....)))..))))......))))))..((.(((((....))))).....)).. # second split structure
GUGNCNNNNNNNNNGAAANNNNNNNNGNNANNNNNNCNAAUNCGNCNNNNNCUAAGNNNNNNNNNNNNNNUAUGNNNNNGNCGNNCCANNNNNNNNNNNNNNNNNNNNNNNNGGNGUANNNNNNNNNNGNNNNNNNNCUAAGNNNNNNNNUAUGNNNNNNNCANNNNNNNNNNNNNNNNGAAANNNNNNNNNNNNNG # complete-alt sequence constraints
GUGNNNNNNNNNNNGAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNG # minimal-alt sequence constraints
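The two split structures in the example data appear to follow a simple rule: the first keeps only the round-bracket pairs, while the second rewrites the square-bracket pairs as round brackets and drops every round-bracket pair that crosses them. The sketch below implements that inferred rule for structures using only `()` and `[]`; it is an illustration, not the method used to produce the data, and the function name `split_pseudoknot` is hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def pairs_of(structure: str, opener: str, closer: str) -> List[Pair]:
    """Return 0-based (i, j) pairs for a single bracket type."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == opener:
            stack.append(j)
        elif ch == closer:
            if not stack:
                raise ValueError(f"unbalanced '{closer}' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unclosed '{opener}'")
    return pairs


def crosses(a: Pair, b: Pair) -> bool:
    """True if the two base pairs interleave, i.e. form a pseudoknot."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def render(length: int, pairs: List[Pair]) -> str:
    """Write the given pairs as a plain dot-bracket string."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


def split_pseudoknot(structure: str) -> Tuple[str, str]:
    """Split a '()' plus '[]' structure into two pseudoknot-free structures."""
    nested = pairs_of(structure, "(", ")")
    knot = pairs_of(structure, "[", "]")
    # First structure: round-bracket pairs only.
    first = render(len(structure), nested)
    # Second structure: knot pairs plus all round-bracket pairs not crossing them.
    kept = [p for p in nested if not any(crosses(p, q) for q in knot)]
    second = render(len(structure), kept + knot)
    return first, second


first, second = split_pseudoknot("(..)((..[[..))..]]")
```

Both results can then be passed to the pipeline with two -T options, as in the example usage above.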
neutralpaths.py computes neutral path lengths using pKiss structure prediction.
Although not required for neutral paths, the option -T computes the expected Hamming distance between two random sequences compatible with a shared structure, both by uniform sampling and by an estimate based on the number of paired and unpaired positions.
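To make the two estimates concrete, the sketch below works through one way they could be computed; it is not the code in neutralpaths.py. It assumes that compatible sequences are drawn uniformly from the four nucleotides at unpaired positions and from the six pairs AU, UA, GC, CG, GU, UG at paired positions, with no further sequence constraints. Under that assumption, two sequences differ with probability 3/4 at an unpaired position and in 13/9 positions on average per base pair. All function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumption: canonical and G-U wobble pairs, each equally likely.
PAIRS = ("AU", "UA", "GC", "CG", "GU", "UG")
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def base_pairs(structure):
    """Return 0-based (i, j) pairs, one stack per bracket type."""
    closers = {c: o for o, c in BRACKETS.items()}
    stacks = {o: [] for o in BRACKETS}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in closers:
            pairs.append((stacks[closers[ch]].pop(), j))
    return pairs


def estimated_distance(structure):
    """Exact expectation from the counts of paired and unpaired positions."""
    n_pairs = len(base_pairs(structure))
    n_unpaired = len(structure) - 2 * n_pairs
    # Expected mismatches at one unpaired position: 12/16 = 3/4.
    d_unpaired = Fraction(
        sum(a != b for a, b in product(NUCLEOTIDES, repeat=2)), 16)
    # Expected mismatches over the two positions of one base pair: 52/36 = 13/9.
    d_pair = Fraction(
        sum((p[0] != q[0]) + (p[1] != q[1])
            for p, q in product(PAIRS, repeat=2)), 36)
    return n_unpaired * d_unpaired + n_pairs * d_pair


def random_compatible(structure, rng):
    """Draw one sequence uniformly from those compatible with the structure."""
    seq = [rng.choice(NUCLEOTIDES) for _ in structure]
    for i, j in base_pairs(structure):
        seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)


def sampled_distance(structure, samples=5000, seed=0):
    """Monte Carlo estimate of the mean Hamming distance."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = random_compatible(structure, rng)
        b = random_compatible(structure, rng)
        total += sum(x != y for x, y in zip(a, b))
    return total / samples


exact = estimated_distance("((..))")   # 2 * 13/9 + 2 * 3/4
approx = sampled_distance("((..))")
```

The sampled value should approach the exact one as the number of samples grows; sequence constraints would change both.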
eparams.py may be used to assess different energy parameter sets (in a format compatible with ViennaRNA) using RNAfold, pKiss and RNAPKplex.
Note that the parameter files are not provided here but are available in ViennaRNA and here (requires conversion using RNAparconv).
For this, pKiss was used at version 2.2.12 and RNAPKplex was patched (as of ViennaRNA 2.4.18, RNAPKplex seems to work better).
dotplot.py was used to produce dotplots of base pair probability matrices and single secondary structures.