README.md by Rohan Maddamsetti
See Table 4 of the Obelisk paper for details of SK36 data in that paper. I examined a couple of those datasets, and some additional SK36 monoculture datasets not in Table 4.
cd Research/active-research/obelisk-SK36/src
##I use pysradb to look up the Run Accession ID for prefetch.
pysradb metadata SRR1713039 prefetch SRR1713039 -O ../data cd ../data fasterq-dump --threads 10 SRR1713039 cd ../src
SRR14406732 https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR14406732&display=metadata pysradb metadata SRR14406732
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA862079
pysradb metadata SRX16651438 prefetch SRR20627698 -O ../data cd ../data fasterq-dump --threads 10 SRR20627698 cd ../src
pysradb metadata SRX16651439 prefetch SRR20627697 -O ../data cd ../data fasterq-dump --threads 10 SRR20627697 cd ../src
pysradb metadata SRX16651440 prefetch SRR20627696 -O ../data cd ../data fasterq-dump --threads 10 SRR20627696 cd ../src
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA961761 pysradb metadata SRX20097554 prefetch SRR24302371 -O ../data cd ../data fasterq-dump --threads 10 SRR24302371 cd ../src
pysradb metadata SRX20097555 prefetch SRR24302370 -O ../data cd ../data fasterq-dump --threads 10 SRR24302370 cd ../src
pysradb metadata SRX20097556 prefetch SRR24302369 -O ../data cd ../data fasterq-dump --threads 10 SRR24302369 cd ../src
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA937727
pysradb metadata SRX19476645 prefetch SRR23591557 -O ../data cd ../data fasterq-dump --threads 10 SRR23591557 cd ../src
pysradb metadata SRX19476647 prefetch SRR23591555 -O ../data cd ../data fasterq-dump --threads 10 SRR23591555 cd ../src
pysradb metadata SRX19476649 prefetch SRR23591553 -O ../data cd ../data fasterq-dump --threads 10 SRR23591553 cd ../src
pysradb metadata SRX19476651 prefetch SRR23591551 -O ../data cd ../data fasterq-dump --threads 10 SRR23591551 cd ../src
pysradb metadata SRX19476653 prefetch SRR23591549 -O ../data cd ../data fasterq-dump --threads 10 SRR23591549 cd ../src
pysradb metadata SRX19476655 prefetch SRR23591547 -O ../data cd ../data fasterq-dump --threads 10 SRR23591547 cd ../src
pysradb metadata SRX19476657 prefetch SRR23591545 -O ../data cd ../data fasterq-dump --threads 10 SRR23591545 cd ../src
pysradb metadata SRX19476659 prefetch SRR23591543 -O ../data cd ../data fasterq-dump --threads 10 SRR23591543 cd ../src
pysradb metadata SRX19476661 prefetch SRR23591541 -O ../data cd ../data fasterq-dump --threads 10 SRR23591541 cd ../src
pysradb metadata SRX19476663 prefetch SRR23591539 -O ../data cd ../data fasterq-dump --threads 10 SRR23591539 cd ../src
I found 3 polymorphisms in the Obelisk across RNAseq samples.
A) Position 546 R162R (CGA -> CGG) A->G mutation in Oblin-1 B) Position 194 I48I (ATC -> ATA) C->A mutation in Oblin-1 C) Position 1014 A->G mutation downstream of Oblin-1
Polymorphisms B+C have similar frequencies in all samples that have them-- they may be linked in a single haplotype.
By inspection using the FORNA web browser (http://rna.tbi.univie.ac.at/forna/forna.html), positions 194 and 1014 do not interact in the Obelisk structure. They are pretty far apart in the hairpin. Indeed, an explicit check shows that the free energy for the double mutant is additive (no epistasis).
In this analysis, I use RNAfold to calculate secondary structures and changes in minimum free energy of folding for:
- WT + mutation A
- WT + mutation B
- WT + mutation C
For (1,2,3) I compare the free energy changes to all possible single mutants from WT.
RNAfold is in the the ViennaRNA package, which was installed using conda: conda install bioconda::viennarna
See RNAfold documentation here to see how to extract ensemble free energy for a given Obelisk sequence: https://www.tbi.univie.ac.at/RNA/ViennaRNA/refman/tutorial/RNAfold.html#equilibrium-ensemble-properties
It takes about 10s to run RNAfold on one sequence on my laptop. Therefore, we need to use a job array on the Duke Compute Cluster.
cd to the src/ directory.
$ cd src/
then generate input files for RNAfold:
$ python generate-Obelisk-mutants.py
Now run RNAfold on the WT and observed mutants:
$ ./run-RNAfold-on-Obelisk-mutants.sh
And run a job array script to run RNAfold on each input file.
$ ./submit-Obelisk-single-mutant-RNAfold-jobarray.sh
Now tabulate the ensemble free energy for each single mutant.
$ python tabulate-Obelisk-single-mutant-free-energies.py
Now compare the free energy differences for the observed mutants to the distribution of all single-mutant free energy differences using the following R script:
$ analyze-Obelisk-single-mutant-free-energies.R
obelisk-abundance-model.jl is a Pluto notebook written in Julia 1.10. This notebook can be run by installing and running Pluto.jl within Julia 1.10+ (see instructions at: https://plutojl.org/) and then opening the notebook using the Pluto web browser interface.