simulations_for_topiccontml

This repository describes the tools needed to construct the simulated data used for the package TopicContml by Marzieh (Tara) Khodaei.

Primary language: Python. License: MIT.

Simulations for the topic modeling paper:
[The directory old contains an earlier coalescent simulator. In a major revision we replaced it with the current one, which uses the software DAWG; see below.]

Simulation of sequences with gaps from a tree,
using the software DAWG [1].

1. Download DAWG from https://github.com/CartwrightLab/dawg.git
2. Install DAWG (follow the GitHub instructions)
3. There are two run scripts (runX): run7 and run14. To recreate our table you will
   need to create the following directories:
   7tip-1000
   7tip-1000-moreindel
   14tip-1000
   14tip-1000-moreindel
4. For each directory:
   #a.
   cd 7tip-1000;
   cp ../*.py .; cp ../run .; cp ../7tiptree.tre .; cp ../run7 .;
   cp ../dna-with-gaps.dawg .
   . run7
   cd ..
   #b.
   cd 7tip-1000-moreindel;
   cp ../*.py .; cp ../run .; cp ../7tiptree.tre .; cp ../run7 .;
   cp ../dna-with-more-gaps.dawg dna-with-gaps.dawg
   . run7
   cd ..
   #c.
   cd 14tip-1000;
   cp ../*.py .; cp ../run .; cp ../14tiptree.tre .; cp ../run14 .;
   cp ../dna-with-gaps.dawg .
   . run14
   cd ..
   #d.
   cd 14tip-1000-moreindel;
   cp ../*.py .; cp ../run .; cp ../14tiptree.tre .; cp ../run14 .;
   cp ../dna-with-more-gaps.dawg dna-with-gaps.dawg
   . run14
   cd ..
5. Collect the information for the table   
   #a.
   cd 7tip-1000; . ./gettable > rawresults-7tip-1000
   ...
   # Do this for every directory. The rawresults files now contain the values that we show in the table.
   # The raw results are ordered by number of loci: 1,2,5,10,20,...1000
   # The table contains the highlighted parts for all 1..1000 loci.
   # example
   #                     +++                        +++++++++++
   # RF=2/10,RF<=2,4,6,8:0.5 0.9 1.0 1.0, mean(wRF)=1.130189611
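
The highlighted values in the example line of step 5 could be pulled out of a rawresults file with a few lines of Python. This is only a sketch: the line layout is assumed from the single example shown above, the pairing of each value with its RF threshold is an interpretation of that layout, and the function name parse_result_line is hypothetical (it is not part of this repository).

```python
import re

# Pattern for lines shaped like the example in step 5.
# Assumption: every result line follows the layout of that one example.
LINE_RE = re.compile(
    r"RF=(?P<rf>\d+/\d+),"
    r"RF<=(?P<thresholds>[\d,]+):(?P<fractions>[\d.\s]+),"
    r"\s*mean\(wRF\)=(?P<mean_wrf>[\d.]+)"
)


def parse_result_line(line):
    """Return the fields of one raw result line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    thresholds = [int(t) for t in match.group("thresholds").split(",")]
    fractions = [float(f) for f in match.group("fractions").split()]
    return {
        # Kept as the raw "a/b" string; its meaning is not interpreted here.
        "rf": match.group("rf"),
        # Assumed pairing: the i-th value belongs to the i-th RF threshold.
        "fractions": dict(zip(thresholds, fractions)),
        "mean_wrf": float(match.group("mean_wrf")),
    }
```

For the example line, the two highlighted parts would then be `result["fractions"][2]` and `result["mean_wrf"]`.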

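The copying in step 4 is the same for all four directories, so it can be written as a small table plus a loop. The sketch below is not part of the repository: the names SETUPS, prepare and prepare_all are hypothetical, and it assumes the files named in step 4 sit in the repository root. It only prepares the directories; the run script is still sourced by hand in each one.

```python
from pathlib import Path
import shutil

# (directory, tree file, run script, DAWG config), taken from steps 3 and 4.
SETUPS = [
    ("7tip-1000", "7tiptree.tre", "run7", "dna-with-gaps.dawg"),
    ("7tip-1000-moreindel", "7tiptree.tre", "run7", "dna-with-more-gaps.dawg"),
    ("14tip-1000", "14tiptree.tre", "run14", "dna-with-gaps.dawg"),
    ("14tip-1000-moreindel", "14tiptree.tre", "run14", "dna-with-more-gaps.dawg"),
]


def prepare(root, name, tree, runner, dawg_config):
    """Create root/name and copy in the files that step 4 lists for it."""
    root = Path(root)
    target = root / name
    target.mkdir(exist_ok=True)
    # Equivalent of: cp ../*.py .
    for script in sorted(root.glob("*.py")):
        shutil.copy(script, target)
    # Equivalent of: cp ../run .; cp ../<tree> .; cp ../<runner> .
    for filename in ("run", tree, runner):
        shutil.copy(root / filename, target)
    # The config is always stored as dna-with-gaps.dawg, so the
    # more-indel variant replaces the default one under that name.
    shutil.copy(root / dawg_config, target / "dna-with-gaps.dawg")
    return target


def prepare_all(root="."):
    """Prepare all four simulation directories and return their paths."""
    return [prepare(root, *setup) for setup in SETUPS]
```

After `prepare_all()`, each directory still needs its simulation started as in step 4, for example `cd 7tip-1000; . run7; cd ..`.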


[1] Reed A Cartwright, DNA assembly with gaps (Dawg):
      simulating sequence evolution, Bioinformatics 21, Suppl 3 (2005), iii31–8.