This is a sub-project of altorf that analyzes altorf from a genome-wide perspective. It creates simulated genomes for each species and compares their altorf properties with those of the real genomes. Programs are organized into sub-directories according to the stage of analysis. Most are written in C++ or Python 3, and shell scripts are used to build the pipelines.


Follow the steps below from top to bottom. For further information, refer to the README in each sub-directory.


  • Chooses which species in RefSeq to analyze.


  • Splits downloaded .fna files and fills ambiguous bases randomly if needed.


  • Creates simulated genomes based on the .fna and .gff files.


  • Analyzes altorf patterns by comparing the real genome with the simulated genomes.
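
The step that fills ambiguous bases randomly can be illustrated with a minimal Python 3 sketch. It is not the project's actual program: the function name `fill_ambiguous` is hypothetical, and the sketch assumes that each IUPAC ambiguity code is replaced by one of its compatible bases chosen uniformly at random. It also upper-cases the sequence, which discards any soft-masking.

```python
import random

# IUPAC nucleotide ambiguity codes and the bases each one can stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def fill_ambiguous(seq, rng=None):
    """Return seq with every ambiguity code replaced by a random compatible base.

    Hypothetical helper for illustration; uniform sampling is an assumption.
    """
    rng = rng or random.Random()
    return "".join(
        rng.choice(IUPAC[base]) if base in IUPAC else base
        for base in seq.upper()
    )


if __name__ == "__main__":
    # A fixed seed makes the random filling reproducible between runs.
    print(fill_ambiguous("ACGTNNRYACGT", random.Random(42)))
```

Passing a seeded `random.Random` instance keeps the filled genome reproducible, which matters when the same sequence is later compared with simulated genomes.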


I chose SeqAn as the C++ library in order to process DNA sequences efficiently and to make biological file I/O easier.

Compilation of .cpp

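  • The commands below set up the build directory, where the binary files are created. cmake itself only generates the build files, so a build step (for example, cmake --build .) is needed afterwards to compile.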
mkdir build
cd build
cmake ../src
  • If you obtained SeqAn from a git clone, you need to specify its install location and include path.
    • cmake ../src -DCMAKE_PREFIX_PATH="$HOME/software/seqan/util/cmake" -DSEQAN_INCLUDE_PATH="$HOME/software/seqan/include"
      • Change the paths to match your environment.