
Command-line oligonucleotide frequency analysis tool

Primary language: Python


Command-line program for intragenomic oligonucleotide frequency analysis



The intetra program splits the nucleotide sequence stored in a FASTA file into windows of a specified length (-f argument) and counts the occurrences of oligonucleotides of a specified length (-n argument: dinucleotides, trinucleotides, tetranucleotides, ...) in each window. From the window counts it calculates the chosen statistical scores (-m argument: z-score, zero-order Markov model, relative oligonucleotide frequencies), which are then used to calculate the matrix of correlations between all windows. Windows can also be generated as sliding windows (-s argument), meaning that adjacent windows share some overlapping sequence. The program can create windows of various lengths in one execution using the --maxlen and --minlen arguments. The --autocorr argument additionally calculates the correlations between the statistical scores of each window and those of the whole genome.
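The windowing, counting and correlation steps described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the function names are hypothetical, the step between windows is given in bases, Pearson correlation is assumed as the correlation measure, and only relative oligonucleotide frequencies are used as the score.

```python
from itertools import product
from math import sqrt


def split_windows(seq, length, step=None):
    """Return windows of `length` bases; a `step` smaller than `length`
    gives overlapping (sliding) windows. Trailing partial windows are dropped."""
    step = step or length
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def relative_frequencies(window, n):
    """Relative frequency of every possible word of length n (overlapping counts),
    in the fixed order produced by itertools.product over "ACGT"."""
    words = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(window) - n + 1):
        word = window[i:i + n]
        if word in counts:  # words with N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (assumed measure)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def correlation_matrix(seq, length, n, step=None):
    """Correlate the frequency profile of every window with every other window."""
    profiles = [relative_frequencies(w, n) for w in split_windows(seq, length, step)]
    return [[pearson(a, b) for b in profiles] for a in profiles]
```

For example, `correlation_matrix(sequence, 5000, 2)` would compare dinucleotide profiles of non-overlapping 5000-base windows, and passing `step=2500` would make each window overlap half of the previous one.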


The coligo program compares the oligonucleotide composition of sequences in FASTA files (located in the current working directory or another directory). The oligonucleotide composition of each sequence is converted into the chosen statistical score (-m argument: z-score, zero-order Markov model, relative oligonucleotide frequencies), and these scores are used to calculate the correlations between the sequences. With the -n argument, the user can choose the length of the oligonucleotide words that are counted.
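As an illustration of one such score, the sketch below computes an observed/expected ratio for each word under a zero-order Markov model, where the expected count of a word is derived only from the single-base frequencies of the sequence. This is a common textbook definition and is an assumption here; the exact formula the programs use may differ, and the function name `zom_scores` is hypothetical.

```python
from collections import Counter
from itertools import product


def zom_scores(seq, n):
    """Observed/expected ratio for every word of length n.

    Assumed definition: expected count = (number of word positions) multiplied
    by the product of the frequencies of the word's individual bases.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in "ACGT")
    total_bases = sum(base_counts.values())
    positions = len(seq) - n + 1
    if total_bases == 0 or positions <= 0:
        return {}  # nothing to score

    # Overlapping word counts observed in the sequence.
    observed = Counter(seq[i:i + n] for i in range(positions))

    scores = {}
    for p in product("ACGT", repeat=n):
        word = "".join(p)
        expected = float(positions)
        for base in word:
            expected *= base_counts[base] / total_bases
        # A word that cannot occur (a base is absent) gets a score of 0.
        scores[word] = observed[word] / expected if expected else 0.0
    return scores
```

Two sequences could then be compared by computing the correlation between their score vectors, taken in the same word order.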


The programs can be installed for easier access on Linux; however, they can be run on any operating system (including Linux) without installation. Without installation, the programs must be run like any other Python script: python3 intetra.py -i <inputfile.fna> -f 5000 -n 2 -m zom


Download the ZIP file and extract it anywhere. Open a terminal in the directory that was created and run these commands:

chmod +x intetra.py
cp intetra.py ~/.local/bin/intetra
cp -r programi_args ~/.local/bin/programi_args
chmod +x coligo.py
cp coligo.py ~/.local/bin/coligo

After installation, the intetra.py and coligo.py scripts should be executable from any directory using the commands "intetra" and "coligo", provided that ~/.local/bin is on your PATH.


intetra -i <inputfile.fna> -f 5000 -n 2 -m zom

intetra -i <inputfile.fna> -o <outputdirectory> -f 5000 -s 0.5 -n 2 4 -m zom --autocoor

intetra -i <inputfile.fna> -o <outputdirectory> -f 3000 -s 2000 -n 6 -m zscr zom --maxlen 300000 --minlen 30000 --autocoor --blockfasta

coligo -i <inputdirectory> -o <outputfile> -n 4 5 -m zom zscr -t upgma


Requirements: Python 3.6, biopython, numpy, pandas, matplotlib
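A quick way to check whether the packages listed above are importable in the current Python environment is sketched below. The helper `missing_packages` is hypothetical and not part of the programs; the only non-obvious detail is that biopython is imported under the module name `Bio`.

```python
from importlib.util import find_spec

# Package name -> top-level module name used at import time.
REQUIRED = {
    "biopython": "Bio",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```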