IBD_Kmer_Analysis

Python Implementation - Thanks Rob!

The Python kmerbias should behave the same as the Perl kmerbias. Both depend on the output of GetNucFrequency_PerSeq_varK.pl (GCF_000403175.4mers is an example output file).

python3 kmerbias.py -f GCF_000403175.4mers -k GGCC,GCGC -v > pyout
perl kmerbias.pl GCF_000403175.4mers "GGCC,GCGC"  > plout

plout and pyout are not identical because of floating-point differences between the two implementations, but the values agree to several significant digits.
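
Because a plain diff of plout and pyout will report differences, a tolerance-based comparison can be more useful. The following is a minimal sketch, not part of this repository. It assumes both files contain only the whitespace-separated matrix format shown in the Perl section (a header line starting with "Matrix", then one row per k-mer); if the -v flag adds extra lines to pyout, those would need to be removed first. The function names and the default tolerance are hypothetical.

```python
import math
import sys


def parse_matrix(text):
    """Parse a whitespace-separated bias matrix into {kmer: {seq_id: value}}.

    Assumes the first non-empty line is a header whose first field is a
    label (e.g. "Matrix") followed by sequence IDs.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    seq_ids = lines[0].split()[1:]  # skip the leading "Matrix" label
    table = {}
    for ln in lines[1:]:
        fields = ln.split()
        table[fields[0]] = dict(zip(seq_ids, map(float, fields[1:])))
    return table


def matrices_agree(a, b, rel_tol=1e-6):
    """Return True if both matrices have the same k-mers and sequence IDs
    and every pair of values matches within rel_tol (hypothetical default)."""
    if a.keys() != b.keys():
        return False
    for kmer in a:
        if a[kmer].keys() != b[kmer].keys():
            return False
        for seq_id in a[kmer]:
            # abs_tol lets exact zeros compare equal to tiny rounding noise
            if not math.isclose(a[kmer][seq_id], b[kmer][seq_id],
                                rel_tol=rel_tol, abs_tol=1e-12):
                return False
    return True


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 compare_bias.py plout pyout
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        same = matrices_agree(parse_matrix(fa.read()), parse_matrix(fb.read()))
    print("outputs agree" if same else "outputs differ")
```

Loosening or tightening rel_tol is a judgment call; the value above is only a starting point.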

Perl Version

You should be able to run the code like this:

perl GetNucFrequency_PerSeq_varK.pl ./GCF_000403175.fna 4 > GCF_000403175.4mers

and then you can count the bias:

perl kmerbias.pl GCF_000403175.4mers "GGCC,GGCG,GGGC"

The first command writes its output to GCF_000403175.4mers.

The second command should output:

Matrix NZ_KE159482.1 NZ_KE159483.1 NZ_KE159484.1 NZ_KE159485.1 NZ_KE159486.1 NZ_KE159487.1 NZ_KE159488.1 NZ_KE159489.1 NZ_KE159490.1 NZ_KE159491.1 NZ_KE159492.1
GGCC 0.72379167166026 0.564725231578111 1.06033886652903 0.700487472590855 0.722849965243236 0.744502198607545 0.718905768185197 0.627581109687921 0.638351277899619 0.907762632169645 0
GGCG 1.02371445359977 1.01361527445047 0.880629901562083 1.01214386205571 0.99189604408644 1.00392096757395 1.01391615871228 1.03649903331046 1.03819590211593 0.875969559120189 0.934879508443214
GGGC 1.09172008969991 1.11655103110103 0.956758304197849 1.10617995726091 1.08601140376983 1.09210234817395 1.10214747927039 1.15141594439126 1.21693684689459 1.09103255691336 1.22430819416776