kgori/sigfit

Incorrect labels in code documentation of human_trinuc_freqs

Closed this issue · 1 comments

gevro commented

Hi,
I think the labels in your documentation of human_trinuc_freqs are incorrect. The numbers are correct and in the correct order.

However, the code documentation on the right side is not. Based on a manual hg19 trinucleotide frequency calculation, I'm quite sure this should be the order of the labels in the documentation:

 [1] "ACA" "ACC" "ACG" "ACT" "CCA" "CCC" "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ACA" "ACC" "ACG"
[20] "ACT" "CCA" "CCC" "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ACA" "ACC" "ACG" "ACT" "CCA" "CCC"
[39] "CCG" "CCT" "GCA" "GCC" "GCG" "GCT" "TCA" "TCC" "TCG" "TCT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA"
[58] "GTC" "GTG" "GTT" "TTA" "TTC" "TTG" "TTT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA" "GTC" "GTG" "GTT"
[77] "TTA" "TTC" "TTG" "TTT" "ATA" "ATC" "ATG" "ATT" "CTA" "CTC" "CTG" "CTT" "GTA" "GTC" "GTG" "GTT" "TTA" "TTC" "TTG"
[96] "TTT"

i.e. these labels:

 [1] "ACA>AAA" "ACC>AAC" "ACG>AAG" "ACT>AAT" "CCA>CAA" "CCC>CAC" "CCG>CAG" "CCT>CAT" "GCA>GAA" "GCC>GAC" "GCG>GAG"
[12] "GCT>GAT" "TCA>TAA" "TCC>TAC" "TCG>TAG" "TCT>TAT" "ACA>AGA" "ACC>AGC" "ACG>AGG" "ACT>AGT" "CCA>CGA" "CCC>CGC"
[23] "CCG>CGG" "CCT>CGT" "GCA>GGA" "GCC>GGC" "GCG>GGG" "GCT>GGT" "TCA>TGA" "TCC>TGC" "TCG>TGG" "TCT>TGT" "ACA>ATA"
[34] "ACC>ATC" "ACG>ATG" "ACT>ATT" "CCA>CTA" "CCC>CTC" "CCG>CTG" "CCT>CTT" "GCA>GTA" "GCC>GTC" "GCG>GTG" "GCT>GTT"
[45] "TCA>TTA" "TCC>TTC" "TCG>TTG" "TCT>TTT" "ATA>AAA" "ATC>AAC" "ATG>AAG" "ATT>AAT" "CTA>CAA" "CTC>CAC" "CTG>CAG"
[56] "CTT>CAT" "GTA>GAA" "GTC>GAC" "GTG>GAG" "GTT>GAT" "TTA>TAA" "TTC>TAC" "TTG>TAG" "TTT>TAT" "ATA>ACA" "ATC>ACC"
[67] "ATG>ACG" "ATT>ACT" "CTA>CCA" "CTC>CCC" "CTG>CCG" "CTT>CCT" "GTA>GCA" "GTC>GCC" "GTG>GCG" "GTT>GCT" "TTA>TCA"
[78] "TTC>TCC" "TTG>TCG" "TTT>TCT" "ATA>AGA" "ATC>AGC" "ATG>AGG" "ATT>AGT" "CTA>CGA" "CTC>CGC" "CTG>CGG" "CTT>CGT"
[89] "GTA>GGA" "GTC>GGC" "GTG>GGG" "GTT>GGT" "TTA>TGA" "TTC>TGC" "TTG>TGG" "TTT>TGT"
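
For reference, the two label vectors above can be regenerated rather than typed by hand. This is only a minimal sketch, not code from sigfit: the names `bases`, `subs`, `labels` and `trinucs` are made up for illustration, and the ordering (substitution type, then 5' base, then 3' base varying fastest) is simply what the listings above show.

```r
# Hypothetical helper names; ordering assumed from the listings in this issue.
bases <- c("A", "C", "G", "T")

# Substitution types in the order used above: C>A, C>G, C>T, T>A, T>C, T>G
subs <- list(c("C", "A"), c("C", "G"), c("C", "T"),
             c("T", "A"), c("T", "C"), c("T", "G"))

labels <- unlist(lapply(subs, function(s) {
  ref <- s[1]
  alt <- s[2]
  # expand.grid varies its first column fastest, so the 3' base goes first
  ctx <- expand.grid(three = bases, five = bases, stringsAsFactors = FALSE)
  paste0(ctx$five, ref, ctx$three, ">", ctx$five, alt, ctx$three)
}))

# The reference trinucleotide is the part before the ">"
trinucs <- substr(labels, 1, 3)
```

Printing `trinucs` and `labels` should give the same 96 entries as the two listings above, which makes it easy to attach them as names to the frequency vector and compare.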

whereas the documentation has this:

        # Human genome trinucleotide frequencies (from EMu)
        freq <- c(1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>A @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>A @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>A @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>A @ TC[ACGT]
                  1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>G @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>G @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>G @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>G @ TC[ACGT]
                  1.14e+08, 6.60e+07, 1.43e+07, 9.12e+07, # C>T @ AC[ACGT]
                  1.05e+08, 7.46e+07, 1.57e+07, 1.01e+08, # C>T @ CC[ACGT]
                  8.17e+07, 6.76e+07, 1.35e+07, 7.93e+07, # C>T @ GC[ACGT]
                  1.11e+08, 8.75e+07, 1.25e+07, 1.25e+08, # C>T @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>A @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>A @ CC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>A @ GC[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>A @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>C @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>C @ CC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>C @ GC[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>C @ TC[ACGT]
                  1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>G @ AC[ACGT]
                  7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>G @ AC[ACGT]
                  6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>G @ AG[ACGT]
                  1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08) # T>G @ AT[ACGT]
kgori commented

Hi gevro,

Well spotted: the latter half of the commented labels is wrong. They should follow the pattern AT*, CT*, GT*, TT*. We will fix this in the next release.
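
As a rough sketch of what the corrected comments could look like (not necessarily the exact text that will ship), here is the second half of the vector with the values copied unchanged from the report and only the trailing labels rewritten to the AT*, CT*, GT*, TT* pattern. The name `freq_T` is just for illustration.

```r
# T-centred half of the frequency vector; values as quoted in the report,
# comments rewritten to follow the AT*, CT*, GT*, TT* pattern.
freq_T <- c(1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>A @ AT[ACGT]
            7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>A @ CT[ACGT]
            6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>A @ GT[ACGT]
            1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>A @ TT[ACGT]
            1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>C @ AT[ACGT]
            7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>C @ CT[ACGT]
            6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>C @ GT[ACGT]
            1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08, # T>C @ TT[ACGT]
            1.17e+08, 7.57e+07, 1.04e+08, 1.41e+08, # T>G @ AT[ACGT]
            7.31e+07, 9.55e+07, 1.15e+08, 1.13e+08, # T>G @ CT[ACGT]
            6.43e+07, 5.36e+07, 8.52e+07, 8.27e+07, # T>G @ GT[ACGT]
            1.18e+08, 1.12e+08, 1.07e+08, 2.18e+08) # T>G @ TT[ACGT]

# Each substitution type reuses the same 16 trinucleotide counts
stopifnot(identical(freq_T[1:16], freq_T[17:32]),
          identical(freq_T[1:16], freq_T[33:48]))
```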

Thanks for the report,
Kevin