PoisonAlien/maftools

Question about the fisher.test call in `trinucleotideMatrix`

Closed this issue · 2 comments

Hi, thanks for the great package. I'm learning about APOBEC enrichment and, after reading the code line by line, I have some questions about the fisher.test call here:

xf = fisher.test(matrix(c(x[2], sum(x[3], x[4]), x[1] - x[2], x[3]-x[4]), nrow = 2), alternative = 'g')

To test this, I used debug(trinucleotideMatrix) to step into the function. On reaching the fisher.test step, I inspected apobec.fisher.dat, which looks like this:

Browse[2]> apobec.fisher.dat
      n_C>G_and_C>T tCw_to_G+tCw_to_T     C  tcw  wga
 [1,]           121                24  1481  161  164
 [2,]           997               601 10364 1694 1679
 [3,]           743               245  8373 1100 1038
 [4,]           149                67  1647  230  250
 [5,]            85                19   984  147  127
 [6,]           356                71  3941  451  526
 [7,]          1091               483 11777 1757 1692
 [8,]           306               172  3334  553  486
 [9,]          1869              1233 19255 3112 3265
[10,]           681               322  7163 1083 1133
[11,]           658               197  7326  929  924
[12,]           353               185  3786  568  612

fisher.test is applied row-wise:

apply(
    X = apobec.fisher.dat,
    1, function(x) {
        xf <- fisher.test(matrix(c(
            x[2], sum(x[3], x[4]),
            x[1] - x[2], x[3] - x[4]
        ), nrow = 2), alternative = "g")
        data.table::data.table(
            fisher_pvalue = xf$p.value,
            or = xf$estimate, ci.up = xf$conf.int[1], ci.low = xf$conf.int[2]
        )
    }
)

In terms of the column names, the vector passed to matrix() for fisher.test is, in order:

c(
    `tCw_to_G+tCw_to_T`,
    C + tcw,  
    `n_C>G_and_C>T` - `tCw_to_G+tCw_to_T`,
    C - tcw
)
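
To make the cell layout explicit, here is a minimal sketch using the first row of the table above. The short names in `x` and the dimnames are my own labels for readability, not something maftools produces; the cell order is copied from the call in `trinucleotideMatrix`.

```r
# First row of apobec.fisher.dat as printed above
# (names are my own shorthand for the five columns)
x <- c(n_mut = 121, tcw_mut = 24, C = 1481, tcw = 161, wga = 164)

# Same cell order as the fisher.test call in trinucleotideMatrix.
# matrix() fills column by column, so the first two values form column 1.
current <- matrix(
    c(
        x[["tcw_mut"]], x[["C"]] + x[["tcw"]],
        x[["n_mut"]] - x[["tcw_mut"]], x[["C"]] - x[["tcw"]]
    ),
    nrow = 2,
    dimnames = list(
        c("mutations", "context"), # row labels (my own)
        c("col1", "col2")          # column labels (my own)
    )
)
print(current)
```

Laid out this way, the "context" row contains C + tcw and C - tcw, which is the part I am asking about.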

I'm concerned about the second and fourth values. According to the original article, the enrichment ratio is compared:
"to an analogous ratio for all cytosines and guanines that reside inside and outside of the tCw/wGa motif within a sample fraction of the genome."

I think the matrix for fisher.test should include both cytosines and guanines, something like this:

c(
    `tCw_to_G+tCw_to_T`, tcw + wga,
    `n_C>G_and_C>T` - `tCw_to_G+tCw_to_T`, C + G - (tcw + wga)
)
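
As a rough sketch of what I mean, the helper below builds that table and runs the same one-sided test. `proposed_table` is a hypothetical function of mine, not part of maftools. The G count is not a column of apobec.fisher.dat as printed above, so it has to be supplied separately; the value used here is a made-up placeholder, only there to make the example run.

```r
# Hypothetical helper (not part of maftools): builds the 2x2 table proposed above.
# g_count is the number of guanines in the same genome fraction; it is not
# available in apobec.fisher.dat as shown, so it is passed in explicitly.
proposed_table <- function(n_mut, tcw_mut, c_count, g_count, tcw, wga) {
    matrix(
        c(
            tcw_mut, tcw + wga,
            n_mut - tcw_mut, c_count + g_count - (tcw + wga)
        ),
        nrow = 2
    )
}

# First row of the table above; g_count = 1500 is a made-up placeholder value
m <- proposed_table(
    n_mut = 121, tcw_mut = 24, c_count = 1481,
    g_count = 1500, tcw = 161, wga = 164
)

# Same one-sided test as in trinucleotideMatrix
xf <- fisher.test(m, alternative = "greater")
print(m)
print(xf$p.value)
```

With this layout, the second column of the table counts mutations and C/G positions outside the tCw/wGa motif, which is how I read the sentence quoted from the article.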

Please correct me if I'm wrong.

This issue is stale because it has been open for 60 days with no activity.

This issue was closed because it has been inactive for 14 days since being marked as stale.