cozygene/TCA

Negative estimated cell-type-specific methylation value

Closed this issue · 4 comments

Hi,

When I run TCA to estimate cell-type-specific methylation values, some samples get negative beta values, and some get beta values above 1. My code is as follows:

# fit the TCA model
health_p1.tca.mdl <- tca(X = health_p1_myComb, W = health_p1_W,
                         C1 = health_p1_C1, parallel = TRUE,
                         num_cores = 20, log_file = "health_p1.tca.log")

# compute ReFACTor components to use as additional covariates
health_p1.ref <- refactor(health_p1_myComb, k = 6, sparsity = 500,
                          C = health_p1_C1, C.remove = TRUE,
                          rand_svd = TRUE, log_file = "health_p1.refactor.log")

health_p1.ref.scores <- health_p1.ref$scores
health_p1.C3 <- cbind(health_p1_C1, health_p1.ref.scores)

# test each site for association with the phenotype
health_p1.res <- tcareg(X = health_p1_myComb, tca.mdl = health_p1.tca.mdl,
                        y = as.numeric(health_p1_df$nmf_cluster), C3 = health_p1.C3,
                        test = "marginal", parallel = TRUE, num_cores = 20,
                        save_results = TRUE, output = "health_p1.tcareg",
                        log_file = "health_p1.tcareg.log",
                        features_metadata = "../../../01.tool/HumanMethylationSites.txt")

# get the estimated mono-specific methylation of the two associated sites
health_p1.mono.hit <- rownames(health_p1_myComb)[order(health_p1.res[[3]]$pvals)[1:2]]
health_p1.mono.mdl.tca.sub <- tcasub(health_p1.tca.mdl, features = c(health_p1.mono.hit),
                                     log_file = "health_p1.mono.tcasub.log")
health_p1.mono.Z_hat <- tensor(X = health_p1_myComb[health_p1.mono.hit,],
                               health_p1.mono.mdl.tca.sub, log_file = "health_p1.mono.tensor.log")

Thanks in advance.

E-R commented

The tensor function does not restrict the estimated values to the range [0,1]: it uses normal distributions, so the values are not bounded. In that sense, those values don't behave like the beta methylation values you would have expected; however, we do see evidence that they correlate with cell-type-specific beta levels.
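
As a rough illustration of the point above, here is a minimal base-R sketch. It does not call TCA: `z_true` and `z_hat` are simulated stand-ins (hypothetical names) for true and estimated cell-type-specific levels, and the noise level is an arbitrary assumption. It only shows that an estimate built from a normal model can fall outside [0,1] while still tracking the underlying values, and how one might clip values for display if needed.

```r
set.seed(1)
n <- 200

# Hypothetical "true" cell-type-specific beta values, concentrated near 0
z_true <- rbeta(n, shape1 = 1, shape2 = 30)

# Stand-in for a normal-model estimate: true value plus Gaussian noise,
# with nothing constraining the result to [0, 1]
z_hat <- z_true + rnorm(n, sd = 0.03)

range(z_hat)         # the lower end can drop below 0
mean(z_hat < 0)      # fraction of negative estimates
cor(z_hat, z_true)   # the estimates still correlate with the true levels

# Optional: clip to [0, 1] for plotting or reporting only
z_clipped <- pmin(pmax(z_hat, 0), 1)
range(z_clipped)
```

Clipping changes the values at the boundaries, so for downstream testing it is probably safer to keep the unclipped estimates and treat them as relative rather than absolute methylation levels.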

Thanks for your explanation

Hi,

I am confused by the comparison of the estimated 'Z_hat' methylation levels between my control and case sets. The selected CpG site had a q-value below 0.05 and a p-value close to 5e-9 in the tcareg result, whereas comparing the 'Z_hat' values gave a p-value close to 0.5. Does this mean that covariates such as age and gender must be included to get the 5e-9 p-value, and that otherwise the p-value will not be significant?

E-R commented

It is hard to comment on your specific case without looking into the details, but in general, much like in linear regression, excluding important covariates may obscure signal (i.e. we'd get high instead of low p-values). Of note, either excluding important covariates or including unnecessary covariates may lead to spurious signals (again, much like in regression; i.e. low instead of high p-values).

Try including relevant covariates in your analysis. If this doesn't work then please share your code and some more information that would allow debugging your case more easily.
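
To make the regression analogy concrete, here is a small simulated sketch in base R. It is not the tcareg model and uses no TCA functions; `group`, `age`, and `z` are hypothetical variables, and the effect sizes are arbitrary assumptions. It compares the p-value for a case/control effect with and without adjusting for a covariate that explains much of the variance.

```r
set.seed(42)
n <- 300

group <- rbinom(n, 1, 0.5)           # hypothetical case/control label
age   <- rnorm(n, mean = 50, sd = 10)

# Simulated methylation-like level: small group effect, large age effect
z <- 0.5 + 0.02 * group + 0.01 * (age - 50) + rnorm(n, sd = 0.02)

# Unadjusted comparison: age variation inflates the residual variance
fit_unadj <- lm(z ~ group)
p_unadj <- summary(fit_unadj)$coefficients["group", "Pr(>|t|)"]

# Adjusted comparison: age is included as a covariate
fit_adj <- lm(z ~ group + age)
p_adj <- summary(fit_adj)$coefficients["group", "Pr(>|t|)"]

c(unadjusted = p_unadj, adjusted = p_adj)
```

In this setup the adjusted model should give a much smaller p-value for `group`, because the age-driven variance no longer masks the group effect. A plain two-group comparison of 'Z_hat' values corresponds to the unadjusted fit, which may be one reason it disagrees with a covariate-adjusted test.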