Raw mean methylation difference vs smoothed scaled pooled proportion difference
Closed this issue ยท 5 comments
Hi Keegan,
I've been using dmrseq
for quite a while now, great package!
I had a question related to differences between raw mean methylation values and the smoothed scaled pooled proportion (SSPP) differences. Just recently I've been analyzing some DMRs a bit closer because I'm trying to narrow down some whole genome analyses. All these DMRs were detected as significant. Now, I always assumed that the 0.1 threshold used for SSPP to detect DMRs more or less matched the raw mean methylation difference, but after checking this in practice, I find that a lot of (significant) DMRs of mine have raw differences that are quite low.
Here's an example of a significant DMR for which I found a raw methylation difference of 0.000497 (extremely low):
I'm aware that raw mean methylation is indeed raw and doesn't take into account many factors, but I was wondering what could drive such a big gap between those two values? On the plot above it seems that SSPP is somewhere around 0.1-0.15, which is orders of magnitude higher.
Best,
Stefan
Hi Stefan,
I want to be sure I understand your question. Can you tell me how you computed the value 0.000497 for the above DMR?
Best,
Keegan
Sorry, in short I used the meanDiff
function from dmrseq
. Here is a snippet of my code:
# Use Bismark cov files
cov_files <- c(cov_1_diploid,
cov_2_diploid,
cov_3_diploid,
cov_1_polyploid,
cov_2_polyploid,
cov_3_polyploid)
# Load Rdata file with "regions" object from dmrseq
load(Rdata)
# import dmrseq output
dmrseq <- fread(dmrseq_output)
# import Bismark cov files
bismarkBSseq <- read.bismark(files = c(cov_files),
rmZeroCov = TRUE,
strandCollapse = FALSE,
verbose = TRUE)
sampleNames <- c(rep("par", 3),
rep("kam", 3))
pData(bismarkBSseq)$Species <- sampleNames
# Filtering step
loci.idx <- which(DelayedMatrixStats::rowSums2(getCoverage(bismarkBSseq, type="Cov") == 0) == 0)
sample.idx <- which(pData(bismarkBSseq)$Species %in% c("par", "kam"))
bs.filtered <- bismarkBSseq[loci.idx, sample.idx]
sigRegions <- regions[regions@elementMetadata@listData$qval < 0.05,]
# Extracting raw mean methylation differences
rawDiff <- meanDiff(bs.filtered, dmrs=sigRegions, testCovariate="Species")
# Plot first DMR
plotDMRs(bs.filtered, regions=sigRegions[1,], testCovariate="Species")
Thanks for the additional details.
Is the bs.filtered
object absolutely identical to the one that was used in the call to dmrseq
(not shown here) to generate the loaded results file dmrseq_output
? If it is not identical, then the indexes will not match up, and you might not be associating the regions to the same basepair positions.
I suspect this may be the issue, since I agree that the raw methylation differences from meanDiff
should match up more closely with the smoothed difference estimates. To double check, I would recommend only doing the filtering step once, then running dmrseq
and meanDiff
on the same bs.filtered
object.
Hope that helps.
Best,
Keegan
That's an excellent catch and it might indeed explain this difference. I realized that my bs.filtered
object comes from cov
files including CG, CHG and CHH methylation, while the dmrseq_output
object only looks at CG methylation. I'll repeat my analyses once more to see if the raw methylation difference seem more reasonable.
I'll update this issue with my corrected result and if it makes sense, I'll close the issue.
Thank you very much for the help!