DNA methylation analysis notes from Ming Tang

DNA-methylation-analysis

papers

Illumina is phasing out the 450k array and introducing the new 850k array as of the end of 2015.

450k/850k array analysis

The Beta-value method has a direct biological interpretation: it corresponds roughly to the percentage of a site that is methylated. This makes the Beta-value very attractive when modeling the underlying biological effect. However, this interpretation is an approximation [22], especially when the data have not been properly preprocessed and normalized. From an analytical and statistical standpoint, the Beta-value method has severe heteroscedasticity outside the middle methylation range, which poses serious challenges for many statistical models. In comparison, the M-value method is more statistically valid for differential analysis and other statistical analyses because it is approximately homoscedastic. Although the M-value statistic does not have an intuitive biological meaning, it is possible to provide an accurate estimation of methylation status by modeling the distribution of the M-value statistic. In differential methylation analysis, we recommend using the M-value because most statistical methods designed for expression microarrays can be applied directly and it is easy to implement a difference threshold adjustment to improve the true positive rate (TPR). The difference in M-values can also be interpreted as a fold-change on the non-log scale. Although both the Beta-value and M-value methods have limitations, the two statistics are inter-convertible using Equation 3, enabling the use of the most appropriate method. We recommend using the M-value method for differential methylation analysis and also including the Beta-value statistic in final reports because of its intuitive biological interpretation.
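The conversion mentioned in the excerpt can be sketched in a few lines. This is a minimal illustration, not code from the quoted paper: it assumes that "Equation 3" is the usual logit relation, M = log2(Beta / (1 - Beta)) and Beta = 2^M / (2^M + 1), and the function names `beta_to_m` and `m_to_beta` are hypothetical.

```python
import math


def beta_to_m(beta: float) -> float:
    """Convert a Beta-value in (0, 1) to an M-value (assumed logit2 relation)."""
    if not 0.0 < beta < 1.0:
        # Beta-values of exactly 0 or 1 map to -inf/+inf, so reject them here.
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m: float) -> float:
    """Convert an M-value back to a Beta-value (inverse of beta_to_m)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Equal steps in Beta are not equal steps in M: the M scale stretches
# the extremes, which is where Beta-values are most heteroscedastic.
for b in (0.5, 0.8, 0.95):
    print(f"beta={b:.2f} -> M={beta_to_m(b):.3f}")
```

A typical workflow under these assumptions would run the statistical test on M-values and convert the group means back with `m_to_beta` for reporting.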

The simulation results suggest that the hierarchical Ward–Manhattan approach gives consistent clusterings and that the Manhattan distance appears to be the best metric for separating clusters based on beta-values. However, this result is not absolute: under some conditions, particularly with low-decisive data, the clustering is inconsistent.
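To make the Ward–Manhattan idea concrete, here is a small pure-Python sketch. It rests on an assumption that the excerpt does not confirm: "Ward–Manhattan" is taken to mean agglomerative clustering where the Ward-style Lance-Williams update is applied directly to pairwise Manhattan distances between samples. The function names and the toy beta-values are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two beta-value profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_manhattan_clusters(samples, n_clusters):
    """Merge samples bottom-up until n_clusters remain.

    Assumption: the Ward Lance-Williams update is applied to the
    Manhattan distances as given (no squaring).
    """
    members = {i: [i] for i in range(len(samples))}
    dist = {frozenset((i, j)): manhattan(samples[i], samples[j])
            for i, j in combinations(members, 2)}
    next_id = len(samples)
    while len(members) > n_clusters:
        pair = min(dist, key=dist.get)      # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        n_i, n_j = len(members[i]), len(members[j])
        merged = members.pop(i) + members.pop(j)
        # Update the distance from every remaining cluster k to the merged one.
        for k, mem_k in members.items():
            n_k = len(mem_k)
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (
                (n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij
            ) / (n_i + n_j + n_k)
        members[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in members.values())


# Toy data: 6 samples x 4 CpG sites, three hypomethylated, three hypermethylated.
betas = [
    [0.10, 0.20, 0.10, 0.15],
    [0.12, 0.18, 0.09, 0.20],
    [0.08, 0.22, 0.12, 0.10],
    [0.80, 0.90, 0.85, 0.95],
    [0.82, 0.88, 0.90, 0.90],
    [0.78, 0.92, 0.80, 0.93],
]
print(ward_manhattan_clusters(betas, 2))
```

On well-separated toy data like this the two groups are recovered. The caveat in the excerpt concerns data where the separation is weak, which this sketch does not attempt to simulate.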

RRBS

Whole genome BS-seq

Nanopore

Batch effect

  • BEclear: Batch Effect Detection and Adjustment in DNA Methylation Data

Clustering

Motifs

Databases

Visualization