OrderCpGsByLocation with ignore.strand = TRUE
Hi,
I noticed that some start positions are greater than the end positions in the results from lmmTestAllRegions(). For example, one of my results:
chrom start end nCpGs Estimate StdErr Stat pValue FDR
1: chr18 10922 10862 4 -0.01854889 0.0220416 -0.8415399 0.4000455 1
Below is the extract of EPIC.hg19.manifest from sesameData for the 4 CpGs in this region:
GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand | address_A
<Rle> <IRanges> <Rle> | <integer>
cg23708725 chr18 10922-10923 + | 56710277
cg23947066 chr18 10930-10931 + | 80737195
cg07138201 chr18 10935-10936 + | 36755955
cg00703566 chr18 10862-10863 - | 40648584
-------
seqinfo: 26 sequences from an unspecified genome; no seqlengths
After looking at the source code, it seems that you use the default settings of sort(), which orders a GRanges object first by seqnames, then by strand, then by start, and finally by width.
coMethDMR/R/util1_OrderCpGsByLocation.R
Lines 54 to 56 in 34553b9
Because the identified co-methylated CpGs are not always on the same strand, using the default sort() when building the region name leads to some regions having a start greater than the end.
Lines 99 to 103 in 34553b9
Would you consider using sort(..., ignore.strand = TRUE) to eliminate this problem?
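To make the difference concrete, here is a minimal sketch that rebuilds the four CpGs listed above as a GRanges object and sorts them both ways. The object names (cpgs_gr, sortedDefault, sortedNoStrand) are made up for this illustration, and I am assuming the region start and end are taken from the first and last CpG after sorting, which is what the reported start = 10922, end = 10862 suggests.

```r
library(GenomicRanges)

# The four CpGs from the manifest extract above (coordinates copied from it)
cpgs_gr <- GRanges(
  seqnames = "chr18",
  ranges = IRanges(
    start = c(10922, 10930, 10935, 10862),
    end   = c(10923, 10931, 10936, 10863),
    names = c("cg23708725", "cg23947066", "cg07138201", "cg00703566")
  ),
  strand = c("+", "+", "+", "-")
)

# Default sort: seqnames, then strand ("+" before "-"), then start
sortedDefault <- sort(cpgs_gr)

# Strand ignored: ordering is by seqnames, then start
sortedNoStrand <- sort(cpgs_gr, ignore.strand = TRUE)

n <- length(cpgs_gr)

# First and last start positions under each ordering
start(sortedDefault)[c(1, n)]   # 10922 10862 (the "-" strand CpG ends up last)
start(sortedNoStrand)[c(1, n)]  # 10862 10935
```

With the default ordering the minus-strand CpG is placed last even though it has the smallest coordinate, which reproduces the inverted region shown in my result.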
I need to add this ignoreStrand argument to any function in coMethDMR:: that calls OrderCpGsByLocation() directly or indirectly. So far, we have only fixed GetCpGsInRegion(), lmmTest(), lmmTestAllRegions(), OrderCpGsByLocation(), and WriteCloseByAllRegions().
I will wait until I finish all the work for https://github.com/TransBioInfoLab/coMethDMR/tree/update_sesame before I fix this.
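As a rough sketch of what forwarding such an argument through a caller could look like: the two functions below (orderByLocationSketch and regionBoundsSketch) are hypothetical stand-ins written for this example, not the actual coMethDMR functions or their signatures, and the default of TRUE is an assumption.

```r
library(GenomicRanges)

# Hypothetical low-level helper: sorts CpGs, optionally ignoring strand
orderByLocationSketch <- function(gr, ignoreStrand = TRUE) {
  sort(gr, ignore.strand = ignoreStrand)
}

# Hypothetical caller: forwards ignoreStrand instead of relying on a default,
# then reads the region bounds from the first and last sorted CpG
regionBoundsSketch <- function(gr, ignoreStrand = TRUE) {
  ordered_gr <- orderByLocationSketch(gr, ignoreStrand = ignoreStrand)
  c(
    start = start(ordered_gr)[1],
    end   = start(ordered_gr)[length(ordered_gr)]
  )
}

# Two CpGs on opposite strands, coordinates taken from the report above
example_gr <- GRanges(
  seqnames = "chr18",
  ranges   = IRanges(start = c(10922, 10862), width = 2),
  strand   = c("+", "-")
)

boundsFixed   <- regionBoundsSketch(example_gr, ignoreStrand = TRUE)
boundsDefault <- regionBoundsSketch(example_gr, ignoreStrand = FALSE)
```

Every function between the user-facing call and the sort has to accept and pass on the argument; otherwise the strand-aware default silently applies again.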