Bioconductor/Biostrings

readAxt() and DNAStringSet() are unable to keep lowercase sequences

Abrar2652 opened this issue · 1 comment

The readAxt() and DNAStringSet() functions automatically convert lowercase (repetitive) sequences to uppercase. This produces wrong results in research, and many papers have already been published by authors who were unaware of this internal fault in these functions.

I don't know what readAxt() is (Biostrings has no such function).

DNAStringSet() and DNAString() behave as intended and as documented, which is to return DNA sequences, that is, sequences made of letters from DNA_ALPHABET:

> DNA_ALPHABET
 [1] "A" "C" "G" "T" "M" "R" "W" "S" "Y" "K" "V" "H" "D" "B" "N" "-" "+" "."

No lowercase letters here.

Repetitive sequences in the Biostrings/BSgenome framework are handled via "masks". See the "Efficient genome searching with Biostrings and the BSgenome data packages" vignette in the BSgenome package for more information about masks and masked genome sequences.
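If the lowercase annotation in a soft-masked input needs to survive, one option is to record where the lowercase runs are before the sequence is converted to uppercase. The following is only a rough base-R sketch of that idea, not something Biostrings or BSgenome does for you: `lowercase_runs()` is a hypothetical helper, and `soft_masked` is a made-up toy sequence.

```r
# Hypothetical helper (not part of Biostrings): return the 1-based start/end
# coordinates of every run of lowercase letters in a single string.
lowercase_runs <- function(x) {
  hits <- gregexpr("[a-z]+", x, perl = TRUE)[[1]]
  if (hits[1] == -1L) {
    # gregexpr() reports -1 when there is no match at all
    return(data.frame(start = integer(0), end = integer(0)))
  }
  starts <- as.integer(hits)
  widths <- attr(hits, "match.length")
  data.frame(start = starts, end = starts + widths - 1L)
}

# Toy soft-masked sequence (assumption: lowercase marks repeats)
soft_masked <- "ACGTacgtnnACGTtt"

# Keep the repeat coordinates separately from the sequence itself
repeats <- lowercase_runs(soft_masked)

# Uppercase version, i.e. the letters a DNA sequence is made of
upper <- toupper(soft_masked)
```

The coordinates in `repeats` carry the same information as the lowercase letters did, so they could presumably serve as the starting point for a mask as described in the vignette mentioned above.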

H.