Invalid nucleotides for last chromosome
pascalg commented
Using the following reference genome:
>FIRST_CHR
ACGTACGTACGTACGTACGTACGTACGTA
>LAST_CHR
AAAGGGGGGC
The first nine nucleotides of "LAST_CHR" are returned correctly:
>>> import twobitreader
>>> reader = twobitreader.TwoBitFile("genome.2bit")
>>> reader['LAST_CHR'][0:9]
'AAAGGGGGG'
But extracting shorter slices from the end of "LAST_CHR" returns wrong nucleotides:
>>> reader['LAST_CHR'][5:6]
'A'
>>> reader['LAST_CHR'][6:9]
'AAA'
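For comparison, here is a small sketch of what those slices should contain, computed from the FASTA text with plain Python string slicing. It does not touch the 2bit file. It assumes that twobitreader slices follow ordinary Python semantics (0-based, end-exclusive), which the correct `[0:9]` result above is consistent with.

```python
# Expected values derived from the FASTA sequence itself.
# Assumption: reader[...] slices behave like Python string slices.
last_chr = "AAAGGGGGGC"

expected = {
    (0, 9): last_chr[0:9],
    (5, 6): last_chr[5:6],
    (6, 9): last_chr[6:9],
}

for (start, end), seq in expected.items():
    print(f"LAST_CHR[{start}:{end}] should be {seq!r}")
```

Under that assumption, both short slices fall entirely inside the run of G, so neither should contain an A.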
benjschiller commented
I suspect this problem occurs because the entry is too short. The normal UCSC chromosomes are padded with ~10000 Ns, so I may have made some assumptions based on that. As a short-term fix, I'd recommend adding some Ns (say 1000) to the front and seeing if that fixes the problem. I'll try to look at the code this week and figure out what's going wrong.
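A rough sketch of the suggested padding workaround, operating only on the FASTA text: the helper name `pad_fasta` and the constant `PAD` are made up for illustration and are not part of twobitreader. The padded FASTA would still need to be converted to 2bit again, and whether the padding actually avoids the bug is untested here. Every coordinate shifts by the pad length, so queries have to be offset accordingly.

```python
def pad_fasta(fasta_text, pad=1000):
    """Prepend `pad` N characters to every record of a FASTA string.

    Hypothetical helper for illustration; each sequence is written
    back on a single line.
    """
    records = []  # list of [header, sequence] pairs
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            records[-1][1] += line
    return "".join(f"{h}\n{'N' * pad}{s}\n" for h, s in records)


genome = ">FIRST_CHR\nACGTACGTACGTACGTACGTACGTACGTA\n>LAST_CHR\nAAAGGGGGGC\n"
PAD = 1000
padded = pad_fasta(genome, PAD)

# The slice originally requested as [5:6] now lives at [PAD + 5:PAD + 6].
padded_last_chr = padded.splitlines()[3]
print(padded_last_chr[PAD + 5:PAD + 6])
```

With a padded genome, a query such as `reader['LAST_CHR'][5:6]` would become `reader['LAST_CHR'][PAD + 5:PAD + 6]`.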