benjschiller/twobitreader

Invalid nucleotides for last chromosome

Using the following reference genome:

>FIRST_CHR
ACGTACGTACGTACGTACGTACGTACGTA
>LAST_CHR
AAAGGGGGGC

Slicing "LAST_CHR" from the start returns the correct nucleotides:

>>> import twobitreader
>>> reader = twobitreader.TwoBitFile("genome.2bit")
>>> reader['LAST_CHR'][0:9]
'AAAGGGGGG'

But extracting nucleotides from the end of "LAST_CHR" returns wrong nucleotides (A where G is expected):

>>> reader['LAST_CHR'][5:6]
'A'
>>> reader['LAST_CHR'][6:9]
'AAA'
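
For comparison, the expected results can be read off the FASTA record by slicing it as a plain Python string. This is only a reference sketch: it does not touch the .2bit file, and the name `last_chr` is made up for illustration. It assumes the reader uses the same 0-based, end-exclusive coordinates as Python slices, which the `[0:9]` result above suggests.

```python
# Sequence of LAST_CHR, copied from the FASTA above.
last_chr = "AAAGGGGGGC"

# Expected nucleotides for the three slices queried in this report,
# assuming 0-based, end-exclusive coordinates.
expected = {
    (start, end): last_chr[start:end]
    for start, end in [(0, 9), (5, 6), (6, 9)]
}

for (start, end), seq in expected.items():
    print(f"[{start}:{end}] -> {seq!r}")
# [0:9] -> 'AAAGGGGGG'
# [5:6] -> 'G'
# [6:9] -> 'GGG'
```

So `[5:6]` should give 'G' and `[6:9]` should give 'GGG', not 'A' and 'AAA'.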

I suspect this problem occurs because the entry is too short. The standard UCSC chromosomes are padded with ~10000 Ns, so I may have made some assumptions based on that. As a short-term fix, I'd recommend adding some Ns (say 1000) to the front and seeing whether that fixes the problem. I'll try to look at the code this week and figure out what's going wrong.
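
A minimal sketch of that short-term workaround, in case it helps: pad every FASTA record with leading Ns before converting it to .2bit, then shift query coordinates by the same amount. The helper names `pad_fasta` and `shift` are hypothetical and not part of twobitreader, and whether padding actually avoids the bug is untested here.

```python
PAD = 1000  # number of leading Ns suggested above; an arbitrary choice


def pad_fasta(text, pad=PAD):
    """Insert a line of `pad` Ns directly after every FASTA header line."""
    out = []
    for line in text.splitlines():
        out.append(line)
        if line.startswith(">"):
            out.append("N" * pad)
    return "\n".join(out) + "\n"


def shift(start, end, pad=PAD):
    """Map (start, end) on the original sequence to the padded sequence."""
    return start + pad, end + pad


genome = (
    ">FIRST_CHR\n"
    "ACGTACGTACGTACGTACGTACGTACGTA\n"
    ">LAST_CHR\n"
    "AAAGGGGGGC\n"
)
padded = pad_fasta(genome)  # write this out and convert it to .2bit
```

After converting the padded FASTA to .2bit (for example with UCSC's faToTwoBit), the earlier query becomes `start, end = shift(5, 6)` followed by `reader['LAST_CHR'][start:end]`.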