lmdu/pyfastx

Making invalid slices return an empty string instead of None

Closed this issue · 1 comments

Hi, and thank you for the great library!

I ran into a subtle issue with pyfastx's sequence slicing where I think the current behavior could be improved. Ordinarily, slicing a Sequence object returns another Sequence object that can be converted to a str:

test.fasta

>seq
ACGTACAT

Python code

>>> fa = pyfastx.Fasta("test.fasta")
>>> fa
<Fasta> test.fasta contains 1 sequences
>>> fa["seq"]
<Sequence> seq with length of 8
>>> str(fa["seq"])
'ACGTACAT'
>>> fa["seq"][3:6]
<Sequence> seq from 4 to 6
>>> str(fa["seq"][3:6])
'TAC'

This all works as expected.

However, when the slice indices result in a "zero-length sequence," the operation returns None rather than returning an empty string (''):

>>> # For example, when the start and end indices of the slice are equal
>>> fa["seq"][3:3]
>>> fa["seq"][3:3] is None
True
>>> str(fa["seq"][3:3])
'None'
>>> # Or when both indices are outside of the range of the sequence's length
>>> fa["seq"][100:105] is None
True
>>> str(fa["seq"][100:105])
'None'

I believe this behavior is problematic: if a user calls str() on the return value of a slice operation, they may silently get the string 'None' rather than an empty string (''). I suspect many users will run into this, because ordinary Python string slices that would produce a zero-length sequence return an empty string:

>>> strseq = "ACGTACAT"
>>> strseq[3:3]
''
>>> strseq[100:105]
''
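
For reference, these empty results follow from how Python clamps slice bounds to the sequence length. The sketch below uses the built-in `slice.indices` method to show this; the helper name `slice_length` is made up for illustration and is not part of pyfastx.

```python
def slice_length(start, stop, length):
    """Return how many items seq[start:stop] yields for a sequence of `length` items."""
    # slice.indices clamps start and stop to the valid range,
    # the same way built-in sequences such as str do.
    lo, hi, step = slice(start, stop).indices(length)
    return len(range(lo, hi, step))


strseq = "ACGTACAT"
for start, stop in [(3, 6), (3, 3), (100, 105)]:
    n = slice_length(start, stop, len(strseq))
    # The computed length always matches what str slicing returns.
    assert n == len(strseq[start:stop])
    print(start, stop, n, repr(strseq[start:stop]))
```

Both `[3:3]` and `[100:105]` come out with length 0, which is why a plain str gives `''` for them.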

In my use case, I was trimming the first and last k nucleotides from a string (with the understanding that, if the sequence's length was < 2k + 1, the returned string would be empty). I was really confused why my code kept returning "None" :)

This isn't a very big issue (now that I know it's a possibility, I can handle it in my code properly), but I think returning an empty string in these cases would be much nicer than returning None.
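
A minimal sketch of such a guard for the trimming case above. The function name `trim_ends` is hypothetical, and the code assumes only that the object supports `len()`, slicing, and `str()`; it is demonstrated on a plain str here rather than on a pyfastx Sequence.

```python
def trim_ends(seq, k):
    """Drop the first and last k items of seq, returning '' when nothing is left."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(seq)
    # A sequence shorter than 2k + 1 has nothing left after trimming,
    # so return early without slicing at all.
    if n <= 2 * k:
        return ""
    piece = seq[k:n - k]
    # Defensive: treat a None slice result (the behavior reported above) as empty.
    return "" if piece is None else str(piece)


print(repr(trim_ends("ACGTACAT", 2)))
print(repr(trim_ends("ACGTACAT", 4)))
```

Checking the length before slicing means the result does not depend on what the library returns for a zero-length slice.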

Thank you again for your work on this library!

lmdu commented

Thank you for your nice suggestion. I will fix it in the next version.