Making invalid slices return an empty string instead of None
Closed this issue · 1 comment
Hi, and thank you for the great library!
I ran into a subtle issue with pyfastx's sequence slicing where I think the current behavior could be improved. Ordinarily, slicing a Sequence object returns another Sequence object that can be converted to a str:
test.fasta
>seq
ACGTACAT
Python code
>>> fa = pyfastx.Fasta("test.fasta")
>>> fa
<Fasta> test.fasta contains 1 sequences
>>> fa["seq"]
<Sequence> seq with length of 8
>>> str(fa["seq"])
'ACGTACAT'
>>> fa["seq"][3:6]
<Sequence> seq from 4 to 6
>>> str(fa["seq"][3:6])
'TAC'
This all works as expected.
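Note that the repr appears to report 1-based, inclusive coordinates ("from 4 to 6"), while the slice itself uses Python's 0-based, half-open indices ([3:6]). A small conversion sketch, using a hypothetical helper name that is not part of pyfastx:

```python
def to_one_based_inclusive(start: int, stop: int) -> tuple[int, int]:
    """Convert a 0-based, half-open slice [start, stop) to 1-based, inclusive bounds."""
    # The first included position shifts up by one; the last included
    # 0-based position is stop - 1, which is position `stop` in 1-based terms.
    return start + 1, stop

print(to_one_based_inclusive(3, 6))  # (4, 6)
print("ACGTACAT"[3:6])               # TAC
```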
However, when the slice indices would produce a zero-length sequence, the operation returns None rather than an empty string (''):
>>> # For example, when the start and end indices of the slice are equal
>>> fa["seq"][3:3]
>>> fa["seq"][3:3] is None
True
>>> str(fa["seq"][3:3])
'None'
>>> # Or when both indices are outside of the range of the sequence's length
>>> fa["seq"][100:105] is None
True
>>> str(fa["seq"][100:105])
'None'
I believe this behavior is problematic: if a user calls str() on the return value of a slice operation, they may silently get the string 'None' rather than an empty string (''). I suspect this could catch many users, because in Python string slicing, slices that would produce zero-length sequences return empty strings:
>>> strseq = "ACGTACAT"
>>> strseq[3:3]
''
>>> strseq[100:105]
''
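The reason plain str slicing never fails here is that Python clamps out-of-range bounds to the sequence length before slicing. The built-in slice.indices method makes this visible; the wrapper name below is just for illustration:

```python
def slice_bounds(start: int, stop: int, length: int) -> tuple[int, int, int]:
    """Return the (start, stop, step) Python actually uses for seq[start:stop]."""
    return slice(start, stop).indices(length)

# Equal bounds: there is nothing between them.
print(slice_bounds(3, 3, 8))      # (3, 3, 1)
# Out-of-range bounds are clamped to the sequence length, not rejected.
print(slice_bounds(100, 105, 8))  # (8, 8, 1)
# In both cases the effective range is empty, so str slicing yields ''.
print(len(range(*slice_bounds(100, 105, 8))))  # 0
```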
In my use case, I was trimming the first and last k nucleotides from a sequence (expecting that, if the sequence's length was < 2k + 1, the result would be an empty string). I was really confused about why my code kept returning "None" :)
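A minimal sketch of that trimming step on a plain str (the function name is hypothetical). One related pitfall: the obvious seq[k:-k] is wrong for k == 0, because -0 is 0 and seq[0:0] is always empty; an explicit stop index avoids that.

```python
def trim_ends(seq: str, k: int) -> str:
    """Remove the first and last k characters; return '' when len(seq) <= 2 * k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Explicit stop index instead of -k, so that k == 0 keeps the whole sequence.
    # When len(seq) <= 2 * k, start >= stop and Python returns ''.
    return seq[k:len(seq) - k]

print(trim_ends("ACGTACAT", 2))  # GTAC
print(trim_ends("ACGTACAT", 0))  # ACGTACAT
print(repr(trim_ends("ACG", 2)))  # ''
```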
This isn't a very big issue (now that I know it's a possibility, I can handle it in my code properly), but I think returning an empty string in these cases would be much nicer than returning None.
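Until the behavior changes, one way to handle it is a small guard that maps a None slice result to ''. This is a sketch, not a pyfastx API: slice_to_str is a hypothetical helper, and NoneOnEmpty is a hypothetical stand-in that mimics the reported behavior so the example runs without pyfastx installed. With pyfastx you would pass fa["seq"] instead.

```python
from typing import Any


def slice_to_str(seq: Any, start: int, stop: int) -> str:
    """Slice `seq` and return a str, treating a None result as ''.

    Assumes seq[start:stop] either returns None (zero-length slice, as
    reported above) or an object that str() converts to the bases.
    """
    sub = seq[start:stop]
    return "" if sub is None else str(sub)


class NoneOnEmpty:
    """Hypothetical stand-in: returns None for zero-length slices."""

    def __init__(self, bases: str) -> None:
        self.bases = bases

    def __getitem__(self, key: slice) -> Any:
        sub = self.bases[key]
        return sub if sub else None


seq = NoneOnEmpty("ACGTACAT")
print(repr(slice_to_str(seq, 3, 6)))      # 'TAC'
print(repr(slice_to_str(seq, 3, 3)))      # ''
print(repr(slice_to_str(seq, 100, 105)))  # ''
```

The guard also works unchanged on plain str objects, since their slices are never None.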
Thank you again for your work on this library!
Thank you for the nice suggestion. I will fix it in the next version.