lmdu/pyfastx

Full Fasta info object without index building

oschwengers opened this issue · 1 comment

Hi and thanks a lot for this super-fast python library.
We'd like to use this in our tools, such as Bakta and Platon. Maybe I've overlooked something, but we need a way to parse FASTA files as fast as possible, i.e. without building an index, while still having access to the sequence ID, description and sequence.

According to the README, there is:

import pyfastx
for name, seq in pyfastx.Fasta('test.fa.gz', build_index=False):
    print(name, seq)

and:

import pyfastx
for seq in pyfastx.Fasta('test.fa.gz'):
    print(seq.name)
    print(seq.seq)
    print(seq.description)

But what we actually need is:

import pyfastx
for seq in pyfastx.Fasta('test.fa.gz', build_index=False):
    print(seq.name)
    print(seq.seq)
    print(seq.description)

Also, it would be best if the description already excluded the FASTA ID. I think this use case would be interesting for many other users as well.
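To make the request concrete, here is a rough sketch of the behaviour we have in mind, written as a workaround on top of the index-free iteration. It assumes that the `full_name` option of `pyfastx.Fasta` can be combined with `build_index=False` so that the whole header line comes back as the name; I have not checked this against every version. `Record`, `split_header` and `iter_records` are hypothetical helper names, not part of pyfastx.

```python
from typing import Iterator, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical container for the three fields we need."""
    name: str
    description: str
    seq: str


def split_header(header: str) -> Tuple[str, str]:
    """Split a FASTA header (without '>') into (id, description).

    The id is everything up to the first whitespace; the description
    is the remainder and therefore no longer contains the id.
    """
    parts = header.strip().split(None, 1)
    name = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    return name, description


def iter_records(path: str) -> Iterator[Record]:
    """Yield Record objects without building an index.

    Assumption: full_name=True makes the index-free iterator return
    the complete header line as the first tuple element.
    """
    import pyfastx  # imported lazily so split_header works without it

    for header, seq in pyfastx.Fasta(path, build_index=False, full_name=True):
        name, description = split_header(header)
        yield Record(name, description, seq)
```

With this, `for rec in iter_records('test.fa.gz')` would give `rec.name`, `rec.description` and `rec.seq`, but the header splitting happens in Python, so native support in the library would presumably be faster.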

Thanks again and best regards!

lmdu commented

Thank you for your suggestion. I will consider adding this feature in the next major version.