lmdu/pyfastx

Possible memory leak in keys

sanjaysrikakulam opened this issue · 4 comments

Hi,

I have downloaded the trembl dataset from UniProtKB (48GiB) and created an index (23GiB).

from pyfastx import Fasta

# Create index
fobj = Fasta('uniprot_trembl.fasta.gz', build_index=True)

# Extract keys
sample_ids = fobj.keys()

# Apply length filter
sample_ids.filter(sample_ids >= 11)

# Iterate through the filtered sample IDs (before the iteration finishes, a total of 12.4 GiB of memory is consumed)
dummy_count = 0
for key in sample_ids:
    dummy_count += 1

The above iteration is included for reproducibility and to demonstrate that the memory is consumed regardless of what the loop body does. Could you please take a look at this and suggest a solution?
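As a side note, the growth can be quantified without external tools. The snippet below is a library-agnostic sketch (it does not use pyfastx; the generator and key names are placeholders) showing how `tracemalloc` from the standard library contrasts the peak memory of lazy, one-at-a-time iteration against materializing all keys at once, which is the kind of accumulation a leak like this resembles:

```python
import tracemalloc

def iter_keys(n):
    # Generator: yields placeholder keys one at a time without storing them
    for i in range(n):
        yield f"key_{i}"

tracemalloc.start()

# Lazy iteration: each key is discarded before the next is produced
for _ in iter_keys(100_000):
    pass
lazy_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.reset_peak()

# Eager materialization: all keys are held in memory simultaneously
keys = [f"key_{i}" for i in range(100_000)]
eager_peak = tracemalloc.get_traced_memory()[1]

tracemalloc.stop()

# Lazy iteration should peak far lower than holding every key at once
print(lazy_peak < eager_peak)
```

If iterating `fobj.keys()` shows peak memory closer to the eager case than the lazy one, that supports the suspicion that keys are being retained somewhere rather than released per iteration.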

Thanks in advance!

P.S:
System info:
OS: CentOS 7
Python 3.7.7
pyfastx version: 0.8.3

lmdu commented

Thank you for reporting this issue. I have found the bug and will fix it in the next few days.

Great, thank you!

lmdu commented

We have fixed it in the new version. Thanks!

Thank you! :-)