lmdu/pyfastx

Possible memory leak in keys

sanjaysrikakulam opened this issue · 4 comments

Hi,

I have downloaded the trembl dataset from UniProtKB (48GiB) and created an index (23GiB).

from pyfastx import Fasta

# Create index
fobj = Fasta('uniprot_trembl.fasta.gz', build_index=True)

# Extract keys
sample_ids = fobj.keys()

# Apply length filter
sample_ids.filter(sample_ids >= 11)

# Iterate through the filtered sample IDs (before the iteration finishes, a total of 12.4 GiB of memory is consumed)
dummy_count = 0
for key in sample_ids:
    dummy_count += 1

The above iteration is included for reproducibility and to demonstrate that the memory is consumed regardless of what the loop body does. Could you please take a look at this and suggest a solution?
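As a side note, the growth can be quantified without external tools. The snippet below is a library-agnostic sketch (it does not use pyfastx; the generator and key names are placeholders) showing how `tracemalloc` from the standard library contrasts the peak memory of lazy, one-at-a-time iteration against materializing all keys at once, which is the kind of accumulation a leak like this resembles:

```python
import tracemalloc

def iter_keys(n):
    # Generator: yields placeholder keys one at a time without storing them
    for i in range(n):
        yield f"key_{i}"

tracemalloc.start()

# Lazy iteration: each key is discarded before the next is produced
for _ in iter_keys(100_000):
    pass
lazy_peak = tracemalloc.get_traced_memory()[1]
tracemalloc.reset_peak()

# Eager materialization: all keys are held in memory simultaneously
keys = [f"key_{i}" for i in range(100_000)]
eager_peak = tracemalloc.get_traced_memory()[1]

tracemalloc.stop()

# Lazy iteration should peak far lower than holding every key at once
print(lazy_peak < eager_peak)
```

If iterating `fobj.keys()` shows peak memory closer to the eager case than the lazy one, that supports the suspicion that keys are being retained somewhere rather than released per iteration.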

Thanks in advance!

P.S:
System info:
OS: CentOS 7
Python 3.7.7
pyfastx version: 0.8.3

lmdu commented

Thank you for reporting this issue. I have found the bug and will fix it in the next few days.

Great, thank you!

lmdu commented

We have fixed it in the new version. Thanks!

Thank you! :-)