ArtPoon/pangolin

Reduce memory footprint

Closed this issue · 1 comment

Originally I was not able to process 20K+ sequences because my workstation ran out of memory while running pangolearn.py. There seem to be two memory-intensive steps in this script:

  1. loading and encoding the sequence data as "one-hot" vectors
  2. generating a pandas data frame from these vectors (a rough sketch of both steps follows)
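
To illustrate where the memory goes, here is a minimal sketch of those two steps. This is not the actual pangolearn.py code; the alphabet, dtype, and flat layout are assumptions, but the arithmetic in the comments shows why 20K full-length genomes don't fit:

```python
import numpy as np
import pandas as pd

ALPHABET = "ACGT-"  # assumed encoding alphabet


def one_hot(seq):
    """Encode a nucleotide sequence as a flat one-hot vector over ALPHABET."""
    vec = np.zeros(len(seq) * len(ALPHABET), dtype=np.float64)
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:
            vec[i * len(ALPHABET) + j] = 1.0
    return vec


# ~30,000 sites x 5 states x 8 bytes is about 1.2 MB per sequence, so 20K
# sequences approach 24 GB before the DataFrame step makes another copy.
seqs = ["ACGT-" * 6000] * 3              # stand-in for the real alignment
matrix = np.vstack([one_hot(s) for s in seqs])   # step 1: encode everything
df = pd.DataFrame(matrix)                        # step 2: wrap in a data frame
```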

I am attempting to reduce the RAM consumed by this script by (1) filtering sequences to the required sites (indices) on load, and (2) processing subsets of the filtered data as pandas data frames of a fixed maximum size.
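
Roughly, the change looks like the sketch below. Names such as `keep_sites`, `iter_chunks`, and the chunk size are placeholders for illustration, not the actual patch:

```python
import numpy as np
import pandas as pd

ALPHABET = "ACGT-"
CHUNK_SIZE = 1000  # maximum number of sequences per data frame (placeholder)


def one_hot_filtered(seq, keep_sites):
    """One-hot encode only the sites (indices) the trained model requires."""
    vec = np.zeros(len(keep_sites) * len(ALPHABET), dtype=np.uint8)
    for k, site in enumerate(keep_sites):
        j = ALPHABET.find(seq[site])
        if j >= 0:
            vec[k * len(ALPHABET) + j] = 1
    return vec


def iter_chunks(seqs, keep_sites, chunk_size=CHUNK_SIZE):
    """Yield fixed-size pandas data frames instead of one monolithic frame."""
    batch = []
    for seq in seqs:
        batch.append(one_hot_filtered(seq, keep_sites))
        if len(batch) == chunk_size:
            yield pd.DataFrame(np.vstack(batch))
            batch = []
    if batch:
        yield pd.DataFrame(np.vstack(batch))


# usage: run the classifier on each chunk and concatenate the predictions,
# e.g. results = [model.predict(chunk) for chunk in iter_chunks(seqs, keep_sites)]
```

Filtering on load keeps the per-sequence vector proportional to the number of sites the model actually uses, and chunking caps the size of any single data frame regardless of how many sequences are in the input.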

Reading the entire UK data set still requires too much memory (over 16 GB), even after filtering out unused sites.