PoonLab/covizu

Enhance apply_features function for more efficiency

SandeepThokala opened this issue · 7 comments

result = list(refseq) # strings are not mutable

Converting refseq to a list stores one element per character, so the list uses considerably more memory than the original contiguous string. On 64-bit CPython each list slot is an 8-byte pointer, compared with one byte per character in an ASCII string, so the difference becomes significant for long sequences.
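
As a rough illustration of this point (not taken from the covizu code base), the snippet below compares the size reported by sys.getsizeof() for a string and for the list built from it. The sequence is a made-up placeholder with the same length as the 29903-character reference mentioned in this thread, and the exact byte counts depend on the Python build.

```python
import sys

# Placeholder sequence (an assumption for illustration), 29903 characters long.
refseq = "ACGT" * 7475 + "ACG"

as_list = list(refseq)               # one list slot (a pointer) per character
str_size = sys.getsizeof(refseq)     # size of the contiguous string object
list_size = sys.getsizeof(as_list)   # size of the list container only

print(len(refseq), str_size, list_size)
```

The list figure covers only the container itself, so it is a lower bound on what the list-based representation actually costs.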

Is this function covered by the unit test suite? Please feel free to optimize but make sure that the results are the same.

Regarding commit 1b714b3, I think it is cleaner to generate a new string rather than re-use the old variable name and replace the previous string @SandeepThokala
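
A minimal sketch of that style follows. It is hypothetical: the function name `apply_substitutions`, the `(position, base)` tuple format, and the restriction to single-base substitutions are assumptions made for illustration, not the signature or behaviour of covizu's apply_features. The pieces are collected from slices of the reference and joined into a new string under a new name, so the input is never overwritten.

```python
def apply_substitutions(refseq, subs):
    """Return a new string with single-base substitutions applied.

    refseq: reference sequence (str)
    subs: iterable of (position, base) tuples, 0-based, with distinct
          positions (an assumed format, for illustration only)
    """
    pieces = []
    prev = 0
    for pos, base in sorted(subs):
        pieces.append(refseq[prev:pos])  # unchanged stretch before the change
        pieces.append(base)              # substituted base
        prev = pos + 1
    pieces.append(refseq[prev:])         # remainder after the last change
    new_seq = "".join(pieces)            # new name; refseq is left untouched
    return new_seq


print(apply_substitutions("ACGTACGT", [(1, "T"), (6, "A")]))
```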

@SandeepThokala can you please post some timing results (with old and new code) on processing either the unit test, or a large set of data in case the test fixture is processed too quickly for meaningful timing results?

@GopiGugan to provide @SandeepThokala with a larger dataset to generate the timing results

@SandeepThokala can you please report timing and RAM results here?

Using the sys.getsizeof() function to measure the memory occupied by the result object:

             len(refseq) = 29903           len(refseq) = 100000
             old code       new code       old code        new code
time taken   0.005 secs     0.001 secs     0.001 secs      0.01 secs
memory       239288 bytes   29952 bytes    8000056 bytes   1000049 bytes
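
For anyone wanting to repeat this kind of comparison, here is one way the measurements could be taken. It is a sketch, not the script used to produce the numbers above. The two functions are stand-ins written for this example (a list-based edit and a slice-based edit of a single position) and do not reproduce the old or new apply_features code. Only the standard-library timeit module and sys.getsizeof() are used.

```python
import sys
import timeit

refseq = "A" * 29903  # placeholder sequence, assumed for illustration


def list_based(seq, pos, base):
    result = list(seq)      # mutable copy, one slot per character
    result[pos] = base
    return result           # a caller would still need "".join(result)


def slice_based(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]   # builds a new string directly


for func in (list_based, slice_based):
    # Average over 100 calls to smooth out timer noise.
    secs = timeit.timeit(lambda: func(refseq, 100, "G"), number=100) / 100
    size = sys.getsizeof(func(refseq, 100, "G"))
    print(f"{func.__name__}: {secs:.6f} secs per call, {size} bytes")
```

Note that sys.getsizeof() reports only the container's own size, so for a list it excludes the character objects the list points to.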

Thanks @SandeepThokala, go ahead and merge your changes into the dev branch please