RMI-PACTA/workflow.data.preparation

Explore more appropriate memory management that doesn't involve calling `gc()`

cjyetman opened this issue · 4 comments

Approving as a "fine if we HAVE to" approval, on the condition that we open a new issue to further explore a better way to keep memory use within a reasonable limit.

There has to be a better way! But agree that this is an effective band-aid for now.

Originally posted by @jdhoffa in #140 (review)

for the record, @cjyetman does not believe that using gc() is necessarily a bad thing to do

Maybe not!

My reaction to it may just be because I am not used to it, or to seeing it done in PROD elsewhere, rather than a reaction to the practice itself.

My reaction to it is that it's a pretty clear code smell. If you're running close enough to the edge that auto-collection isn't cutting it, then it's probably time to explore other options (this ticket!)

I think the idea here is that use of gc() implies a misuse of memory, e.g. creating numerous unnecessary large objects that need to be managed. The reality is that we have numerous large objects that need to be loaded. We've already put a lot of effort into loading them only when needed and removing them once they're no longer needed, and we now run gc() immediately after they've been removed to force R to clean up after them right away, which imo is proper usage. Closing this ticket.
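
For anyone landing here later, the pattern described above looks roughly like the sketch below. This is only an illustration, not code from the repository: `load_large_object()` and `summarise_step()` are hypothetical names, and the data frame is a stand-in for one of the real large inputs.

```r
# Hypothetical stand-in for loading one of the large inputs.
load_large_object <- function(n = 1e6) {
  data.frame(id = seq_len(n), value = runif(n))
}

# Hypothetical processing step: load late, drop early, then collect.
summarise_step <- function() {
  big <- load_large_object()   # load only when the object is needed
  result <- mean(big$value)    # keep only the small derived result
  rm(big)                      # drop the reference once it is no longer needed
  invisible(gc())              # ask R to reclaim the freed memory now
  result
}

step_result <- summarise_step()
```

Without the explicit `gc()` call, R would still reclaim the memory eventually, but only when the collector next runs on its own schedule, which may be after the next large object has already been loaded.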