Memory problems
Opened this issue · 4 comments
Dear all,
When I merge 200 files, memory consumption is too high. Is there any way to solve this?
@Link-Go Would you mind sharing your solution if you eventually solved this issue? Thanks in advance.
@abubelinha (self-mention to find this when searching issues)
@abubelinha and anyone else who is facing the same issue:
tl;dr - It seems a reference is being held that prevents the docx being appended/inserted from being garbage collected after the append/insert. The workaround we've employed:
import gc
import docx
from docxcompose.composer import Composer
...
def merge(composer, doc_path):
    ...
    doc_to_merge = docx.Document(doc_path)
    composer.append(doc_to_merge)
    # XXX at this point doc_to_merge will not be gc()'d automatically
    # when it goes out of scope
    ...
def merge_all(document_paths):
    ...
    merged_doc = docx.Document()
    composer = Composer(merged_doc)
    for document_path in document_paths:
        merge(composer, document_path)
        gc.collect()  # this ensures doc_to_merge is gc()'d
I did some memory profiling (using memory-profiler) and noticed that although the append/insert methods do not directly add to the memory footprint, the docx object being appended/inserted does not get garbage collected after the call returns. It seems there may be a circular reference being maintained somewhere that prevents the object from being freed when it goes out of scope.
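To illustrate the suspected mechanism in isolation, here is a minimal sketch using only the standard library. The `Node` class is a hypothetical stand-in, not anything from python-docx or docxcompose. It only shows that in CPython an object caught in a reference cycle survives going out of scope until the cyclic collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that ends up referring to itself."""

    def __init__(self):
        self.me = self  # circular reference: the object keeps itself alive


def make_and_drop():
    """Create a Node, let it go out of scope, and return a weak reference to it."""
    node = Node()
    return weakref.ref(node)


gc.disable()  # keep the automatic collector from running during the demo
try:
    ref = make_and_drop()
    alive_before = ref() is not None  # still alive: its refcount never reached zero
    gc.collect()                      # the cyclic collector finds and frees the cycle
    alive_after = ref() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

If the appended document is held in a similar cycle, that would explain why an explicit `gc.collect()` releases it while simply leaving the function does not.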
Our workaround is to invoke gc.collect() soon after we've called .append(). This fixes the problem for now. I might dig a bit deeper to see whether I can fix the underlying issue of the references being held, and I will update this ticket if I manage to isolate it.
It would be helpful to know if I'm on the right track here and the workaround works for others as well.
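For anyone who wants something closer to a complete script, the workaround above could be fleshed out roughly as follows. This is only a sketch under a few assumptions: the `output_path` parameter, the empty-list check, and the return value are additions for illustration, and the third-party imports are kept inside the function so the module can be loaded without them. It has not been measured against the 200-file case.

```python
import gc


def merge_all(document_paths, output_path):
    """Append every .docx in document_paths to a new document and save it.

    output_path is a hypothetical parameter added for this sketch.
    """
    if not document_paths:
        raise ValueError("no documents to merge")

    # Third-party dependencies: python-docx and docxcompose.
    import docx
    from docxcompose.composer import Composer

    merged_doc = docx.Document()
    composer = Composer(merged_doc)

    for document_path in document_paths:
        composer.append(docx.Document(document_path))
        # The appended document is not freed automatically once it is no
        # longer referenced here, so force a collection after each append.
        gc.collect()

    composer.save(output_path)
    return output_path
```

Calling `merge_all(["a.docx", "b.docx"], "merged.docx")` with real file paths should write the merged document to `merged.docx`. Comparing peak memory with and without the `gc.collect()` line is one way to check whether the workaround helps in a given setup.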
Sorry, but I am new to this package and have not actually tried to implement anything yet.
I was just wondering how to do it, in case @Link-Go replied.
I do not fully understand your example, though, as your functions don't return anything.
Could you share a full script that I can just run, so I can tell you whether it works for me as well?
Thanks!