Memory leak in NN ensemble backend
juhoinkinen opened this issue · 0 comments
The Annif pod in the OpenShift environment has occasionally been killed, maybe once every two weeks. The apparent reason is that memory consumption reaches the limit set for the pod (30 GB).
I monitored the memory consumption (RssAnon from /proc/$PID/status) of a locally run Annif while sending suggest requests with curl to an NN ensemble project and its base projects (using full-text documents from the JYU test set; memory consumption was recorded after every 10 documents). Only in the case of the NN ensemble was there a strong increase in memory consumption: see below for a run with the yso-fi model of Finto AI.
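For reference, a minimal sketch of how RssAnon can be read from /proc on Linux (an illustration, not the exact script I used):

```python
def rss_anon_kb(pid: int) -> int:
    """Return the RssAnon (anonymous resident memory) of a process in kB."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("RssAnon:"):
                return int(line.split()[1])  # the value is reported in kB
    raise ValueError("RssAnon not found in /proc status")
```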
I confirmed that the issue can be fixed by following the advice from one relevant discussion, i.e. using __call__() of the model:
Annif/annif/backend/nn_ensemble.py, line 141 (at commit 73d4f2e)
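For illustration, here is a minimal sketch of the difference between the two calling conventions (a toy Keras model, not the actual Annif code):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(100,))])
x = np.random.rand(1, 100).astype("float32")

# predict() sets up its own data-handling machinery on every call, which is
# what seems to leak memory when it is invoked once per suggest request.
scores = model.predict(x)

# Calling the model directly (its __call__) avoids that per-call overhead
# and kept memory usage flat in my tests.
scores = model(x, training=False).numpy()
```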
The other mentioned fix, applying tf.convert_to_tensor(), did not fix the memory leak. Running gc.collect() after each prediction did fix it, but made the predictions very slow (10 requests took ~110 s, versus only ~30 s without gc).
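Roughly, the two other approaches that were tried looked like this (the function names are just for illustration):

```python
import gc
import tensorflow as tf

def predict_with_tensor(model, vector):
    # Passing a tf.Tensor instead of a NumPy array, as suggested in the
    # discussion; in my tests this did not stop the leak.
    return model.predict(tf.convert_to_tensor(vector))

def predict_with_gc(model, vector):
    # Forcing garbage collection after each prediction does release the
    # leaked memory, but makes the requests several times slower.
    result = model.predict(vector)
    gc.collect()
    return result
```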
However, the NN ensemble could be modified to allow batch processing of documents, and for that use the Keras documentation seems to recommend the predict() function, so I'm not sure the above fix is the best way to go.
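If batch processing were implemented, the usage would be along these lines (a hypothetical sketch; in practice the input would be the stacked score vectors from the base projects):

```python
import numpy as np

def predict_batch(model, score_vectors):
    # Stack the per-document score vectors into one (n_docs, n_features)
    # array and let predict() handle the batching internally.
    batch = np.stack(score_vectors)
    return model.predict(batch, batch_size=32)
```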