awslabs/multi-model-server

Memory utilization increases after every request, worker died, memory issue

n0thing233 opened this issue · 1 comment

Hi, MMS by default prints memory utilization to the log, which is great. The problem I have is that after each request to MMS the memory utilization increases a little, and after several requests it reaches 100% and the worker dies.
I don't think this is the right behavior, right?
I tried calling gc.collect() in the _handle function, but it doesn't help (there is no GPU available on this machine).
I wonder if anyone can help me out here.
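For context, the gc.collect() attempt sits inside a custom service module along the lines of the sketch below. This follows the usual MMS custom-handler layout, but the ModelHandler class, its methods, and the placeholder inference are illustrative, not the actual handler from this model:

```python
import gc


class ModelHandler(object):
    """Illustrative custom service handler; names here are placeholders."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # One-time setup: load the model into memory here.
        # self.model = load_my_model(context)  # placeholder, not a real API
        self.initialized = True

    def _handle(self, data, context):
        # Per-request work: preprocess, run inference, postprocess.
        result = [{"output": "..."} for _ in data]  # placeholder inference
        # Attempted workaround: force a garbage-collection pass after each
        # request, hoping to release per-request allocations.
        gc.collect()
        return result


_service = ModelHandler()


def handle(data, context):
    # Entry point invoked by the MMS backend worker for each batch of requests.
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service._handle(data, context)
```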
Here is an example. When the server has just started, the log shows:
2021-10-31 18:22:25,881 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:5.1|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704545
After one request:
mms_1 | 2021-10-31 18:24:25,742 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:26.2|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704665
After second request:
mms_1 | 2021-10-31 18:26:25,601 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:39.7|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635704785
After third request:
mms_1 | 2021-10-31 18:30:25,323 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:58.5|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705025
After 4th request:
mms_1 | 2021-10-31 18:32:25,187 [INFO ] pool-2-thread-1 MMS_METRICS - MemoryUtilization.Percent:81.6|#Level:Host|#hostname:cebbb237ccfc,timestamp:1635705145
After the 5th request, OOM appears:
mms_1 | 2021-10-31 18:35:41,402 [INFO ] epollEventLoopGroup-4-7 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-96795301 Worker disconnected. WORKER_MODEL_LOADED
mms_1 | 2021-10-31 18:35:41,528 [DEBUG] W-9000-video_segmentation_v1 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
mms_1 | java.lang.InterruptedException
mms_1 |     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
mms_1 |     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
mms_1 |     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
mms_1 |     at com.amazonaws.ml.mms.wlm.WorkerThread.runWorker(WorkerThread.java:148)
mms_1 |     at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:211)
mms_1 |     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
mms_1 |     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
mms_1 |     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
mms_1 |     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
mms_1 |     at java.lang.Thread.run(Thread.java:748)

Commenting to follow - at first I suspected this was related to #942, but I tested with that PR and saw no change in behavior compared to the current released version (1.1.4). @n0thing233 - are you allocating a lot of memory inside the predict function, or does it all happen at model load?
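One way to narrow that down (a rough sketch, not an MMS feature: psutil is a third-party dependency that would have to ship with the model archive, and the stub handle below stands in for the real service) is to log the worker's resident memory around the inference call:

```python
import logging
import os

import psutil  # third-party; would need to be packaged with the model archive

logger = logging.getLogger(__name__)
_proc = psutil.Process(os.getpid())


def log_rss(tag):
    """Log the resident set size of this backend worker process, in MiB."""
    logger.info("RSS %s: %.1f MiB", tag, _proc.memory_info().rss / 2**20)


def handle(data, context):
    # Minimal stand-in custom service used only for measurement.
    if data is None:
        return None
    log_rss("before inference")
    result = [{"echo": True} for _ in data]  # stand-in for the real model call
    log_rss("after inference")
    return result
```

If RSS keeps climbing with a stub like this, the growth is not coming from the user inference code; if it stays flat here but climbs with the real handler, something in the predict path is likely holding references between requests.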