awslabs/multi-model-server

Permission denied when loading model

akulk314 opened this issue · 1 comments

Pip packages:
sagemaker-inference=1.5.2
multi-model-server=1.1.2
mxnet-model-server=1.0.8 (note: sagemaker-inference still refers to this package)

When inference requests are issued to MMS, they fail with a Permission denied error; the stack trace is included below. The issue only occurs when MMS runs as a non-root user. Security constraints do not always allow running MMS as root, so a fix for this would be useful.

2020-10-13 15:07:00,589 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Backend worker process died.
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/someuser/.local/lib/python3.6/site-packages/mms/model_service_worker.py", line 174, in start_worker
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/someuser/.local/lib/python3.6/site-packages/mms/model_service_worker.py", line 143, in handle_connection
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     result, code = self.load_model(msg)
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/someuser/.local/lib/python3.6/site-packages/mms/model_service_worker.py", line 106, in load_model
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     self._create_io_files(self.tmp_dir, io_fd)
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/someuser/.local/lib/python3.6/site-packages/mms/model_service_worker.py", line 120, in _create_io_files
2020-10-13 15:07:00,590 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     os.mkfifo(self.out)
2020-10-13 15:07:00,594 [INFO ] W-9001-934e18ea53c68813cffe9b354-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - PermissionError: [Errno 13] Permission denied

The user has access to the tmp directory. I verified this manually from a shell inside the container.
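
For anyone trying to reproduce this outside MMS, here is a minimal sketch that attempts the same kind of call the trace fails on (os.mkfifo) as the current user. The helper name can_create_fifo is hypothetical, and the trace does not show which directory self.tmp_dir points to; the sketch assumes Python's default temp directory, so pass the real path if it differs. Being able to write regular files in a directory does not by itself show that creating a FIFO there works, so it may be worth checking both.

```python
import errno
import os
import tempfile
import uuid


def can_create_fifo(directory):
    """Try to create and then remove a FIFO in `directory`.

    Hypothetical diagnostic helper, not part of MMS.
    Returns (ok, detail), where detail names the errno on failure.
    """
    path = os.path.join(directory, "fifo-check-" + uuid.uuid4().hex)
    try:
        os.mkfifo(path)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, "{}: {}".format(name, exc.strerror)
    os.remove(path)
    return True, "ok"


if __name__ == "__main__":
    # Assumption: the worker uses the default temp dir; adjust if it does not.
    tmp_dir = tempfile.gettempdir()
    writable = os.access(tmp_dir, os.W_OK | os.X_OK)
    print("uid={} dir={} writable={}".format(os.getuid(), tmp_dir, writable))
    print(can_create_fifo(tmp_dir))
```

Running this as the same non-root user inside the container should show whether the failure is specific to FIFO creation in that directory or comes from somewhere else.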

The version.py file under /home/someuser/.local/lib/python3.6/site-packages/mms/ reports version 1.1.2, so the library in use is multi-model-server 1.1.2 and not mxnet-model-server 1.0.8.

I have the same issue