louisoutin/yolov5_torchserve

--model-store directory not found: model_store

metempasa opened this issue · 8 comments

Hello, Thanks for this project,

I used your default Dockerfile. It builds successfully, but when I run the container I get this error: --model-store directory not found: model_store

Do you have any idea?

Hello and happy that you like it.

Hard to say without more information, but make sure you have your trained weights in the ressources/ folder.
It looks like the command that converts the weights to TorchScript failed. Show me your logs if you want more help.
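For reference, that conversion step boils down to exporting the model with torch.jit; the sketch below is only a generic illustration (the stand-in module, input size and file name are assumptions, not the repo's actual export script):

import torch
import torch.nn as nn

# Stand-in module so the sketch runs; in the real project this would be the trained
# YOLOv5 model loaded from the weights file placed in ressources/.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
model.eval()

# Trace the model with a dummy input and save the TorchScript file
dummy_input = torch.zeros(1, 3, 640, 640)
traced = torch.jit.trace(model, dummy_input)
traced.save("model.torchscript.pt")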

Cheers

That was all my bad, sorry.

CMD [ "torchserve", "--start", "--model-store", "model_store", "--models", "my_model_name=my_model_name.mar" ]

It should be "--model-store", "model-store" instead of "--model-store", "model_store"; with that change the problem is solved.
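In other words, the CMD line above becomes:

CMD [ "torchserve", "--start", "--model-store", "model-store", "--models", "my_model_name=my_model_name.mar" ]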

Thanks for your response.

One last thing: could you please share an example POST body? I am new to this multipart POST thing.

Have a nice day!

There is no JSON body in the request; it's a multipart form request:
key/value pairs where the keys are strings like "img_{i}" and each value is the byte array of an image.
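For example, with Python's requests library a call could look like the sketch below. The keys follow the "img_{i}" convention above; the host, port (TorchServe's default inference port 8080) and the model name my_model_name are assumptions here and depend on your docker-compose setup:

import requests

# Read the image(s) to run inference on
with open("test.jpg", "rb") as f:
    img_bytes = f.read()

# Multipart form data: keys are strings like "img_{i}", values are the raw image bytes
files = {"img_0": img_bytes}

# Default TorchServe inference endpoint: /predictions/<model_name>
response = requests.post("http://localhost:8080/predictions/my_model_name", files=files)
print(response.status_code)
print(response.text)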

(screenshot attached: 2021-01-28 09-37-30)

I think there is an error, right?

I don't even know how to sanity check or test the server :D

I changed JAVA_HOME to a path that I am sure works, but the error is still the same.

@metempasa I am facing the same error when running torchserve, i.e.

2021-02-15 17:04:15,545 [DEBUG] W-9000-my_model_name_0.1 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)    
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
        at java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432)
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:188)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2021-02-15 17:04:15,546 [INFO ] W-9000-my_model_name_0.1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2021-02-15 17:04:15,573 [INFO ] W-9000-my_model_name_0.1-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "C:\ProgramData\Miniconda3\Lib\site-packages\ts\model_service_worker.py", line 116, in handle_connection
2021-02-15 17:04:15,573 [WARN ] W-9000-my_model_name_0.1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: my_model_name, error: Worker died.  

OS: Windows 10
Python: 3.6

If you are running with a GPU, try to:

  • check that you have nvidia-docker installed
  • make a change in the docker-compose configs to force GPU usage (there is an open issue about this on the docker-compose GitHub); see the sketch below
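A minimal sketch of what that docker-compose change can look like, assuming the older compose file format 2.3 and the NVIDIA container runtime (the service name, build context and exact syntax depend on your setup and compose version):

version: "2.3"
services:
  torchserve:
    build: .
    # Requires nvidia-docker / the NVIDIA container runtime on the host
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all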