johanmodin/clifs

OCI runtime create failed (OS Ubuntu 20.04)

pwnedzen opened this issue · 8 comments

When running `docker-compose build && docker-compose up`, I get an "OCI runtime create failed" error (posted below):

Sending build context to Docker daemon  1.724MB
Step 1/7 : FROM python:3
 ---> 6beb0d435def
Step 2/7 : ENV PYTHONUNBUFFERED=1
 ---> Using cache
 ---> 2aab34bae599
Step 3/7 : WORKDIR /clifs
 ---> Using cache
 ---> 7c6854350226
Step 4/7 : COPY requirements.txt /clifs
 ---> Using cache
 ---> df31f0e3f4ff
Step 5/7 : RUN pip install -r requirements.txt
 ---> Running in 8eca7fd2853d
OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:722: waiting for init preliminary setup caused: EOF: unknown
ERROR: Service 'web' failed to build : Build failed


**Any ideas? I have tried running the command both with sudo and as a regular user. From my digging, the error might pertain to the setup of the Dockerfile, but I'm not sure.**


Not seen this issue before. I would check that docker has enough drive space available and that the docker version is relatively new. I guess you could also try DOCKER_BUILDKIT=1 just ahead of your build command.
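For anyone who wants to script those three checks, here is a minimal sketch. It is only an illustration: the helper names `free_gib` and `buildkit_env` are made up for this example, the `/` default path is an assumption (Docker's data root is typically `/var/lib/docker` on Linux), and `COMPOSE_DOCKER_CLI_BUILD=1` is included on the assumption that an older docker-compose v1 is in use, where it may be needed for BuildKit to take effect.

```python
import os
import shutil
import subprocess


def free_gib(path="/"):
    """Return free space, in GiB, on the filesystem that holds `path`.

    Assumption: pass Docker's data root (typically /var/lib/docker) if it
    lives on a different filesystem than `/`.
    """
    return shutil.disk_usage(path).free / 2**30


def buildkit_env():
    """Return a copy of the current environment with BuildKit switched on."""
    env = dict(os.environ)
    env["DOCKER_BUILDKIT"] = "1"
    # Assumption: only relevant for docker-compose v1, harmless otherwise.
    env["COMPOSE_DOCKER_CLI_BUILD"] = "1"
    return env


if __name__ == "__main__":
    print(f"free space on /: {free_gib():.1f} GiB")
    # Only query the Docker version if the CLI is actually on PATH.
    if shutil.which("docker"):
        subprocess.run(["docker", "--version"], check=False)
```

The build itself could then be started with `subprocess.run(["docker-compose", "build"], env=buildkit_env())`, which should be equivalent to prefixing the shell command with `DOCKER_BUILDKIT=1`.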

I ran the build with that added to the command, along with some other minor changes, and the new error message is this:

Attaching to clifs-web-1, search-engine
clifs-web-1    | standard_init_linux.go:228: exec user process caused: no such file or directory
Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown

**Any other advice would be appreciated, thanks. I'm still debugging on my end as well.**

Alright, so it seems to build. If you're trying to run the GPU-enabled docker-compose file, make sure you've got the nvidia-container-toolkit installed as well as an NVIDIA driver that works for CUDA 10.1.

You could check just your NVIDIA setup by running, e.g., the image used in this app (pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel) and testing that the GPU is available to it.
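A small script along these lines could be run inside that container to see what PyTorch detects. This is a sketch, not part of the project: `gpu_report` is a hypothetical helper name, and it only uses standard `torch.cuda` calls.

```python
def gpu_report():
    """Summarize what PyTorch can see; safe to call even if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # List the name of every GPU visible to this process.
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {
        "torch_installed": True,
        "cuda_available": available,
        "cuda_build": torch.version.cuda,  # CUDA version torch was built with
        "devices": devices,
    }


if __name__ == "__main__":
    print(gpu_report())
```

A quicker one-off check would be something like `docker run --rm --gpus all pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel python -c "import torch; print(torch.cuda.is_available())"`. The `--gpus all` flag assumes Docker 19.03 or newer with nvidia-container-toolkit installed. If this prints `False`, the problem is most likely in the driver or toolkit setup rather than in this app.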

It still looks like I am having trouble finding the correct NVIDIA driver. Do you have a link or more information on which one you used and where you downloaded it from? This is the last error I am encountering:

Successfully tagged clifs_search-engine:latest
Starting clifs_web_1   ... done
Starting search-engine ... done
Attaching to search-engine, clifs_web_1
web_1            | standard_init_linux.go:228: exec user process caused: no such file or directory
clifs_web_1 exited with code 1
search-engine    |  * Serving Flask app 'server' (lazy loading)
search-engine    |  * Environment: production
search-engine    |    WARNING: This is a development server. Do not use it in a production deployment.
search-engine    |    Use a production WSGI server instead.
search-engine    |  * Debug mode: off
search-engine    | /opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
search-engine    |   return torch._C._cuda_getDeviceCount() > 0
search-engine    | 2021-10-04 13:51:48,526 -  * Running on all addresses.
search-engine    |    WARNING: This is a development server. Do not use it in a production deployment.
search-engine    | 2021-10-04 13:51:48,526 -  * Running on http://172.19.0.3:5000/ (Press CTRL+C to quit)
^CGracefully stopping... (press Ctrl+C again to force)




**I already have an NVIDIA driver on my machine, but torch is still not finding it. Thanks**

Make sure you are using the GPU docker-compose file (as stated in the instructions) and make sure you have the correct NVIDIA drivers and the NVIDIA container toolkit. For the latter two items, I think googling is the easiest solution, as it can take a few steps to set up.

I have checked and resolved these issues for now, but I am now running into the same issue as the one posted by

hoel-bagard: OpenCV error when opening the video #6

search-engine    | 
search-engine    | Traceback (most recent call last):
search-engine    |   File "server.py", line 8, in <module>
search-engine    |     clifs = CLIFS()
search-engine    |   File "/app/clifs.py", line 30, in __init__
search-engine    |     self.add_video(f)
search-engine    |   File "/app/clifs.py", line 71, in add_video
search-engine    |     feature_t = torch.cat(feature_list, dim=0)
search-engine    | RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
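The traceback above shows `torch.cat` being called on an empty `feature_list`, which would happen if no frames were decoded from the video (possibly the OpenCV problem described in #6, though that is not confirmed here). A guard like the following sketch would surface that earlier with a clearer message. `concat_features` and its `source` parameter are hypothetical names for illustration and not the project's actual code.

```python
def concat_features(feature_list, source="<unknown video>"):
    """Concatenate per-frame feature tensors, failing early on an empty list.

    torch.cat raises an opaque RuntimeError for an empty list, so check first
    and report which input produced no features.
    """
    if not feature_list:
        raise ValueError(
            f"no features were extracted from {source}; "
            "check that the file exists and that it can be decoded"
        )
    import torch  # imported lazily so the empty-list check needs no torch

    return torch.cat(feature_list, dim=0)
```

Called as `concat_features(feature_list, source=path)`, an unreadable video would then produce an error that names the file instead of the `aten::_cat` message.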

Great. That's not an error I've seen earlier either, but I will look into it. Closing this issue as it is now covered by #6

Just wanted to add: I started fresh with an Ubuntu 18.04 install on a VM (not WSL) and got the project working! Really amazing project, johanmodin. Thanks for your work and responses!