NVIDIA/retinanet-examples

Running docker image with GPU problem

RoudyES opened this issue · 1 comment

Hello, I am trying to run the odtk image built with: docker build -t odtk:latest retinanet-examples/. The image uses nvcr.io/nvidia/pytorch:21.08-py3 (pulled with docker pull nvcr.io/nvidia/pytorch:21.08-py3) as its base image; I changed the Dockerfile included in this repo to use 21.08 instead of 21.07.
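The only change I made to the repo's Dockerfile is the base image tag, roughly the following (the exact FROM line may differ slightly from the one in the public repo):

```diff
-FROM nvcr.io/nvidia/pytorch:21.07-py3
+FROM nvcr.io/nvidia/pytorch:21.08-py3
```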

The image was built successfully, but whenever I try to run it with docker run --gpus all --rm --ipc=host -it -v/your/data/dir:/data odtk:latest I get the following error message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
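To check whether the problem is odtk-specific, a bare CUDA image can be tried first (a minimal sanity check; assumes the public nvidia/cuda tag below is pullable):

```bash
# If the NVIDIA container runtime is healthy, this prints the usual nvidia-smi table;
# if the WSL/driver integration is broken, it should fail with the same
# nvidia-container-cli initialization error.
docker run --rm --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi
```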

If I remove the --gpus all flag, the container runs fine, but then I get the following error when trying to train (which I think is expected, since the container is built with CUDA in mind):
RuntimeError: Found param backbones.ResNet50FPN.features.conv1.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
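A quick way to confirm what the container actually sees (just a diagnostic sketch, not part of odtk):

```bash
# Run inside the container: without --gpus all, PyTorch sees no CUDA device,
# so model parameters stay on the CPU as torch.FloatTensor and training fails.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```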

I've only installed Docker for Windows, then pulled the base image using the command above and built the odtk image on top of it as described in the README.
Are there any additional steps required in order to successfully run it?
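For reference, these are the standard checks for GPU passthrough with Docker Desktop's WSL 2 backend (generic Docker/NVIDIA tooling, not odtk-specific; output depends on the local setup):

```bash
# Inside the WSL 2 distro: the Windows NVIDIA driver exposes the GPU to WSL,
# so nvidia-smi should list the GPU if passthrough is working.
nvidia-smi

# Confirm Docker Desktop is using the Linux (WSL 2) backend rather than Windows containers.
docker info --format '{{.OSType}}'
```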

If it helps, this is the WSL version:
(screenshot of the WSL version output)
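For anyone comparing, the same version information can be printed from a Windows prompt with something like this (wsl --status requires a reasonably recent WSL build; older builds only support wsl -l -v):

```bash
# From PowerShell or cmd on the Windows host:
wsl --status   # default WSL version, default distro, kernel version
wsl -l -v      # installed distros and whether each runs under WSL 1 or WSL 2
```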

Solved!
I updated to Windows 11 through the Windows Insider Program and updated WSL, and the issue was resolved automatically.
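For anyone hitting the same error, the update step looks roughly like this (assumes a Windows build whose WSL supports --update; presumably the fix works because the newer WSL kernel ships GPU/CUDA support):

```bash
# From an elevated PowerShell / cmd prompt on the Windows host:
wsl --update     # pull the latest WSL kernel
wsl --shutdown   # restart WSL so Docker Desktop picks up the new kernel

# Re-test GPU access from Docker; this should print the nvidia-smi table if everything is wired up:
docker run --gpus all --rm --ipc=host -it -v/your/data/dir:/data odtk:latest nvidia-smi
```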