NVIDIA/retinanet-examples

Windows 10 Pro Docker Desktop Bus Error

Testbild opened this issue · 1 comment

Hello everyone,

I just did the following steps:

  • Installed Docker Desktop on my Win10 Pro (https://docs.docker.com/desktop/windows/install/) v20.10.8

  • I tested Docker Desktop with the tutorial and the Docker Hello World in the Windows CMD terminal, and both worked

  • I turned off the "buildkit" feature:
    {
      "registry-mirrors": [],
      "insecure-registries": [],
      "debug": false,
      "experimental": false,
      "features": {
        "buildkit": false
      },
      "builder": {
        "gc": {
          "enabled": true,
          "defaultKeepStorage": "20GB"
        }
      }
    }

  • In a CMD window started as admin, I ran: git clone https://github.com/nvidia/retinanet-examples

  • After cloning, I ran: docker build -t odtk:latest retinanet-examples/

  • Now I see the built image in Docker Desktop

  • When I run docker run --gpus all --rm --ipc=host -it odtk:latest from CMD, I get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

  • So I started the container from the Docker Desktop client instead
  • When I run the following command in that container:
odtk train retinanet_rn50fpn.pth --backbone ResNet50FPN \
     --images /coco/images/train2017/ --annotations /coco/annotations/instances_train2017.json \
     --val-images /coco/images/val2017/ --val-annotations /coco/annotations/instances_val2017.json

This happens:

NOTE! Installing ujson may make loading annotations faster.
Initializing model...
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████████████████████████████████████████████████████████████████████████| 97.8M/97.8M [00:14<00:00, 7.24MB/s]
     model: RetinaNet
  backbone: ResNet50FPN
   classes: 80, anchors: 9
Bus error

My expectation was that:

  1. I would be able to start the container with the command provided in the install guide
  2. Running the example from the repo in the container started with Docker Desktop would not produce a bus error

I am not sure what I am doing wrong and would love some help here. If more information is needed just let me know. I hope I included everything.
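
In case it helps to rule out the configuration step, the "buildkit" setting listed above could be double-checked with a few lines of Python. This is only a sketch: the helper name `buildkit_disabled` is made up for illustration, and the location of `daemon.json` (assumed here to be `.docker/daemon.json` under the user profile) may be different on other installs.

```python
import json
from pathlib import Path


def buildkit_disabled(config_text: str) -> bool:
    """Return True only if the config explicitly sets features.buildkit to false."""
    config = json.loads(config_text)
    return config.get("features", {}).get("buildkit") is False


if __name__ == "__main__":
    # Assumed location of the Docker config file; adjust if yours lives elsewhere.
    config_path = Path.home() / ".docker" / "daemon.json"
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        print("buildkit disabled:", buildkit_disabled(text))
    else:
        print("No daemon.json found at", config_path)
```

A missing "features" section counts as "not disabled" here, since the check only passes when the value is explicitly false.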

Thank you for your help!

Following this guide from start to finish resolved the issue for me: https://docs.nvidia.com/cuda/wsl-user-guide/index.html. I somehow did not notice at first that Windows 11 is needed.
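
For anyone hitting the same symptoms, a small check run inside the container may help narrow things down before working through the guide. This is a rough sketch, not something from the repo: the function name `container_diagnostics` is hypothetical, and the two things it looks at are assumptions about likely causes. The first is whether `nvidia-smi` is visible, which would suggest the GPU runtime set the container up. The second is how large `/dev/shm` is, because a small shared-memory segment is a commonly reported cause of "Bus error" in PyTorch containers started without `--ipc=host`. Neither was confirmed as the cause in this thread.

```python
import shutil
from pathlib import Path


def container_diagnostics(shm_path: str = "/dev/shm") -> dict:
    """Collect a few facts that may help narrow down GPU and shared-memory problems."""
    report = {
        # Assumption: nvidia-smi is usually on PATH when the GPU runtime set up the container.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Stays None when the shared-memory mount does not exist on this system.
        "shm_total_mib": None,
    }
    if Path(shm_path).is_dir():
        report["shm_total_mib"] = shutil.disk_usage(shm_path).total // (1024 * 1024)
    return report


if __name__ == "__main__":
    for key, value in container_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_on_path` is False or `shm_total_mib` is very small, the container was probably started without the `--gpus all` and `--ipc=host` flags from the install guide, which is what happens when `docker run` itself fails as it did above.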