[BUG] Unable to run `docker compose` following the instructions: /opt/conda/envs/homl3/bin/jupyter directory missing
vasigorc opened this issue · 2 comments
Describe the bug
I use a GPU-powered Linux laptop and I couldn't successfully run the `docker compose` scenario.
Here are my prerequisites:
```shell
# docker is installed
~ docker --version
Docker version 26.1.4, build 5650f9b

# so is the docker compose plugin
~ docker compose version
Docker Compose version v2.27.1

# nvidia container toolkit is installed
~ dpkg -l | grep nvidia-container-toolkit
ii nvidia-container-toolkit      1.12.1-0pop1~1679409890~22.04~5f4b1f2 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.12.1-0pop1~1679409890~22.04~5f4b1f2 amd64 NVIDIA Container Toolkit Base

# and configured
~ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```
```shell
# nvidia container toolkit sample workload working
~ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
9c704ecd0c69: Pull complete
Digest: sha256:2e863c44b718727c860746568e1d54afd13b2fa71b160f5cd9058fc436217b30
Status: Downloaded newer image for ubuntu:latest
Thu Jun 20 02:37:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:02:00.0 Off |                  N/A |
| N/A   46C    P8              4W /  150W |    122MiB /  12282MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```
```shell
# ML compatible GPU is available
~ nvidia-smi
Wed Jun 19 21:13:15 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:02:00.0 Off |                  N/A |
| N/A   51C    P8              6W /  150W |    122MiB /  12282MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3440      G   /usr/lib/xorg/Xorg                             18MiB |
|    0   N/A  N/A      9091    C+G   warp-terminal                                  91MiB |
+-----------------------------------------------------------------------------------------+
```
I made the required GPU-related changes in `docker-compose.yml`:
```diff
diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml
index d8893d9..8ca7305 100644
--- a/docker/docker-compose.yml
+++ b/docker/docker-compose.yml
@@ -1,14 +1,16 @@
+# Copied from https://github.com/ageron/handson-ml3/blob/main/docker/docker-compose.yml
+# Modification instructions copied from https://github.com/ageron/handson-ml3/tree/main/docker#prerequisites-1
 version: "3"
 services:
   handson-ml3:
     build:
       context: ../
-      dockerfile: ./docker/Dockerfile #Dockerfile.gpu
+      dockerfile: ./docker/Dockerfile.gpu
       args:
         - username=devel
         - userid=1000
     container_name: handson-ml3
-    image: ageron/handson-ml3:latest #latest-gpu
+    image: ageron/handson-ml3:latest-gpu
     restart: unless-stopped
     logging:
       driver: json-file
@@ -20,8 +22,8 @@ services:
     volumes:
       - ../:/home/devel/handson-ml3
     command: /opt/conda/envs/homl3/bin/jupyter lab --ip='0.0.0.0' --port=8888 --no-browser
-    #deploy:
-    #  resources:
-    #    reservations:
-    #      devices:
-    #        - capabilities: [gpu]
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - capabilities: [gpu]
\ No newline at end of file
```
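As an aside, the same GPU reservation can also be spelled with the optional `driver` and `count` fields from the current Compose specification. This is only a sketch of an equivalent form (the bare `capabilities: [gpu]` shown in the diff is sufficient), assuming the NVIDIA runtime is configured on the host:

```yaml
# Sketch: fuller Compose-spec syntax for the same GPU reservation.
# driver and count are optional; capabilities: [gpu] alone also works.
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```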
To Reproduce
- Use POP!_OS or Ubuntu 22.04 LTS
- Install the prerequisites
- Download the handson-ml3 code repository
- Make the required changes
- Run `docker compose up` from the `docker` directory
Here is the output:
```shell
~ docker compose up
WARN[0000] /home/vasilegorcinschi/repos/handson-ml3/docker/docker-compose.yml: `version` is obsolete
Attaching to handson-ml3
Gracefully stopping... (press Ctrl+C again to force)
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/opt/conda/envs/homl3/bin/jupyter": stat /opt/conda/envs/homl3/bin/jupyter: no such file or directory: unknown
```
Expected behavior
The Docker container should start.
Versions (please complete the following information):
- OS: POP!_OS 22.04 LTS
- Python: 3.10.12
An investigation detail that could be useful: running the image and inspecting the container with `bash`, I don't see `conda` installed, which may explain the above error:
```shell
~ docker run -it --rm --runtime=nvidia --gpus all ageron/handson-ml3:latest-gpu /bin/bash
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/

You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
devel@99e4901df358:~/handson-ml3$ conda env list
bash: conda: command not found
```
I'm not sure why `conda` is not installed: I pulled the Docker image (I didn't build it locally).

FWIW, `jupyter` is installed inside the container at this path:
```shell
~ which jupyter
/usr/local/bin/jupyter
```
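Given that, one possible workaround (an untested sketch, assuming `/usr/local/bin` is on the container's `PATH`, as the `which jupyter` output above suggests) would be to drop the hard-coded conda env path from the `command` in `docker-compose.yml`:

```yaml
# Workaround sketch: rely on PATH lookup instead of the absolute conda env path.
command: jupyter lab --ip=0.0.0.0 --port=8888 --no-browser
```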
Digging down further, I faced the same issue when building the image locally:

- `conda` wasn't installed
- the `jupyter` binary wasn't at the expected location
This PR fixes the issue: #144
Here is a sample output:
```shell
~ docker compose up
WARN[0000] /home/vasilegorcinschi/repos/handson-ml3/docker/docker-compose.yml: `version` is obsolete
[+] Running 1/1
 ✔ Container handson-ml3  Created  0.1s
Attaching to handson-ml3
handson-ml3  | [I 2024-06-22 00:56:45.071 ServerApp] jupyter_lsp | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.074 ServerApp] jupyter_server_mathjax | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.076 ServerApp] jupyter_server_terminals | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.079 ServerApp] jupyterlab | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.079 ServerApp] nbdime | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.080 ServerApp] Writing Jupyter server cookie secret to /home/devel/.local/share/jupyter/runtime/jupyter_cookie_secret
handson-ml3  | [I 2024-06-22 00:56:45.592 ServerApp] notebook_shim | extension was successfully linked.
handson-ml3  | [I 2024-06-22 00:56:45.611 ServerApp] notebook_shim | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.613 ServerApp] jupyter_lsp | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.613 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.613 ServerApp] jupyter_server_terminals | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.615 LabApp] JupyterLab extension loaded from /opt/conda/envs/homl3/lib/python3.10/site-packages/jupyterlab
handson-ml3  | [I 2024-06-22 00:56:45.615 LabApp] JupyterLab application directory is /opt/conda/envs/homl3/share/jupyter/lab
handson-ml3  | [I 2024-06-22 00:56:45.615 LabApp] Extension Manager is 'pypi'.
handson-ml3  | [I 2024-06-22 00:56:45.617 ServerApp] jupyterlab | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] nbdime | extension was successfully loaded.
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] Serving notebooks from local directory: /home/devel/handson-ml3
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] Jupyter Server 2.14.1 is running at:
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] http://2674095b7bd8:8888/lab?token=1d798602e6f6fc421f80273a15b3b12d10a1d39e050942e0
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] http://127.0.0.1:8888/lab?token=1d798602e6f6fc421f80273a15b3b12d10a1d39e050942e0
handson-ml3  | [I 2024-06-22 00:56:45.709 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
handson-ml3  | [C 2024-06-22 00:56:45.711 ServerApp]
handson-ml3  |
handson-ml3  |     To access the server, open this file in a browser:
handson-ml3  |         file:///home/devel/.local/share/jupyter/runtime/jpserver-1-open.html
handson-ml3  |     Or copy and paste one of these URLs:
handson-ml3  |         http://2674095b7bd8:8888/lab?token=1d798602e6f6fc421f80273a15b3b12d10a1d39e050942e0
handson-ml3  |         http://127.0.0.1:8888/lab?token=1d798602e6f6fc421f80273a15b3b12d10a1d39e050942e0
handson-ml3  | [I 2024-06-22 00:56:45.725 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
handson-ml3  | [W 2024-06-22 00:57:02.777 LabApp] Could not determine jupyterlab build status without nodejs
handson-ml3  | [I 2024-06-22 00:57:29.637 ServerApp] Writing notebook-signing key to /home/devel/.local/share/jupyter/notebook_secret
handson-ml3  | [W 2024-06-22 00:57:29.637 ServerApp] Notebook 01_the_machine_learning_landscape.ipynb is not trusted
handson-ml3  | [I 2024-06-22 00:57:30.064 ServerApp] Kernel started: ab6ef08f-0a04-4020-bc1e-72a766350767
handson-ml3  | [I 2024-06-22 00:57:31.438 ServerApp] Connecting to kernel ab6ef08f-0a04-4020-bc1e-72a766350767.
handson-ml3  | [I 2024-06-22 00:57:31.451 ServerApp] Connecting to kernel ab6ef08f-0a04-4020-bc1e-72a766350767.
handson-ml3  | [I 2024-06-22 00:57:31.464 ServerApp] Connecting to kernel ab6ef08f-0a04-4020-bc1e-72a766350767.
handson-ml3  | [I 2024-06-22 00:57:37.034 ServerApp] Starting buffering for ab6ef08f-0a04-4020-bc1e-72a766350767:4a9249d7-0e12-40e7-87fb-071b24a4de19
```
@ageron I don't have access to associate this issue with the PR or to assign you as a reviewer, but I'd appreciate your review (and probably a merge too, since only people with write access can merge).