NVIDIA/workbench-example-hybrid-rag

Build error - command not found

I have just started looking at this, but if this is meant to be an out-of-the-box demo, the build looks broken. I did a fresh install, let Workbench deploy Podman for me, and forked this repo. The results are below.

Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libgl1
E: Unable to locate package libglib2.0-0
E: Couldn't find any package by glob 'libglib2.0-0
   '
E: Couldn't find any package by regex 'libglib2.0-0
   '
E: Unable to locate package git
Error: building at STEP "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     libgl1
     libglib2.0-0
     git
     jq": while running runtime: exit status 100

System Info

It probably doesn't matter much, since the build appears to fail while looking for packages that it can't find inside the container, but the host system is:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy


Linux linux-desktop 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

can we get a little more context on this?

you cloned the project in workbench and the build failed?

or did you try and build things outside of workbench?

can we get a little more context on this?

I did as this part of the tutorial instructed.

you cloned the project in workbench and the build failed?

I cloned it using the clone project button here after I created my own fork:

image

The build starts but this is as far as it gets:

STEP 1/32: FROM ghcr.io/huggingface/text-generation-inference:latest
STEP 2/32: WORKDIR /opt/project/build/
 Using cache 9c39d76f60b717ac3b593657d316de16beb55bfeb055e39a6106172c4ac8732a
 9c39d76f60b7
STEP 3/32: SHELL ["/bin/bash", "-c"]
 Using cache 7371ead4fd9fcd7d8f3df811832781a4be5820df4c1fcbe4231d8555d19aa430
 7371ead4fd9f
STEP 4/32: USER root
 Using cache 9e9979887cef453e1ab1466268d897eba7b1eb0f0f10a582d21c5cecf92a699e
 9e9979887cef
STEP 5/32: RUN groupadd -g 1000 workbench || true
 Using cache 751e8fd384b5c2630f74600076a3389314d7a13564617966a7eae26e5bd332e1
 751e8fd384b5
STEP 6/32: RUN useradd -u 1000 -g 1000 -rm -d /home/workbench -s /bin/bash workbench || usermod -l workbench $(getent passwd 1000 | cut -d: -f1)
 Using cache cd051d64f4101e27e5d3ced12073041d0cd4f33f3bdf2b5de42876956ab5a990
 cd051d64f410
STEP 7/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     sudo
 Using cache 9ceb08d3bd830a76029b1fdf6b608ba0b3b1f38d35af089102814d7a7fd1705d
 9ceb08d3bd83
STEP 8/32: RUN echo "workbench	ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/workbench
 Using cache de3ca639ee9bb62ed6793937ab0bfe2df9a98865a9f12c84eccdeb2bf1d74e28
 de3ca639ee9b
STEP 9/32: ENV NVWB_UID=1000
 Using cache 3412168d918a56f91825ef9259abaae936b456b45f5bc934761e5b62329641d4
 3412168d918a
STEP 10/32: ENV NVWB_GID=1000
 Using cache 01accd2e20b0978d676ddd46b71747da82175720d5d99437d35a4e03f2388953
 01accd2e20b0
STEP 11/32: ENV NVWB_USERNAME=workbench
 Using cache 7017347f865e1ea76b5c253202115200f71581d7e80c9fca286f831285e54972
 7017347f865e
STEP 12/32: USER $NVWB_USERNAME
 Using cache 9db5b79d8320b930ba3ad9220ccdfc9add4babff2c46093964715eed40b86384
 9db5b79d8320
STEP 13/32: COPY --chown=$NVWB_UID:$NVWB_GID  ["preBuild.bash", "/opt/project/build/"]
 Using cache 24863b6d2189f9f2b7b76ad08873928e27ab534c89e4ca06b18c004153f245aa
 24863b6d2189
STEP 14/32: RUN ["/bin/bash", "/opt/project/build/preBuild.bash"]
 Using cache 0c0579d653a4a8b9bab1d3e6dd421e9950858e0fc670e90c8a6c962810baa3ef
 0c0579d653a4
STEP 15/32: USER root
 Using cache 07d489c5c3c1966bcc7b548cfe6e6cbfe20e6a1f3b6cb595cf3c39ba79894a2e
 07d489c5c3c1
STEP 16/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     wget
 Using cache 6b6f7afaa6e9b8c937a559deba09288ec59b247d04cbc2bf4f2e4105372e3679
 6b6f7afaa6e9
STEP 17/32: RUN dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; wget -O- https://github.com/tianon/gosu/releases/download/1.17/gosu-${dpkgArch} | install /dev/stdin /usr/local/bin/gosu
 Using cache 4a73c61d25c23be76cc67753c00fe3a5845ca7bea0f8dfeb367bc91b3e64b861
 4a73c61d25c2
STEP 18/32: COPY  --chmod=755 ["entrypoint.sh", "/"]
 Using cache 21f0b357011345147b3c52c4f437372989b2d95db8db3de1f22aa82e36ea08c6
 21f0b3570113
STEP 19/32: ENV NVWB_BASE_ENV_ENTRYPOINT=
 Using cache a69cb83086ac33ec15b3e309f28ccd4dc1851eeff2b53e4b5b8fa90277197641
 a69cb83086ac
STEP 20/32: USER $NVWB_USERNAME
 Using cache bf604094e85375800ea71b99da88f338ac9d4f832daaa212e33153594dee490b
 bf604094e853
STEP 21/32: USER root
 Using cache 05f66823ac9214b3956ea40810a7e6b8459c1c6c0d5f796cf5c3ae6e83506b58
 05f66823ac92
STEP 22/32: RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     libgl1
     libglib2.0-0
     git
     jq
Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2037 kB]
Fetched 2266 kB in 1s (1823 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libgl1
E: Unable to locate package libglib2.0-0
E: Couldn't find any package by glob 'libglib2.0-0
   '
E: Couldn't find any package by regex 'libglib2.0-0
   '
E: Unable to locate package git
Error: building at STEP "RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     libgl1
     libglib2.0-0
     git
     jq": while running runtime: exit status 100

Build Failed

I went as far as starting that container manually and running:

apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y     libgl1
     libglib2.0-0
     git
     jq

That also fails. You get:

...
Setting up libgl1-mesa-dri:amd64 (23.2.1-1ubuntu3.1~22.04.2) ...
Setting up libglx-mesa0:amd64 (23.2.1-1ubuntu3.1~22.04.2) ...
Setting up libglx0:amd64 (1.4.0-1) ...
Setting up libgl1:amd64 (1.4.0-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.4) ...
bash: libglib2.0-0: command not found
bash: git: command not found
bash: jq: command not found

It looks like these are being pulled from the file apt.txt. I just tried changing its format from this:

# apt packages to install should be listed one per line
libgl1
libglib2.0-0

#
git
jq

to this:

# apt packages to install should be listed one per line
libgl1 libglib2.0-0 git jq

and the build has progressed past step 22. I'm guessing the odd spacing is the source of the bug.
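
One way to check the spacing guess is to look for hidden characters at the end of each line in apt.txt. The glob and regex errors in the log print the closing quote on a separate line, which would be consistent with a carriage return trailing each package name. A minimal sketch, assuming a POSIX shell with `grep`; `has_crlf` is a hypothetical helper name, not something from this repo:

```bash
#!/bin/bash

# Hypothetical helper: exit 0 if any line in the file ends with a carriage return.
has_crlf() {
    # Build the CR character with printf so this also works in plain sh.
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Example: check the package list if it is in the current directory.
if [ -f apt.txt ]; then
    if has_crlf apt.txt; then
        echo "apt.txt has CRLF line endings"
    else
        echo "apt.txt has no trailing carriage returns"
    fi
fi
```

Interactively, `file apt.txt` or `cat -A apt.txt` (GNU coreutils) give the same answer; with `cat -A`, CRLF lines end in `^M$`.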

or did you try and build things outside of workbench?

No - I did everything inside the NVIDIA AI Workbench UI. No command line.

It now seems to have an unrelated failure though. Full build log attached. I chose Podman when prompted during install because that's what most of my customers are running with K8s. A naive glance at the error makes me think the installer assumes Docker will be present.

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
/opt/project/build/postBuild.bash: line 23: $'\r': command not found
groupadd: 'docker-group
' is not a valid group name
usermod: group 'docker-group' does not exist
/opt/project/build/postBuild.bash: line 26: $'\r': command not found
chown: cannot access '/data': No such file or directory
Error: building at STEP "RUN /bin/bash /opt/project/build/postBuild.bash": while running runtime: exit status 1

build_failed.txt
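
A hedged reading of that log: `$'\r': command not found` is the message bash prints when a script line consists of nothing but a carriage return, and the line break inside the quoted `'docker-group` name fits a group name with a carriage return attached. That would point at line endings in postBuild.bash rather than at Podman versus Docker. A small reproduction, assuming bash and `mktemp` are available; `write_crlf_script` is a made-up helper for the demo:

```bash
#!/bin/bash

# Write a tiny script with Windows (CRLF) line endings to the given path.
# The middle line is blank apart from its carriage return.
write_crlf_script() {
    printf 'echo first\r\n\r\necho second\r\n' > "$1"
}

demo="$(mktemp)"
write_crlf_script "$demo"

# Run it with bash, as the build does; the lone carriage return on line 2
# is treated as a command name and reported as not found.
bash "$demo"
rm -f "$demo"
```

The two `echo` commands still run, which matches the build log: the script keeps going past the bad lines and only fails on a later command.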

If you're good with the PR I'll move this to a separate issue. I can take a look at it on Monday.

I didn't spend time checking whether other dependencies assume each entry is on its own line.

When I told Workbench to clone the project for me, it pulled everything with CRLF line endings, which caused the build to choke.
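
If the CRLF endings come from Git's checkout conversion, a `.gitattributes` rule is one common way to pin LF for files that run inside the container. The thread does not establish where the CRLF came from, so treat this as a sketch under that assumption; `pin_lf_endings` is a hypothetical helper, and the file patterns are guesses based on the names seen in the logs (preBuild.bash, postBuild.bash, entrypoint.sh, apt.txt).

```bash
#!/bin/bash

# Hypothetical helper: append LF rules for the build files to a repo's .gitattributes.
# $1 is the path to the repository root.
pin_lf_endings() {
    printf '%s\n' \
        '*.bash text eol=lf' \
        '*.sh text eol=lf' \
        'apt.txt text eol=lf' \
        >> "$1/.gitattributes"
}

# Example usage (not run here): pin_lf_endings /path/to/your/folder
```

After adding rules like these, `git add --renormalize .` rewrites already-tracked files to match, and `git ls-files --eol` shows the line endings Git sees in the index and the working tree.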

I removed all the carriage returns recursively with:

#!/bin/bash

# Strip the trailing carriage return from every line of a file (CRLF -> LF).
# Uses in-place editing as supported by GNU sed.
remove_carriage_returns() {
    sed -i 's/\r$//' "$1"
    echo "Removed carriage return characters from $1"
}

# Navigate to the folder containing the files; stop if it does not exist
cd /path/to/your/folder || exit 1

# Find all files recursively, skipping Git's internal data under .git
find . -path ./.git -prune -o -type f -print | while IFS= read -r file; do
    # Remove carriage return characters from each file
    remove_carriage_returns "$file"
done

I'm not sure this is necessarily a problem with this project, but if you leave any of them in, either the build or the chat app itself fails.
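
To confirm nothing was missed after a cleanup like the one above, it can help to list the files that still contain carriage returns. A rough sketch, assuming GNU grep (for `-r`, `-I`, and `--exclude-dir`); `list_crlf_files` is a hypothetical name:

```bash
#!/bin/bash

# Hypothetical helper: print every text file under a directory that has a line ending in CR.
# -r recurses, -l prints only file names, -I skips binary files, and .git is left alone.
list_crlf_files() {
    cr=$(printf '\r')
    grep -rlI --exclude-dir=.git "${cr}\$" "$1"
}

# Example usage (not run here): list_crlf_files /path/to/your/folder
```

Empty output means no text file under the folder has CRLF line endings left.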