Debian 11 repo is broken
MKrupauskas opened this issue · 18 comments
Commit 2dff280 restructured the Debian 11 repo in a breaking way.
The setup below used to work but now fails:
root@host:/# cat /etc/apt/sources.list | grep nvidia
deb [arch=amd64] http://aptarchive.uber.internal/libnvidia-container/debian11/amd64 /
root@host:/# apt update
...
E: The repository 'http://repo.internal/libnvidia-container/debian11/amd64 Release' does not have a Release file.
This is because `/debian11` used to symlink to `/stable/debian11`, which in turn symlinked to `/stable/debian10`, which contained the `amd64` directory with the .deb builds: https://github.com/NVIDIA/libnvidia-container/tree/9ce31ae4f042508cd8aabfad6168114c1cde30f0
`/debian10`, `/debian11`, and `/stable/debian11` should all have `amd64` symlinks ultimately pointing to `/stable/debian10/amd64`.
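The layout described above can be sketched as follows. This is only an illustration of the symlink chain, not the actual gh-pages branch; all paths are relative to a scratch directory:

```shell
#!/bin/sh
# Sketch of the symlink chain described above (illustrative paths only;
# the real repo lives on the gh-pages branch).
set -e
cd "$(mktemp -d)"
mkdir -p stable/debian10/amd64            # the amd64 dir with the .deb builds
ln -s debian10 stable/debian11            # stable/debian11 -> stable/debian10
ln -s stable/debian10 debian10            # debian10 -> stable/debian10
ln -s stable/debian11 debian11            # debian11 -> stable/debian11
# every alias resolves to the same physical amd64 directory:
readlink -f debian11/amd64
```

With this chain in place, apt clients pointed at any of the three aliases would resolve to the same package directory.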
@MKrupauskas would switching to `/libnvidia-container/debian10/amd64` as the source of truth for the package be a solution on your end?
Our intent with the official documentation was to make downloading the repository list file work across different distributions, while the `.list` files themselves refer to the lowest compatible distribution for a given package flavor. In the Debian case, this is `debian10`. The motivation for the changes that are causing the breakages is called out in NVIDIA/nvidia-container-toolkit#89 (comment)
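The suggestion above amounts to a sources entry like the following. This is a hedged sketch; the exact URL shape is an assumption based on the paths discussed in this thread, and the file is written to a temp location rather than `/etc/apt/sources.list.d/`:

```shell
#!/bin/sh
# Hypothetical flat-repo sources entry pinned to debian10, the lowest
# compatible distribution. Written to a temp file for illustration; on a
# real client this would go in /etc/apt/sources.list.d/.
set -e
list=$(mktemp)
cat > "$list" <<'EOF'
deb [arch=amd64] https://nvidia.github.io/libnvidia-container/stable/debian10/amd64 /
EOF
cat "$list"
```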
All these user complaints would be solved with a symlink.
@jonathanjsimon it's not quite as simple as that. A symlink duplicates the contents of the target folder at the link location when publishing these repos through GitHub Pages. This optimisation was performed because the resultant artifact is already too large, causing the Pages deployment to fail, meaning that new packages are not available.
We are aware that there may be ways to increase the timeout using custom pages deployments. If you have experience in how to do this, suggestions are welcome.
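The duplication effect described above can be demonstrated locally. This is a sketch under the assumption that the Pages artifact packer dereferences symlinks the way `tar -h` does; the repo layout and file sizes are made up:

```shell
#!/bin/sh
# Demonstrates why symlinked distro directories balloon the artifact:
# dereferencing stores a full copy of the payload behind every link.
set -e
cd "$(mktemp -d)"
mkdir -p repo/stable/debian10/amd64
dd if=/dev/zero of=repo/stable/debian10/amd64/pkg.deb bs=1024 count=64 2>/dev/null
ln -s stable/debian10 repo/debian10     # aliases like those in this thread
ln -s stable/debian10 repo/debian11
tar -cf plain.tar repo                  # symlinks stored as links
tar -chf deref.tar repo                 # -h dereferences: payload copied per alias
ls -l plain.tar deref.tar
```

The dereferenced archive is roughly one payload copy per alias larger, which is the growth the deployment was hitting.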
While we did work around the issue by pointing our source list to Debian 10 the solution isn't ideal. If the only issue is the artifact size and build timeouts I think we should address that for the sake of having a Debian repo that matches the repo standard and user expectations.
Could you share some logs showing what exactly times out if the distribution directories are correctly symlinked? Looking at the GitHub Actions docs, the steps themselves shouldn't time out for 360 minutes if the default isn't overridden: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepstimeout-minutes
@MKrupauskas I have made the symlink changes to my personal mirror elezar@98ee43d.
The GitHub Actions workflow deploying this is here:
A previous run shows the archive size warning:
The following is an example of a deployment that failed due to a timeout, although this was using the "Deploy from branch" pages deployment and not an explicit workflow as we are using now.
We have updated our repository structure and installation instructions to make use of generic Debian packages. The distribution name no longer affects the instructions.
Please see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html and reopen this issue if there are still problems.
Hi there,
when using tools like apt-mirror or apt-mirror2, the `Packages` file is always empty after being downloaded from https://nvidia.github.io/libnvidia-container/stable/deb/amd64/Packages, but the URL works in a browser. Do you have any idea where to search for a solution?
@HenriWahl I don't know what `apt-mirror` expects. This is the file tree as deployed to GitHub Pages: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64
If there is additional metadata required by the tooling we could consider adding it.
@elezar I am not sure what is missing; it looks good to me.
The only hint I have is that it works with https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64, so maybe there is some difference.
Edit: yes, there are some differences:
- https://nvidia.github.io/libnvidia-container/stable/deb/amd64 contains no `Packages.gz`, only `Packages.xz`
- https://nvidia.github.io/libnvidia-container/stable/deb/amd64 has no `Release` and `Release.gpg` files
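For reference, the files listed as missing are normally generated alongside the existing `Packages` index. The usual tool for `Release` is `apt-ftparchive` (from `apt-utils`); the sketch below only shows the gzip step plus a minimal hand-written stand-in `Release`, purely as an illustration of what the missing artifacts look like:

```shell
#!/bin/sh
# Illustrative only: produce the index files missing from the flat repo.
# The Packages content here is a stand-in, not the real index.
set -e
cd "$(mktemp -d)"
printf 'Package: example\nVersion: 1.0\n' > Packages
gzip -k Packages                          # Packages.gz alongside the index
printf 'Architectures: amd64\nComponents:\n' > Release   # minimal Release stub
ls
```

A signed `Release.gpg` would additionally require signing `Release` with the repository's key.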
Edit 2: I found that this is an older problem: NVIDIA/nvidia-docker#730
Those are useful pointers. I will spend some time investigating this.
I have just tried the following in a clean `ubuntu` container:
- Installed `apt-mirror`
- Edited `/etc/apt/mirror.list` to only reference:
```
deb https://nvidia.github.io/libnvidia-container/experimental/deb/amd64 /
```
- When running `apt-mirror` I then see:
```
Processing indexes: [Psh: 1: xz: not found
]
```
- I then installed `xz-utils`:
```
apt-get install -y xz-utils
```
- When I now ran `apt-mirror` the repo was mirrored:
```
$ ls /var/spool/apt-mirror/mirror/
nvidia.github.io
```
- And in the folders themselves:
```
ls /var/spool/apt-mirror/mirror/nvidia.github.io/libnvidia-container/experimental/deb/amd64/
Packages libnvidia-container-tools_1.15.0~rc.3-1_amd64.deb nvidia-container-toolkit-base_1.14.0~rc.2-1_amd64.deb
Packages.xz libnvidia-container1-dbg_1.14.0~rc.2-1_amd64.deb nvidia-container-toolkit-base_1.15.0~rc.1-1_amd64.deb
libnvidia-container-dev_1.14.0~rc.2-1_amd64.deb libnvidia-container1-dbg_1.15.0~rc.1-1_amd64.deb nvidia-container-toolkit-base_1.15.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.1-1_amd64.deb libnvidia-container1-dbg_1.15.0~rc.2-1_amd64.deb nvidia-container-toolkit-base_1.15.0~rc.3-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.2-1_amd64.deb libnvidia-container1-dbg_1.15.0~rc.3-1_amd64.deb nvidia-container-toolkit_1.14.0~rc.2-1_amd64.deb
libnvidia-container-dev_1.15.0~rc.3-1_amd64.deb libnvidia-container1_1.14.0~rc.2-1_amd64.deb nvidia-container-toolkit_1.15.0~rc.1-1_amd64.deb
libnvidia-container-tools_1.14.0~rc.2-1_amd64.deb libnvidia-container1_1.15.0~rc.1-1_amd64.deb nvidia-container-toolkit_1.15.0~rc.2-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.1-1_amd64.deb libnvidia-container1_1.15.0~rc.2-1_amd64.deb nvidia-container-toolkit_1.15.0~rc.3-1_amd64.deb
libnvidia-container-tools_1.15.0~rc.2-1_amd64.deb libnvidia-container1_1.15.0~rc.3-1_amd64.deb
```
Could you confirm that `xz-utils` is installed on your system?
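The failure mode above can be reproduced in isolation. This is a hedged sketch of the round trip `apt-mirror` needs `xz` for, using a made-up stand-in index (it requires `xz-utils` to be installed, which is the point of the check):

```shell
#!/bin/sh
# Reproduces the xz round trip behind the empty-Packages symptom:
# the server serves Packages.xz, and apt-mirror shells out to xz to
# decompress it. Without xz-utils, that step fails silently.
set -e
command -v xz >/dev/null 2>&1 || { echo "xz not found -- install xz-utils"; exit 1; }
cd "$(mktemp -d)"
printf 'Package: example\nVersion: 1.0\n' > Packages   # stand-in index
xz Packages                     # leaves only Packages.xz, as on the server
xz -dk Packages.xz              # the decompression step apt-mirror performs
test -s Packages && echo "Packages decompressed OK"
```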
Hi @elezar - thanks for your investigations!
I can confirm that my apt-mirror image did NOT have the package `xz-utils` installed but now it works WITH it!
Great job!
@elezar one thing is left: now the `apt` command on a client complains that there is no `Release` file.
I see it is even missing at https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/stable/deb/amd64.
From the following documentation: https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format it is unclear whether a `Release` file is actually required. It seems that either `InRelease` or `Release` must be specified.
Can you give more information on which `apt` commands you're using and what the errors are?
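One way to see which of the index files the flat repo actually serves is to probe for them directly. This is a sketch; the file list mirrors the names discussed above, and unreachable files simply report a non-200 code:

```shell
#!/bin/sh
# Probe the flat repo for the index files apt looks for.
# Prints an HTTP status per file; 404 means the file is absent.
base=https://nvidia.github.io/libnvidia-container/stable/deb/amd64
out=""
for f in InRelease Release Packages Packages.xz; do
    code=$(curl -s --max-time 10 -o /dev/null -w '%{http_code}' "$base/$f")
    out="$out$f: HTTP $code
"
done
printf '%s' "$out"
```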
After an `apt update` I get this:
```
Ign:5 https://mirror-apt.local/nvidia-container-toolkit-jammy InRelease
Ign:6 https://mirror-apt.local/nvidia-cuda-jammy InRelease
Err:7 https://mirror-apt.local/nvidia-container-toolkit-jammy Release
  404 Not Found [IP: 10.10.10.10 443]
Hit:8 https://mirror-apt.local/nvidia-cuda-jammy Release
Reading package lists... Done
E: The repository 'https://mirror-apt.local/nvidia-container-toolkit-jammy Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
```
`InRelease` and `Release` both get tried. Meanwhile I found that neither of them exists in my local mirror, just as in your listing above.
Does `sudo apt-get update --allow-insecure-repositories` work as expected?
Yes it does.
The problem seems to be caused by apt-mirror, according to apt-mirror/apt-mirror#156. It seems to miss this file on flat repositories. I will look for it or an alternative next week. Thanks for your commitment!
I think you can get around this by marking the local mirror as trusted or by ensuring that the public key for our repos is also downloaded. For example, per our documentation https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Note that the lines effectively look like:
deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
in this case; you would need to set up something similar for your mirrors.
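For the first option mentioned above, marking the mirror as trusted, a sources entry could look like the following. The hostname is the hypothetical one from the logs earlier in this thread, and the line is written to a temp file for illustration; note that `[trusted=yes]` disables signature verification entirely, so it is only appropriate for mirrors you control:

```shell
#!/bin/sh
# Illustrative sources entry marking a local mirror as trusted.
# Hostname is hypothetical; on a real client this line would go in
# /etc/apt/sources.list.d/. [trusted=yes] skips signature checks.
set -e
list=$(mktemp)
echo 'deb [trusted=yes] https://mirror-apt.local/nvidia-container-toolkit-jammy /' > "$list"
cat "$list"
```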