pguyot/arm-runner-action

Check size of downloadable images to warn users they may run out of disk space

razr opened this issue · 7 comments

razr commented

I am trying to run the Nvidia Orin image.

It turned out there is not enough space inside the docker container to unzip it.

-rw-rw-r--  1 user user 11156299681 Mar 20 04:51 jp511-orin-nano-sd-card-image.zip
-rw-r--r--  1 user user 22144876544 May  4 15:18 sd-blob.img

I removed Android and dotnet to free up space:

    - name: Increase free space
      # Remove Android and dotnet
      run: |
        sudo rm -rf /usr/local/lib/android
        sudo rm -rf /usr/share/dotnet
        df -h

After that, unzip works, but mounting the image fails.

Created loopback device /dev/loop3
/dev/loop3: gpt partitions 2 3 4 5 6 7 8 9 10 11 12 13 14 1
mount: /home/actions/temp/arm-runner/mnt: wrong fs type, bad option, bad superblock on /dev/loop3p2, missing codepage or helper program, or other error.
~/tmp$ fdisk -l sd-blob.img 
Disk sd-blob.img: 20.62 GiB, 22144876544 bytes, 43251712 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E078F7E2-0752-4DF8-8B74-6A5CD5C5AE8B

Device          Start      End  Sectors   Size Type
sd-blob.img1  3057664 43233279 40175616  19.2G Linux filesystem
sd-blob.img2     2048   264191   262144   128M Linux filesystem
sd-blob.img3   264192   265727     1536   768K Linux filesystem
sd-blob.img4   266240   331007    64768  31.6M Linux filesystem
sd-blob.img5   331776   593919   262144   128M Linux filesystem
sd-blob.img6   593920   595455     1536   768K Linux filesystem
sd-blob.img7   595968   660735    64768  31.6M Linux filesystem
sd-blob.img8   661504   825343   163840    80M Linux filesystem
sd-blob.img9   825344   826367     1024   512K Linux filesystem
sd-blob.img10  827392   958463   131072    64M EFI System
sd-blob.img11  958464  1122303   163840    80M Linux filesystem
sd-blob.img12 1122304  1123327     1024   512K Linux filesystem
sd-blob.img13 1124352  1255423   131072    64M Linux filesystem
sd-blob.img14 1255424  3056639  1801216 879.5M Linux filesystem

It works if I mount it manually:

sudo mount -v -o offset=$((512 * 3057664)) -t ext4 sd-blob.img ~/tmp/arm-runner/
/tmp$ ls arm-runner
bin  boot  dev  etc  home  lib  lost+found  media  mnt  opt  proc  README.txt  root  run  sbin  snap  srv  sys  tmp  usr  var
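
The offset passed to mount above is just the partition's start sector multiplied by the sector size, both taken from the fdisk listing. A minimal sketch of that arithmetic follows; the helper name partition_offset is hypothetical and not part of the action.

```shell
# Compute the byte offset of a partition inside a raw disk image.
# Arguments: start sector, sector size in bytes (defaults to 512).
partition_offset() {
  local start_sector=$1
  local sector_size=${2:-512}
  echo $(( start_sector * sector_size ))
}

# Partition 1 of sd-blob.img starts at sector 3057664 (see the fdisk listing).
offset=$(partition_offset 3057664 512)
echo "offset=${offset}"

# The mount needs root and the image itself, so it is only shown as a comment:
# sudo mount -v -o offset="${offset}" -t ext4 sd-blob.img ~/tmp/arm-runner/
```

For this image the computed offset is 1565523968 bytes.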

razr commented

Sorry, I missed that I should add rootpartition: 1. With that change it works:

+ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
+ uname -a
Linux fv-az216-153 5.15.0-1036-azure #43-Ubuntu SMP Wed Mar 29 16:11:05 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

razr commented

There is an error at the end:

Zero-filling unused blocks on boot filesystem...
Zero-filling unused blocks on root filesystem...
Resizing root filesystem to minimal size.
e2fsck: No such file or directory while trying to open /dev/loop3p1
Possibly non-existent device?
/dev/loop3p11
resize2fs 1.46.5 (30-Dec-2021)
open: No such file or directory while opening /dev/loop3p1
/dev/loop3p11
Usage: tune2fs [-c max_mounts_count] [-e errors_behavior] [-f] [-g group]
	[-i interval[d|m|w]] [-j] [-J journal_options] [-l]
	[-m reserved_blocks_percent] [-o [^]mount_options[,...]]
	[-r reserved_blocks_count] [-u user] [-C mount_count]
	[-L volume_label] [-M last_mounted_dir]
	[-O [^]feature[,...]] [-Q quota_options]
	[-E extended-option[,...]] [-T last_check_time] [-U UUID]
	[-I new_inode_size] [-z undo_file] device
Usage: tune2fs [-c max_mounts_count] [-e errors_behavior] [-f] [-g group]
	[-i interval[d|m|w]] [-j] [-J journal_options] [-l]
	[-m reserved_blocks_percent] [-o [^]mount_options[,...]]
	[-r reserved_blocks_count] [-u user] [-C mount_count]
	[-L volume_label] [-M last_mounted_dir]
	[-O [^]feature[,...]] [-Q quota_options]
	[-E extended-option[,...]] [-T last_check_time] [-U UUID]
	[-I new_inode_size] [-z undo_file] device
Resizing rootfs partition.
/home/runner/work/arm-runner-action/arm-runner-action/.//cleanup_image.sh: line 48: * : syntax error: operand expected (error token is "* ")

Do you need to optimize the image afterwards to use it as an artifact? If not, you can try optimize_image: false.
Likewise, I wonder if the bootpartition option can help.

razr commented

I'm not sure yet. Does it make the output artifact significantly smaller?
Do you need a PR for Nvidia? Something like:

  build_nvidia_orin:
    runs-on: ubuntu-22.04
    steps:
    - uses: actions/checkout@v3
    - name: Increase free space
      # Remove Android and dotnet
      run: |
        sudo rm -rf /usr/local/lib/android
        sudo rm -rf /usr/share/dotnet
        df -h
    - uses: ./ # pguyot/arm-runner-action@HEAD
      with:
        base_image: https://developer.nvidia.com/downloads/embedded/l4t/r35_release_v3.1/sd_card_b49/jp511-orin-nano-sd-card-image.zip
        rootpartition: 1
        commands: |
          cat /etc/os-release
          uname -a

Otherwise, I'm good with what I have now. Thank you for your support.

Thank you. I'm not sure it's a good idea to add this case to the action's CI: the image is so large that the job takes about 7 minutes, and we already have a test with an Nvidia image.

Still, I used your example to debug the error message you are seeing, and I can confirm the root cause: you set the root partition but not the boot partition, which defaults to 1. Both partitions therefore have the same index, and so far the action doesn't complain about it. I believe you want no boot partition at all, which is possible as shown in this existing test:

https://github.com/pguyot/arm-runner-action/blob/main/.github/workflows/test-partitions.yml
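
Since the action does not currently complain when both partitions share an index, a sanity check along the following lines might catch this early. This is only a sketch: the function name check_partitions is hypothetical, and treating an empty boot value as "no boot partition" is an assumption, not something confirmed in this thread.

```shell
# Hypothetical check: refuse identical boot and root partition indices.
# Assumption: an empty boot value means "no boot partition" and is accepted.
check_partitions() {
  local boot=$1
  local root=$2
  if [ -n "$boot" ] && [ "$boot" = "$root" ]; then
    echo "error: boot and root partition are both ${root}" >&2
    return 1
  fi
  return 0
}

# No boot partition, root on partition 1: accepted.
check_partitions "" 1 && echo "ok: no boot partition, root is 1"

# The situation from this issue (boot defaults to 1, root set to 1): rejected.
check_partitions 1 1 || echo "rejected: boot and root are both 1"
```

Failing fast with a message like this would avoid the confusing e2fsck and tune2fs output at the end of the run.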

Regarding optimizing the image, it depends on whether your pipeline retrieves the image as an artifact. If it doesn't, you may want to disable optimization, as it's just wasted CPU cycles. I mentioned this because the image is very large, so I wondered whether you really publish it as an artifact. See the following test:

https://github.com/pguyot/arm-runner-action/blob/main/.github/workflows/test-optimize_image.yml

razr commented

I'm fine with that. I just couldn't find your Nvidia example; it is not in the list of supported images.
I have another comment on the filesystem size issue. At the moment there is no check for whether the downloaded image is too big to be unzipped, and the build process just dies silently without any log message.
One way to resolve this could be to check the size of the file before downloading it with wget and compare it with the available space on the filesystem. In my case, for example, the zipped image is about 10G and the unzipped one about 20G, 30G in total, while only 24G is available.
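
To make the arithmetic concrete, here is a rough sketch of such a check using the byte counts from the ls output earlier in this issue. The function name fits_on_disk is hypothetical, and the 24G figure is taken as 24 GiB for illustration.

```shell
# Return success when the archive plus its extracted image fit
# in the available space. All three arguments are byte counts.
fits_on_disk() {
  local archive_bytes=$1
  local image_bytes=$2
  local avail_bytes=$3
  [ $(( archive_bytes + image_bytes )) -le "$avail_bytes" ]
}

# Sizes from this issue: the zip and the extracted sd-blob.img (from ls -l),
# and roughly 24G of free space (taken here as 24 GiB).
archive=11156299681
image=22144876544
avail=$(( 24 * 1024 * 1024 * 1024 ))

if fits_on_disk "$archive" "$image" "$avail"; then
  echo "enough space"
else
  echo "not enough space: need $(( archive + image )) bytes, have ${avail}"
fi
```

On a real runner, the available byte count could come from something like df --output=avail -B1 on the working directory (GNU coreutils), instead of a hard-coded value.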

I tried wget with the --spider option. It works for the Ubuntu image but not for the Nvidia one.

wget --spider https://cdimage.ubuntu.com/releases/22.04.2/release/ubuntu-22.04.2-preinstalled-server-arm64+raspi.img.xz
Spider mode enabled. Check if remote file exists.
--2023-05-05 12:58:28--  https://cdimage.ubuntu.com/releases/22.04.2/release/ubuntu-22.04.2-preinstalled-server-arm64+raspi.img.xz
Resolving cdimage.ubuntu.com (cdimage.ubuntu.com)... 185.125.190.40, 91.189.91.124, 185.125.190.37, ...
Connecting to cdimage.ubuntu.com (cdimage.ubuntu.com)|185.125.190.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1023925356 (976M) [application/x-xz]
Remote file exists.

wget --spider https://developer.nvidia.com/downloads/embedded/l4t/r35_release_v3.1/sd_card_b49/jp511-orin-nano-sd-card-image.zip
Spider mode enabled. Check if remote file exists.
--2023-05-05 13:00:27--  https://developer.nvidia.com/downloads/embedded/l4t/r35_release_v3.1/sd_card_b49/jp511-orin-nano-sd-card-image.zip
Resolving developer.nvidia.com (developer.nvidia.com)... 152.199.20.126
Connecting to developer.nvidia.com (developer.nvidia.com)|152.199.20.126|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
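
For servers that do report a length, the size could be pulled out of the response headers roughly as follows. This is a sketch only: content_length_from_headers is a hypothetical helper, and it is fed canned headers here so that it runs offline.

```shell
# Extract the last Content-Length value from HTTP response headers on stdin.
# Taking the last one covers redirects, where several header blocks appear.
content_length_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { len = $2 }
                    END { if (len != "") print len }'
}

# Canned headers mirroring the Ubuntu response above, so no network is needed.
printf 'HTTP/1.1 200 OK\r\nContent-Type: application/x-xz\r\nContent-Length: 1023925356\r\n\r\n' \
  | content_length_from_headers
```

Against a live URL this could be fed from curl -sIL "$url". When the server rejects the request, as the Nvidia one does above, nothing is printed and the check would have to fall back to something done after the download.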

But even a check after the download, e.g. whether 2*(base_image size) > available size, would help.
Or check the uncompressed size with:

unzip -l jp511-orin-nano-sd-card-image.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
22144876544  2023-03-20 04:38   sd-blob.img
---------                     -------
22144876544                     1 file
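
The total on the last line of that listing is easy to extract in a script. A minimal sketch follows; unzip_total_bytes is a hypothetical helper name, and the listing from this issue is inlined so the example runs without the archive.

```shell
# Read `unzip -l` output on stdin and print the total uncompressed size,
# which is the first field of the final (non-empty) summary line.
unzip_total_bytes() {
  awk 'NF { last = $1 } END { print last }'
}

# Listing from this issue, inlined so the sketch runs without the zip file.
unzip_total_bytes <<'EOF'
  Length      Date    Time    Name
---------  ---------- -----   ----
22144876544  2023-03-20 04:38   sd-blob.img
---------                     -------
22144876544                     1 file
EOF
```

With the real archive this would be unzip -l jp511-orin-nano-sd-card-image.zip | unzip_total_bytes, and the result could then be compared with the free space before extracting.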

What do you think?

Nvidia images are large and GitHub runners have less free space available, so the partitions test was broken. I fixed it by deleting stuff...