Transaction Check Error With "dev-pkgs" and "cuda-curand"/"cuda-nvrtc" on "dunfell-l4t-r32.4.3"
ams-tech opened this issue · 15 comments
I'm getting a build error on the latest "dunfell-l4t-r32.4.3" branch. I can recreate the error with the following steps:
- Pull the latest version of the branch (and update the submodules)
- Set up the environment with
. ./setup-env --machine jetson-tx2 --distro tegrademo-mender
- Modify build/conf/local.conf with the following:
  - Add
    IMAGE_INSTALL_append = "cuda-nvrtc cuda-curand"
  - Change EXTRA_IMAGE_FEATURES to
    EXTRA_IMAGE_FEATURES ?= "debug-tweaks dev-pkgs"
- Execute
bitbake core-image-base
The build fails at the core-image-base-1.0-r0 do_rootfs step; logs are attached: log.do_rootfs.txt
I can get around this issue by switching poky to a previous commit, e32d854e33bc.
Please let me know if I can provide any more information to help.
Looks like maybe we need to fix the ownership and/or permission bits on ${libdir}/pkgconfig during the do_install step in those cuda recipes (and possibly others). RPM is very fussy about having ownership and perms match exactly.
Could you file an issue over in meta-tegra for this?
Hmm. I just ran a test build including the packages that were flagged in your build, and it all went fine. Can you check that the contents of the RPMs for those packages have the right ownership and permissions for /usr/lib/pkgconfig? For example:
$ rpm -qp tmp/deploy/rpm/aarch64/libgcrypt-dev-1.8.5-r0.aarch64.rpm --dump | grep /usr/lib/pkgconfig
/usr/lib/pkgconfig 0 1567084328 0000000000000000000000000000000000000000000000000000000000000000 040755 root root 0 0 0 X
/usr/lib/pkgconfig/libgcrypt.pc 555 1567084328 c8037c6f389af70e8fad6a4e590e6e0b8915169c32ab4fb78f710e9b96e69df2 0100644 root root 0 0 0 X
$ rpm -qp tmp/deploy/rpm/armv8a_tegra/cuda-curand-dev-10.2.89+1-r0.armv8a_tegra.rpm --dump | grep /usr/lib/pkgconfig
/usr/lib/pkgconfig 0 1572384154 0000000000000000000000000000000000000000000000000000000000000000 040755 root root 0 0 0 X
/usr/lib/pkgconfig/curand-10.2.pc 235 1572384154 6837f6ae39c63fe3d78f30124a71d6787cb85bf30609145a7b70f56da0f08f84 0100644 root root 0 0 0 X
The perms (040755) match and the owner is root root in each case.
If your copies aren't matching, that could be due to a pseudo issue. I've run into that myself, occasionally.
The first time I tried to build this combination (on what I believe is the same server @ams-tech is using) I got a different error (log.do_rootfs.txt):
Installing : udev-hwdb-1:244.5-r0.aarch64 41/237
Running scriptlet: udev-hwdb-1:244.5-r0.aarch64 41/237
%post(udev-hwdb-1:244.5-r0.aarch64): scriptlet start
%post(udev-hwdb-1:244.5-r0.aarch64): execv(/bin/sh) pid 29567
+ set -e
+ test -n /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/rootfs
+ /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept update_udev_hwdb udev-hwdb mlprefix= binprefix= rootlibexecdir=/lib PREFERRED_PROVIDER_udev=systemd
/var/tmp/rpm-tmp.jRsRyp: 5: /var/tmp/rpm-tmp.jRsRyp: /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept: Permission denied
%post(udev-hwdb-1:244.5-r0.aarch64): waitpid(29567) rc 29567 status 7e00
warning: %post(udev-hwdb-1:244.5-r0.aarch64) scriptlet failed, exit status 126
Installing : libglib-2.0-0-1:2.62.6-r0.aarch64 97/237
Running scriptlet: libglib-2.0-0-1:2.62.6-r0.aarch64 97/237
%post(libglib-2.0-0-1:2.62.6-r0.aarch64): scriptlet start
%post(libglib-2.0-0-1:2.62.6-r0.aarch64): execv(/bin/sh) pid 31333
+ set -e
+ [ x/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/rootfs != x ]
+ /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept update_gio_module_cache libglib-2.0-0 mlprefix= binprefix= libdir=/usr/lib libexecdir=/usr/libexec base_libdir=/lib bindir=/usr/bin
/var/tmp/rpm-tmp.sqHm3I: 6: /var/tmp/rpm-tmp.sqHm3I: /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept: Permission denied
%post(libglib-2.0-0-1:2.62.6-r0.aarch64): waitpid(31333) rc 31333 status 7e00
warning: %post(libglib-2.0-0-1:2.62.6-r0.aarch64) scriptlet failed, exit status 126
ERROR: Postinstall scriptlets of ['udev-hwdb', 'libglib-2.0-0'] have failed. If the intention is to defer them to first boot,
then please place them into pkg_postinst_ontarget_${PN} ().
The permissions on postinst_intercept look like they don't allow it to be executed:
ls -la /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept
-rw-r--r--+ 1 yocto yocto 2359 Mar 13 16:17 /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept
I also see a dump of warning messages which are probably related.
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/delay_to_first_boot ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/delay_to_first_boot')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/postinst_intercept')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_desktop_database ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_desktop_database')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_font_cache ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_font_cache')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gio_module_cache ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gio_module_cache')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gtk_icon_cache ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gtk_icon_cache')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gtk_immodules_cache ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_gtk_immodules_cache')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_mime_database ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_mime_database')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_pixbuf_cache ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_pixbuf_cache')
WARNING: core-image-base-1.0-r0 do_rootfs: copyfile: failed to chown/chmod /build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_udev_hwdb ([Errno 1] Operation not permitted: '/build/tegra-demo-distro-dan/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-ca2e37e1c4c5c23eac85dfd25f0482df8d261c9d9080a2cf7aad68d291c33bbf/update_udev_hwdb')
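A quick way to see which intercept scripts are missing the execute bit is to list the regular files in that directory that the owner cannot execute. This is only a minimal sketch, assuming GNU find (for -maxdepth); list_non_executable is a hypothetical helper name, and the directory to pass is whatever intercept_scripts-* path your own build produced.

```shell
# List regular files directly inside the given directory (or directories)
# that lack the owner execute bit.
# list_non_executable is a hypothetical helper, not part of bitbake.
list_non_executable() {
    find "$@" -maxdepth 1 -type f ! -perm -100 -print
}

# Example (substitute the intercept_scripts-* directory from your own build):
# list_non_executable /path/to/intercept_scripts-dir
```

An empty result would mean every script in the directory is executable by its owner.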
If I:
pushd repos/poky && git checkout e32d854e33bc && popd && \
pushd repos/meta-virtualization && git checkout 92cd3467502b && popd && \
pushd repos/meta-openembedded && git checkout f2d02cb71eaf && popd
I see this working log (log.do_rootfs.works.txt) and the executable permissions set on postinst_intercept:
ls -la /build/tegra-demo-distro-dan-oldpoky/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-43d453e16325e94d0cda3a1c92fc16f96831432f122215e14f0a182a6a305e69/postinst_intercept
-rwxr-xr-x+ 1 yocto yocto 2359 Mar 13 19:09 /build/tegra-demo-distro-dan-oldpoky/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/intercept_scripts-43d453e16325e94d0cda3a1c92fc16f96831432f122215e14f0a182a6a305e69/postinst_intercept
I'll need to dig a bit more to figure out where/how these permissions are supposed to be set. I'm guessing this isn't meta-tegra related, though; more likely it is something in the OpenEmbedded content, our build server setup, or a combination of the two.
The error above mysteriously disappeared for me today and I can't reproduce this anymore. I'm not sure what was happening there.
I see why I didn't replicate the transaction check error before: I was missing the dev-pkgs edit in EXTRA_IMAGE_FEATURES. I had tried adding it at the bottom of build/conf/local.conf instead, which was simply ignored because the variable was already defined above without it.
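For anyone hitting the same thing: assuming the line added at the bottom used ?= as in the original steps, BitBake treats that as a default assignment, which has no effect once the variable already has a value. One way to confirm what actually reached the image is to filter the output of bitbake -e. The sketch below is only illustrative; effective_value is a hypothetical helper that reads that output on stdin.

```shell
# Print the final value of a variable from `bitbake -e` output read on stdin.
# effective_value is a hypothetical helper name; it matches lines of the
# form NAME="value" and ignores the '#' history comments around them.
effective_value() {
    sed -n "s/^$1=\"\(.*\)\"\$/\1/p"
}

# Example (run from the build directory after sourcing setup-env):
# bitbake -e core-image-base | effective_value EXTRA_IMAGE_FEATURES
```

If dev-pkgs does not show up in the printed value, the edit did not take effect.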
When I add dev-pkgs I reproduce the same issue, and I see that the permissions from rpm don't match:
yocto@yocto:/build/tegra-demo-distro-dan-2/build$ rpm -qp tmp/deploy/rpm/aarch64/libgcrypt-dev-1.8.5-r0.aarch64.rpm --dump | grep /usr/lib/pkgconfig
/usr/lib/pkgconfig 0 1567084328 0000000000000000000000000000000000000000000000000000000000000000 040755 root root 0 0 0 X
/usr/lib/pkgconfig/libgcrypt.pc 555 1567084328 c8037c6f389af70e8fad6a4e590e6e0b8915169c32ab4fb78f710e9b96e69df2 0100644 root root 0 0 0 X
yocto@yocto:/build/tegra-demo-distro-dan-2/build$ rpm -qp tmp/deploy/rpm/armv8a_tegra/cuda-curand-dev-10.2.89+1-r0.armv8a_tegra.rpm --dump | grep /usr/lib/pkgconfig
/usr/lib/pkgconfig 0 1572384154 0000000000000000000000000000000000000000000000000000000000000000 040775 root root 0 0 0 X
/usr/lib/pkgconfig/curand-10.2.pc 235 1572384154 6837f6ae39c63fe3d78f30124a71d6787cb85bf30609145a7b70f56da0f08f84 0100674 root root 0 0 0 X
So the difference is that in the cuda-curand-dev-10.2.89+1-r0.armv8a_tegra.rpm package /usr/lib/pkgconfig has mode 040775, while in libgcrypt-dev-1.8.5-r0.aarch64.rpm it has mode 040755.
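To compare more than two packages at once, the same check can be scripted. This is a rough sketch that relies on the field layout visible in the dump lines above (mode in the 5th field, owner and group in the 6th and 7th); dump_mode is a hypothetical helper name.

```shell
# Print "<mode> <owner> <group>" for one exact path, reading
# `rpm -qp --dump` output on stdin.
# dump_mode is a hypothetical helper, not an rpm feature.
dump_mode() {
    awk -v path="$1" '$1 == path { print $5, $6, $7 }'
}

# Example (needs the host rpm tool; run from the build directory):
# for pkg in tmp/deploy/rpm/*/*-dev-*.rpm; do
#     printf '%s: %s\n' "$pkg" "$(rpm -qp "$pkg" --dump | dump_mode /usr/lib/pkgconfig)"
# done
```

Any package whose line differs from the rest is a candidate for the conflict reported by the transaction check.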
If your copies aren't matching, that could be due to a pseudo issue.
If I run bitbake -c devshell core-image-base, I get:
root@yocto:/build/tegra-demo-distro-dan-2/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/core-image-base-1.0# which pseudo
/build/tegra-demo-distro-dan-2/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/recipe-sysroot-native/usr/bin/pseudo
md5sum /build/tegra-demo-distro-dan-2/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/recipe-sysroot-native/usr/bin/pseudo
71be18e9cde6a1702b61170c14a26197 /build/tegra-demo-distro-dan-2/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/recipe-sysroot-native/usr/bin/pseudo
If I borrow the pseudo from old/working poky:
md5sum /build/tegra-demo-distro-dan-oldpoky/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/recipe-sysroot-native/usr/bin/pseudo
4e5207fb04b82f5b91e00ce1721e6a0a /build/tegra-demo-distro-dan-oldpoky/build/tmp/work/jetson_tx2-oe4t-linux/core-image-base/1.0-r0/recipe-sysroot-native/usr/bin/pseudo
then run bitbake -c cleanall cuda-curand && bitbake core-image-base, I still get the same transaction error and the same 775 permissions on /usr/lib/pkgconfig from cuda-curand-dev:
rpm -qp tmp/deploy/rpm/armv8a_tegra/cuda-curand-dev-10.2.89+1-r0.armv8a_tegra.rpm --dump | grep /usr/lib/pkgconfig
/usr/lib/pkgconfig 0 1572384154 0000000000000000000000000000000000000000000000000000000000000000 040775 root root 0 0 0 X
/usr/lib/pkgconfig/curand-10.2.pc 235 1572384154 6837f6ae39c63fe3d78f30124a71d6787cb85bf30609145a7b70f56da0f08f84 0100674 root root 0 0 0 X
OK, that's odd. I also see the .pc files themselves have 0674 perms, rather than 0644. The recipe uses cp --preserve=mode to stage the files from the NVIDIA deb package during the do_install step, and the mode (perms) settings on the source files are:
drwxr-xr-x root/root 0 2019-10-29 14:21 ./usr/lib/pkgconfig/
-rw-r--r-- root/root 258 2019-10-29 14:21 ./usr/lib/pkgconfig/curand-10.2.pc
Which are 0755 and 0644. Not sure where along the way those permissions are getting lost/overwritten.
What's the host OS, and what is your umask when you run builds? Also, do you have selinux enabled?
@madisongh
Host OS:
siddhant@yocto:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
UMASK
yocto@yocto:/build/tegra-demo-distro-dan$ umask -S
u=rwx,g=rwx,o=rx
We don't have selinux installed on the build server; however, AppArmor is running.
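For reference, the symbolic umask reported above (u=rwx,g=rwx,o=rx) is octal 0002, under which a newly created directory defaults to 0775 rather than 0755. Whether that is where the 040775 in the RPM comes from is not established here; the sketch below only demonstrates the default behaviour. dir_mode_under_umask is a hypothetical helper and assumes GNU stat (the -c option).

```shell
# Print the octal mode a freshly created directory gets under a given umask.
# Runs in a subshell so the caller's umask is left untouched.
# dir_mode_under_umask is a hypothetical helper; assumes GNU stat.
dir_mode_under_umask() (
    umask "$1"
    tmp=$(mktemp -d) || exit 1
    mkdir "$tmp/sub"
    stat -c '%a' "$tmp/sub"
    rm -rf "$tmp"
)

# Example:
# dir_mode_under_umask 0002
# dir_mode_under_umask 0022
```

Note that default ACLs on a parent directory can override the umask, so results inside an ACL-managed tree may differ.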
@jajoosiddhant Thanks. I just tried a from-scratch build of cuda-curand on an 18.04 system and it came out OK, too.
A couple more questions:
- Are you guys up-to-date with the latest? There have been a number of fixes for pseudo that have gone into OE-Core recently, and the latest tegra-demo-distro is up-to-date with those.
- It doesn't look like it, but just to check - are you using a shared-state mirror? If so, make sure you're not picking up the sstate for the faulty packages from there.
@madisongh
I was just able to get rid of the error by adding DIRFILES="1" in meta-tegra/recipes-devtools/cuda/cuda-curand_10.2.89-1.bb and meta-tegra/recipes-devtools/cuda/cuda-nvrtc_10.2.89-1.bb
See https://www.yoctoproject.org/pipermail/yocto/2017-June/036455.html and https://stackoverflow.com/a/44763692
Are you guys up-to-date with the latest? There have been a number of fixes for pseudo that have gone into OE-Core recently, and the latest tegra-demo-distro is up-to-date with those.
We are up to date with origin/dunfell-l4t-r32.4.3
It doesn't look like it, but just to check - are you using a shared-state mirror? If so, make sure you're not picking up the sstate for the faulty packages from there.
We shouldn't be using one, as all the relevant shared-state mirror lines in local.conf are commented out.
Interesting... DIRFILES is a new one on me. I suspect, though, that that setting isn't really required (as mentioned in the e-mail thread, if it were, then lots of other packages would also need it). Adding it caused the recipes to be rebuilt (well, repackaged) from scratch again, so any stray sstate that may have had the bad contents didn't get reused.
If that were the case, then building again after removing DIRFILES should not have given me an error, but it produced the same error:
Error: Transaction check error:
file /usr/lib/pkgconfig conflicts between attempted installs of cuda-curand-dev-10.2.89+1-r0.armv8a_tegra and libgcrypt-dev-1.8.5-r0.aarch64
file /usr/lib/pkgconfig conflicts between attempted installs of libubootenv-dev-0.3.1-r0.jetson_tx2 and cuda-nvrtc-dev-10.2.89+1-r0.armv8a_tegra
If that would have been the case then, building again after removing DIRFILES should not give me an error but it did output the same error.
It will if you're reusing sstate from a build that has the problematic contents. You have to be absolutely sure that you're rebuilding from scratch.
This is a summary of everything I tried in order to locate the issue.
To ensure a clean build, I ran the following steps before each build on the build server:
bitbake -c cleansstate cuda-curand cuda-nvrtc libubootenv libgcrypt
bitbake -c cleanall cuda-curand cuda-nvrtc libubootenv libgcrypt
bitbake -c cleanall core-image-base
rm -rf cache/ sstate-cache/
bitbake core-image-base
- First build, with no DIRFILES: error in build
- Second build, with DIRFILES: successful build
- Third build, with no DIRFILES: error in build
We were able to build successfully on a different machine without using DIRFILES.
@ams-tech tried dockerizing the build and ran it on both the build server (the machine giving build issues) and a different machine. Without using DIRFILES, the build inside the docker container failed on the build server but passed on the other machine.
I wonder what is different on the build server that makes it even fail inside the docker container.
Another question to look at would be why DIRFILES makes the build successful.
I wonder what is different on the build server that makes it even fail inside the docker container.
Maybe an issue with the filesystem itself, or something in the kernel running on that system? Have you run fsck on the filesystem, and kept the kernel up-to-date? Those would remain the same regardless of whether you're running inside or outside a container.
Another question to look at would be why DIRFILES makes the build successful.
It works around the issue by not including the parent directory in the RPM package, so there's no permissions conflict when it gets installed into the rootfs.
Thanks @madisongh for the suggestions.
I can make the problems go away if I:
sudo setfacl -b -R /build/tegra-demo-distro-dan
sudo chown -R yocto:yocto /build/tegra-demo-distro-dan
Then build from the yocto account.
Previously our workflow was to perform builds under one account, which might be different from the account we used for cloning/editing. It appears something has broken this recently, although I don't fully understand how it ever worked, given the logic in utils.py, which chowns each file based on the src directories (owned by whichever user ran git clone), and the fact that chown doesn't work for non-root accounts that don't own the file.
So going forward, our workflow will be to always run clones and builds under the same account, and I expect these issues to disappear.
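For anyone wanting to check for the same situation before building, a rough sketch follows. It lists anything in the tree not owned by the build account; not_owned_by is a hypothetical helper name, and the account and path in the example are simply the ones from this thread.

```shell
# List paths under a tree that are not owned by the given user
# (user may be a name or a numeric uid).
# not_owned_by is a hypothetical helper name.
not_owned_by() {
    find "$2" ! -user "$1" -print
}

# Example, using the account and path from this thread:
# not_owned_by yocto /build/tegra-demo-distro-dan
#
# Paths carrying extended ACL entries (the '+' in ls -l output) can be
# listed with getfacl, skipping files that have only the base entries:
# getfacl -R -s -p /build/tegra-demo-distro-dan
```

If both commands print nothing, the tree is owned by a single account and carries no extra ACL entries, which is the state the setfacl -b and chown -R commands above produce.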