FairRootGroup/FairSoft

[nov22]: ROOT compilation fails on Debian11

fuhlig1 opened this issue · 12 comments

With the latest version of Debian11 the build of ROOT fails with the error below. The compilation was working last week, and since I don't see any changes on our side, it is probably related to a change in the OS itself.

gmake[4]: *** [Makefile:156: all] Error 2

CMake Error at /opt/fairsoft/source/FairSoft/build/Stamp/root/root-build-RelWithDebInfo.cmake:47 (message):
  Stopping after outputting logs.


gmake[3]: *** [CMakeFiles/root.dir/build.make:85: Stamp/root/root-build] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:3450: CMakeFiles/root.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:3457: CMakeFiles/root.dir/rule] Error 2
gmake: *** [Makefile:1694: root] Error 2

Checking in more detail it turns out that the compilation of LZMA fails

Singularity> cd build/Build/root/
Singularity> make
[  0%] Performing build step for 'LZMA'
CMake Error at /opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-build-RelWithDebInfo.cmake:49 (message):
  Command failed: 2

   'make'

  See also

    /opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-build-*.log


make[2]: *** [CMakeFiles/LZMA.dir/build.make:86: LZMA-prefix/src/LZMA-stamp/LZMA-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:7674: CMakeFiles/LZMA.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

with the following underlying error

Singularity> more /opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-build-*.log
::::::::::::::
/opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-build-err.log
::::::::::::::
/opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA/build-aux/missing: line 81: aclocal-1.15: command not found
WARNING: 'aclocal-1.15' is missing on your system.
         You should only need it if you modified 'acinclude.m4' or
         'configure.ac' or m4 files included by 'configure.ac'.
         The 'aclocal' program is part of the GNU Automake package:
         <http://www.gnu.org/software/automake>
         It also requires GNU Autoconf, GNU m4 and Perl in order to run:
         <http://www.gnu.org/software/autoconf>
         <http://www.gnu.org/software/m4/>
         <http://www.perl.org/>
make[3]: *** [Makefile:514: aclocal.m4] Error 127

which is probably correct since aclocal 1.16 is installed on the system.

I was not able to fix the installation error, so I simply installed liblzma on the system such that ROOT can use that version and doesn't need to compile its internal version. With this change the problem disappears.

I will create a PR which adds the missing package to legacy/setup-debian.sh.

👍 thx

This is the reason why I didn't see it, docker://debian:11 installs this package by default:

❯ apptainer exec /cvmfs/fairsoft_dev.gsi.de/ci/for-fairsoft/latest/container/debian.11.legacy.sif bash -l -c "dpkg -s liblzma5 | grep Status"
Status: install ok installed

/opt/fairsoft/source/FairSoft/build/Build/root/LZMA-prefix/src/LZMA/build-aux/missing: line 81: aclocal-1.15: command not found

But I wonder how it gets the idea to use 1.15? How did you create your debian 11 image?

This is also completely unclear to me. I downloaded the initial Docker container from Docker Hub.

Probably the configure script of LZMA was generated using aclocal 1.15 and now when running make the autotools decide for whatever reason that they again need aclocal 1.15.

I will try to look into the problem when I have more time. For the time being, installing lzma on the system fixes the problem for me, and using the system installation makes sense to me anyway.

Can you run ls -la /etc/alternatives/aclocal in your images for comparison?

❯ apptainer exec /cvmfs/fairsoft_dev.gsi.de/ci/for-fairsoft/latest/container/debian.11.legacy.sif bash -l -c "ls -la /etc/alternatives/aclocal"
lrwxrwxrwx. 1 dklein dklein 21 Nov 24 16:39 /etc/alternatives/aclocal -> /usr/bin/aclocal-1.16

Yes, I know. This is exactly what I also had in my container. I found other people reporting the same problem. One of the causes was wrong timestamps on the files needed to generate the configure file: if the configure file is, for whatever reason, older than its input files, the build tries to regenerate it. Currently I can't check whether this is the case for the code used by ROOT.
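
The timestamp theory can be checked directly on an unpacked source tree. The following is a minimal sketch, assuming the usual autotools file names (configure.ac, aclocal.m4, configure); the function name regen_would_trigger is made up for illustration and is not part of any tool.

```shell
# Report whether make would have a reason to regenerate the autotools
# output in a source tree.
# Assumption: the tree uses the standard file names configure.ac,
# aclocal.m4 and configure.
# Returns 0 (true) if an input file is newer than a file generated from it.
regen_would_trigger() {
    dir=$1
    # aclocal.m4 is generated from configure.ac
    [ "$dir/configure.ac" -nt "$dir/aclocal.m4" ] && return 0
    # configure is generated from configure.ac and aclocal.m4
    [ "$dir/configure.ac" -nt "$dir/configure" ] && return 0
    [ "$dir/aclocal.m4" -nt "$dir/configure" ] && return 0
    return 1
}

# Usage: regen_would_trigger /path/to/unpacked/LZMA && echo "regeneration likely"
```

If the function reports true for a freshly unpacked tree, the generated files have lost their "newer than the inputs" ordering somewhere between packaging and extraction.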

There is a tar archive with the needed code inside of the ROOT repository at root/core/lzma/src. After unpacking the archive and searching for 1.15 I find the following files.

Makefile.in:# Makefile.in generated by automake 1.15.1 from Makefile.am.
aclocal.m4:# generated automatically by aclocal 1.15.1 -*- Autoconf -*-
aclocal.m4:[am__api_version='1.15'
aclocal.m4:m4_if([$1], [1.15.1], [],
aclocal.m4:[AM_AUTOMAKE_VERSION([1.15.1])dnl
configure:am__api_version='1.15'

So the build system was generated from the input files using automake (aclocal) 1.15.1. In my case a regeneration is obviously triggered somehow. At the moment I have no clue why.
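
The search above can be reproduced with a short script. This is a minimal sketch; the function name am_api_version is hypothetical, and it relies only on the am__api_version='...' line that appears in the generated configure script, as seen in the output above.

```shell
# Print the automake API version recorded in a generated configure script.
# Assumption: the script contains a line of the form am__api_version='1.15'.
am_api_version() {
    sed -n "s/^am__api_version='\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# Usage: am_api_version /path/to/unpacked/LZMA/configure
```

Comparing the printed version with the aclocal-* binaries actually installed on the system shows whether a regeneration could succeed at all.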

From a build from this weekend on our CI image for Debian 11:

https://alfa-ci.gsi.de/job/FairRootGroup/job/FairSoft/job/legacy/1/artifact/logs/Debian-11/root-configure.log shows

-- Looking for LZMA
-- Could NOT find LibLZMA (missing: LIBLZMA_LIBRARY LIBLZMA_INCLUDE_DIR LIBLZMA_HAS_AUTO_DECODER LIBLZMA_HAS_EASY_ENCODER LIBLZMA_HAS_LZMA_PRESET) 
-- LZMA not found. Switching on builtin_lzma option
-- Building LZMA version 5.2.4 included in ROOT itself

and then in https://alfa-ci.gsi.de/job/FairRootGroup/job/FairSoft/job/legacy/1/artifact/logs/Debian-11/root-build.log:

Scanning dependencies of target LZMA
[  0%] Creating directories for 'LZMA'
[  0%] Performing download step (verify and extract) for 'LZMA'
-- LZMA download command succeeded.  See also /tmp/jenkins-FairRootGroup-FairSoft-legacy-1.g8j/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-download-*.log
[  0%] No update step for 'LZMA'
[  0%] No patch step for 'LZMA'
[  0%] Performing configure step for 'LZMA'
-- LZMA configure command succeeded.  See also /tmp/jenkins-FairRootGroup-FairSoft-legacy-1.g8j/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-configure-*.log
[  2%] Performing build step for 'LZMA'
-- LZMA build command succeeded.  See also /tmp/jenkins-FairRootGroup-FairSoft-legacy-1.g8j/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-build-*.log
[ 24%] Performing install step for 'LZMA'
-- LZMA install command succeeded.  See also /tmp/jenkins-FairRootGroup-FairSoft-legacy-1.g8j/build/Build/root/LZMA-prefix/src/LZMA-stamp/LZMA-install-*.log
[ 24%] Completed 'LZMA'

How can that work there and not in your image? When did you create it? (I created it on Friday)

As I already said, when testing the nov22rc branch I did not see the problem. The last container I have created dates back to 06:02 GMT+0100 on 2022-11-24 (from the container registry). I think the container was created after the change of the ROOT version, but I have to check it. According to the logs there were no major changes afterwards. I still have to check if your Python 3 patches have an effect, but I doubt it.

Since I always create the container from the base image and do an "apt update && apt upgrade" afterwards, there is the chance that there were changes in Debian. Unfortunately I don't know how to track such changes.
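
One possible way to track such changes is to store a package list with every image and compare the lists of two builds. The sketch below assumes a Debian-based image where dpkg-query is available; the function names snapshot_packages and diff_snapshots are made up for illustration.

```shell
# Write "package version" for every installed package, one per line, sorted.
# Assumption: run inside a Debian-based image that provides dpkg-query.
snapshot_packages() {
    dpkg-query -W -f='${Package} ${Version}\n' | sort
}

# Print the lines that differ between two sorted snapshot files.
# Lines only in the old snapshot are prefixed with "-",
# lines only in the new snapshot with "+".
diff_snapshots() {
    comm -23 "$1" "$2" | sed 's/^/-/'
    comm -13 "$1" "$2" | sed 's/^/+/'
}

# Usage:
#   snapshot_packages > packages-old.txt    (inside the old image)
#   snapshot_packages > packages-new.txt    (inside the new image)
#   diff_snapshots packages-old.txt packages-new.txt
```

This only covers packages managed by dpkg; software installed by scripts, such as a bootstrapped CMake, would not show up in the comparison.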

  1. Are you running the installation scripted from a container definition file
  2. or do you install FairSoft into the image interactively?
    1. And if yes, do you work on a login shell
    2. or non-login shell?

I wonder whether the host shell environment can somehow influence the build.

Okay meanwhile I found the underlying problem.

The reason is a change in CMake with version 3.24. In the bootstrap_cmake.sh script the new version is 3.24.3, before it was 3.22.3. Since I install CMake in the container using the script this explains why the build suddenly fails. When going back to CMake 3.22.3 the crash during the ROOT build disappears.
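
To see quickly whether a given installation falls on the new side of that version boundary, the CMake version can be compared against 3.24. This is a minimal sketch; the helper name version_ge is hypothetical, and it assumes a sort that supports -V (version sort), as GNU sort on Debian does.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes cmake is on PATH):
#   v=$(cmake --version | sed -n '1s/^cmake version //p')
#   version_ge "$v" 3.24 && echo "CMake $v may show the timestamp problem"
```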

With CMake 3.24 the CMake command ExternalProject_Add() changes the handling of timestamps when unpacking a tar archive. The old behaviour was to keep the timestamps from the tar file, the new behaviour is to update the timestamps of the files as described here.

CMake policy CMP0135
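
The effect of the two behaviours on the extracted files can be reproduced without CMake. The sketch below is only an analogy and not what ExternalProject does internally: it uses plain tar, whose -m option stamps extracted files with the time of extraction, to mimic the new behaviour. The function name extract_both_ways is made up for illustration.

```shell
# Extract an archive twice: once keeping the archived modification times
# (analogous to the old behaviour) and once with -m, which gives every
# file the time of extraction (analogous to the new behaviour).
# Assumption: a tar that supports -m (GNU tar and bsdtar both do).
extract_both_ways() {
    archive=$1
    keep_dir=$2
    touch_dir=$3
    mkdir -p "$keep_dir" "$touch_dir"
    tar -xf "$archive" -C "$keep_dir"      # timestamps taken from the archive
    tar -xmf "$archive" -C "$touch_dir"    # timestamps set to "now"
}

# Usage: extract_both_ways lzma.tar ./keep ./now; ls -l --time-style=full-iso ./keep ./now
```

In the first directory the generated files stay newer than their inputs, exactly as packaged. In the second, all files carry nearly identical timestamps, so the ordering that make relies on is no longer guaranteed.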

CMake and ROOT already realised that this is a problem; unfortunately I found the bug reports only after I had found the problem myself.

ROOT Issue tracker
CMake Issue tracker

There is already a fix in the v6-26-00-patches branch of ROOT, unfortunately only after the release was done.

Bugfix

Since this isn't only a problem for Debian11 but will show up on all systems where a new CMake is used, I propose to add the patch to FairSoft.

👍

Fixed with #488 in the development branch. Ported also to the nov22_patches branch.