openbmc/phosphor-host-ipmid

Presubmits for repos dependent on phosphor-host-ipmid failing intermittently

brandonkimbk opened this issue · 8 comments

It seems like phosphor-ipmi-flash, which depends on phosphor-host-ipmid, fails intermittently.

This is an example Gerrit code review:
https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-ipmi-flash/+/38041

The Jenkins presubmit failed: https://jenkins.openbmc.org/job/ci-repository/7829/

In case the Jenkins link disappears, the log towards the end looks like this:

make  all-recursive
Making all in include
make[2]: Nothing to be done for 'all'.
Making all in libipmid
make[2]: Nothing to be done for 'all'.
Making all in libipmid-host
make[2]: Nothing to be done for 'all'.
Making all in user_channel
make[2]: Nothing to be done for 'all'.
Making all in .
make[2]: Entering directory '/home/jenkins-op/workspace/ci-repository/openbmc/phosphor-host-ipmid'
/bin/bash ./libtool  --tag=CXX   --mode=link g++ -std=c++17 -flto -Wno-psabi   -I/usr/local/include -I/usr/local/include -I/usr/local/include -DBOOST_ERROR_CODE_HEADER_ONLY -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_COROUTINES_NO_DEPRECATION_WARNING -DBOOST_ASIO_DISABLE_THREADS -DBOOST_ALL_NO_LIB -g -O2 -Wall -Werror -lsystemd  -L/usr/local/lib/powerpc64le-linux-gnu -L/usr/local/lib -lphosphor_logging -lsdbusplus -lsystemd -lphosphor_dbus -L/usr/local/lib/powerpc64le-linux-gnu -lphosphor_dbus -version-info 0:0:0 -shared  -o libwhitelist.la -rpath /usr/local/lib/ipmid-providers libwhitelist_la-whitelist-filter.lo libwhitelist_la-ipmiwhitelist.lo  -lmapper -lpam 
libtool: link: g++ -std=c++17  -fPIC -DPIC -shared -nostdlib /usr/lib/gcc/powerpc64le-linux-gnu/10/../../../powerpc64le-linux-gnu/crti.o /usr/lib/gcc/powerpc64le-linux-gnu/10/crtbeginS.o  .libs/libwhitelist_la-whitelist-filter.o .libs/libwhitelist_la-ipmiwhitelist.o   -L/usr/local/lib/powerpc64le-linux-gnu -L/usr/local/lib /usr/local/lib/libphosphor_logging.so -lsdbusplus -lsystemd -lphosphor_dbus /usr/local/lib/libmapper.so -lpam -L/usr/lib/gcc/powerpc64le-linux-gnu/10 -L/usr/lib/gcc/powerpc64le-linux-gnu/10/../../../powerpc64le-linux-gnu -L/usr/lib/gcc/powerpc64le-linux-gnu/10/../../../../lib -L/lib/powerpc64le-linux-gnu -L/lib/../lib -L/usr/lib/powerpc64le-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/powerpc64le-linux-gnu/10/../../.. -lstdc++ -lm -lc -lgcc_s /usr/lib/gcc/powerpc64le-linux-gnu/10/crtendS.o /usr/lib/gcc/powerpc64le-linux-gnu/10/../../../powerpc64le-linux-gnu/crtn.o  -flto -g -O2   -Wl,-soname -Wl,libwhitelist.so.0 -o .libs/libwhitelist.so.0.0.0
lto1: internal compiler error: bytecode stream: expected tag identifier_node instead of LTO_UNKNOWN
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
lto-wrapper: fatal error: g++ returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:1124: libwhitelist.la] Error 1
make[2]: Leaving directory '/home/jenkins-op/workspace/ci-repository/openbmc/phosphor-host-ipmid'
make[1]: *** [Makefile:1490: all-recursive] Error 1
make: *** [Makefile:908: all] Error 2
Traceback (most recent call last):
  File "/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py", line 1180, in <module>
    build_and_install(dep, False)
  File "/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py", line 304, in build_and_install
    pkg.install()
  File "/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py", line 1019, in install
    system.build()
  File "/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py", line 681, in build
    check_call_cmd(*make_parallel)
  File "/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py", line 229, in check_call_cmd
    check_call(cmd)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '('make', '-j', '80', '-l', '80', '-O')' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/home/jenkins-op/workspace/ci-repository/openbmc/dbus-unit-test.py", line 91, in <module>
    check_call(UNIT_TEST.split(','), env=os.environ)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/jenkins-op/workspace/ci-repository/openbmc/unit-test.py', '-w', '/home/jenkins-op/workspace/ci-repository/openbmc', '-p', 'phosphor-ipmi-flash', '-b', 'master', '-v']' returned non-zero exit status 1.
Build step 'Execute shell' marked build as failure
New run name is '#7829-Jason Ling'
Finished: FAILURE

At first we thought it was ppc64le specific, but we also see it on x86. @williamspatrick has a theory that we're pulling in the latest gcc (10.2), but some of the packages we're pulling in via docker have not been recompiled with the latest compiler, so we end up with issues like https://bugs.gentoo.org/733886.

So we can maybe disable -flto for now on this repo, pin gcc to 10.1 in build-unit-test-docker.sh, or wait for the upstream docker containers to figure it out.
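
For the first option, turning LTO off comes down to dropping the -flto switch from the flags handed to g++ and adding -fno-lto. A minimal sketch, assuming the flags are available as one string; `without_lto` is a hypothetical helper and is not part of unit-test.py, and how the result would be fed back into the phosphor-host-ipmid build is not shown here.

```python
import shlex


def without_lto(flags: str) -> str:
    """Return the flag string with LTO switched off (hypothetical helper)."""
    # Drop -flto and variants such as -flto=auto.
    kept = [tok for tok in shlex.split(flags) if not tok.startswith("-flto")]
    # -fno-lto makes the intent explicit even if a default enables LTO.
    kept.append("-fno-lto")
    return shlex.join(kept)


if __name__ == "__main__":
    print(without_lto("-std=c++17 -flto -Wno-psabi"))
```

This only hides the symptom; if the real cause is stale objects built by a different compiler, the link can still pick them up.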

I hit the same issue, and in my local CI it can be worked around by cleaning up $WORKSPACE/phosphor-host-ipmid.

Although the error looks related to LTO, my local CI builds fine with LTO enabled. So I guess it is caused by an unclean checkout of the dependency, phosphor-host-ipmid.
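
The local workaround can be sketched as a small pre-build step that removes the dependency's checkout so it is cloned and built from scratch. This is an illustration only: `remove_stale_checkout` is a hypothetical name, and it assumes the dependency lives directly under the workspace directory, as the $WORKSPACE/phosphor-host-ipmid path above suggests.

```python
import os
import shutil
from pathlib import Path


def remove_stale_checkout(workspace: str, dep: str) -> bool:
    """Delete workspace/dep if present; return True if it was removed."""
    # Only accept a plain directory name, so nothing outside the
    # workspace can be deleted by accident.
    if dep in ("", ".", "..") or Path(dep).name != dep:
        raise ValueError(f"not a plain directory name: {dep!r}")
    target = Path(workspace) / dep
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


if __name__ == "__main__":
    # WORKSPACE is set by Jenkins; fall back to the current directory.
    ws = os.environ.get("WORKSPACE", ".")
    print(remove_stale_checkout(ws, "phosphor-host-ipmid"))
```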

The unclean repo theory doesn't match the upstream CI experience, since that checks out the source into a "clean" docker container and compiles it every time the CI runs, so it should always be clean. It never fails this way when I run it locally within docker on my workstation, but the idea that some packages vary between runs is an interesting theory.

Andrew, is there an effort to resolve this? We're effectively blocked on it now because it fails almost every time.

Checking the latest failures, it looks like the failing runs get corrupted source files, e.g.

https://jenkins.openbmc.org/job/ci-repository/8666/console (on builder4)

signals.cpp:76:34: error: ‘signal_set’ is not a member of ‘boost::asio’

https://jenkins.openbmc.org/job/ci-repository/8667/consoleFull (on builder3)

signals.cpp:76:34: error: ‘signal_set’ is not a member of ‘boost::asio’

And on https://jenkins.openbmc.org/job/ci-repository/8697/console (builder3), it shows that phosphor-host-ipmid is not really clean:

/home/jenkins-slave/workspace/ci-repository/openbmc/phosphor-host-ipmid > make -j 40 -l 40 -O 
make  all-recursive
Making all in include
make[2]: Nothing to be done for 'all'.
Making all in libipmid
make[2]: Nothing to be done for 'all'.
Making all in libipmid-host
make[2]: Nothing to be done for 'all'.
Making all in user_channel
make[2]: Nothing to be done for 'all'.
...
lto1: internal compiler error: bytecode stream: expected tag identifier_node instead of LTO_UNKNOWN

I took a look at the workspace and definitely do see some of the dependent repos with older timestamps still there. I read over https://stackoverflow.com/questions/37540823/difference-between-delete-workspace-before-build-starts-and-wipe-out-reposito and am wondering if the fact that we're checking out to a sub-directory is messing things up. I moved the job over from the git plugin's workspace cleanup to the Workspace Cleanup Plugin version. Let's see if that helps.
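
The timestamp check described above can be approximated with a short script. This is a sketch only: `stale_entries` is a hypothetical helper, and the build start time is assumed to be known as a Unix timestamp. A directory's mtime changes only when entries directly inside it are added or removed, so this is a coarse signal rather than proof of a dirty tree.

```python
import os
import time
from typing import List


def stale_entries(workspace: str, build_start: float) -> List[str]:
    """Names of top-level workspace entries last modified before build_start."""
    stale = []
    with os.scandir(workspace) as entries:
        for entry in entries:
            # An mtime older than the build start suggests the entry
            # was left over from a previous run.
            if entry.stat().st_mtime < build_start:
                stale.append(entry.name)
    return sorted(stale)


if __name__ == "__main__":
    # Example: report anything untouched for more than an hour.
    print(stale_entries(os.environ.get("WORKSPACE", "."), time.time() - 3600))
```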

https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-ipmi-flash/+/38443/7 <-- this was fairly consistently failing before, so I think you've fixed the issue.

Sounds like switching to the Workspace Cleanup Plugin was the silver bullet on this one. Closing out.