bsc-wdc/compss

Build of bindings-common fails because of early file removal

quentin-ag opened this issue · 5 comments

Level

Minor

Component

bindings-common (build)

Environment

  • COMPSs version: 2.7
  • Java version: 1.8.0
  • Python version: 3.8.5
  • GCC version: 10.2.0
  • libtool version: 2.4.6
  • Operating System: Linux (Manjaro)[1]

[1] successfully reproduced on Debian 10, CentOS 7 and Arch Linux.

Description

The build process of bindings-common fails because a necessary file cannot be found. The logs show that it is removed too early.
This issue seems similar to #1, although I do not see how they could have the same cause.

Minimal example to reproduce

cd ${compss_src}/builders
./buildlocal ${compss_target}

The error can then be reproduced with the same command as executed by buildlocal:

cd ${compss_src}/builders/tmp/compss/programming_model/bindings/bindings-common
./install_common "${compss_target}/Bindings/bindings-common"

Exception

The script ${compss_src}/compss/programming_model/bindings/bindings-common/install_common (called by ${compss_src}/builders/buildlocal) exits during the make clean install step with the error message:

libtool:   error: 'libbindings_common_la-BindingDataManager.lo' is not a valid libtool object

Below is the relevant excerpt from the full output [2]:

/bin/sh ../libtool  --tag=CXX   --mode=link g++  -g -O2 -shared -L/usr/lib/jvm/java-8-openjdk/jre/lib/amd64/server -ljvm  -o libbindings_common.la -rpath ${compss_target}/Bindings/bindings-common/lib libbindings_common_la-BindingDataManager.lo libbindings_common_la-BindingExecutor.lo libbindings_common_la-JavaNioConnStreamBuffer.lo libbindings_common_la-AbstractCache.lo libbindings_common_la-compss_worker.lo libbindings_common_la-common.lo libbindings_common_la-GS_compss.lo  
libtool:   error: 'libbindings_common_la-BindingDataManager.lo' is not a valid libtool object
make[1]: *** [Makefile:436: libbindings_common.la] Error 1
make[1]: Leaving directory '${compss_src}/builders/tmp/compss/programming_model/bindings/bindings-common/src'
make: *** [Makefile:404: install-recursive] Error 1

The error message occasionally names a different libbindings_common_la-*.lo file instead of libbindings_common_la-BindingDataManager.lo.

N.B. I have overwritten some file paths, such as ${compss_src}.

[2] Output of

cd ${compss_src}/builders/tmp/compss/programming_model/bindings/bindings-common
./install_common "${compss_target}/Bindings/bindings-common"

Expected behaviour and workaround

The build should not fail. Presumably, libbindings_common_la-BindingDataManager.lo should not be removed while the link step still needs it.

I worked around the issue by not removing any of these files:

diff --git a/compss/programming_model/bindings/bindings-common/src/Makefile.am b/compss/programming_model/bindings/bindings-common/src/Makefile.am
index 529340b5b..84ab6f144 100644
--- a/compss/programming_model/bindings/bindings-common/src/Makefile.am
+++ b/compss/programming_model/bindings/bindings-common/src/Makefile.am
@@ -25,4 +25,4 @@ libbindings_common_la_LDFLAGS = -shared -L$(JAVA_LIB_DIR) -ljvm
 ACLOCAL_AMFLAGS =-I m4

 clean:
-	rm -f *.o *.lo *~
+	rm -f *.o *~

This modification solved the issue for me. However, it does not look like a clean fix.
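
The thread does not establish a root cause. One hypothesis that fits the error naming a different .lo file from run to run is that the clean and install targets overlap when make runs jobs in parallel. The sketch below is only a way to test that assumption with GNU make: the helper name requests_parallel_jobs is hypothetical, it is not part of install_common, and it only recognises flags written the way a user would export them.

```shell
#!/bin/sh
# Heuristic: succeed if a MAKEFLAGS-style string asks for parallel jobs.
# Only handles flags as a user would export them (e.g. "-j8", "--jobs=4").
requests_parallel_jobs() {
    for flag in $1; do
        case "$flag" in
            -j*|--jobs|--jobs=*) return 0 ;;
        esac
    done
    return 1
}

if requests_parallel_jobs "${MAKEFLAGS:-}"; then
    echo "MAKEFLAGS requests parallel jobs: ${MAKEFLAGS}"
else
    echo "MAKEFLAGS does not request parallel jobs"
fi

# To rule out an overlap between the two targets (an assumption, not a
# confirmed cause), they could be run as separate invocations from the
# bindings-common directory:
#   make clean
#   make install
# or serially in one invocation, overriding any inherited job count:
#   make -j1 clean install
```

If the build succeeds when the targets run one after the other, the early removal is probably an ordering problem rather than a missing dependency, and the rm -f *.lo line could stay in the clean rule.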

Looking at the full log, it seems the installation is done twice! The first run works and the second one fails.

Your observation is right. Unfortunately, the log file was wrong.

I have just reproduced the error, and what I get is only the second part. I must have made a mistake when I captured the log and appended to the log file instead of overwriting it (>> instead of >).
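
As an illustration of how such a log can end up containing two runs, here is a minimal sketch of the difference between the two redirections. The file names and messages are made up for the example and are not taken from the actual log.

```shell
#!/bin/sh
# Illustration only: the directory, file names and messages are hypothetical.
log_dir=$(mktemp -d)

# Two consecutive runs redirected with '>>' accumulate in one file.
echo "run 1: build OK" >> "$log_dir/append.log"
echo "run 2: build FAILED" >> "$log_dir/append.log"

# The same two runs redirected with '>' keep only the last one.
echo "run 1: build OK" > "$log_dir/overwrite.log"
echo "run 2: build FAILED" > "$log_dir/overwrite.log"

# Count the lines in each file (tr strips padding some wc versions add).
append_lines=$(wc -l < "$log_dir/append.log" | tr -d ' ')
overwrite_lines=$(wc -l < "$log_dir/overwrite.log" | tr -d ' ')
echo "append.log: $append_lines line(s)"
echo "overwrite.log: $overwrite_lines line(s)"
```

With >>, a successful earlier run and a failing later run sit back to back in the same file, which would explain a log that appears to show the installation being done twice.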

I have updated the link to point to a new, correct log file (of the same command [2]). I apologise for the mistake.

Do you have a portable environment (such as a Docker image) with the OS on which the build fails, so that I can test this installation?

Unfortunately, no. I have tried to reproduce the issue in an environment that you could use, especially with the same GCC and libtool versions, but the problem did not appear.

I have tried again with the latest version of branch 2.7 (commit b2e235f). The issue is unchanged on my Arch-based distributions (Manjaro and Arch Linux) and the workaround still works. There is no such problem on my Debian and CentOS environments, but I am not sure what to deduce from this.

At the moment I do not have much time to try to reproduce the issue in a portable environment. I propose to keep this ticket open so that I can give updates when I have more information, or if the situation changes.

I have seen that there is an official Arch Linux Docker image. I was looking for a Manjaro one. I will try the installation there to see whether the problem occurs.