plumed/plumed2

Issues with conda build

GiovanniBussi opened this issue · 1 comments

Hi all, this is just to let you know that I am trying to fix the bug that currently breaks the conda build for linux. This is high priority, since with this bug we cannot recompile the code for the CECAM tutorial.

Meanwhile, the related GitHub Actions job will continue to fail. Don't worry about it.

I think I fixed this. I am writing down some technical notes to keep track of the problem.

When libplumedKernel.so is loaded, we use the RTLD_DEEPBIND option, which makes sure that all the libraries used by plumed look for symbols first in the same libraries that plumed links. This is failing with a segmentation fault in the following combination (both conditions need to be true):

  1. plumed is loaded from python
  2. plumed is compiled in conda (this started a couple of weeks ago).

The second condition is, I guess, a consequence of the switch to gcc 13, but I am not sure, so I don't know whether it will also happen with gcc 13 outside of conda. Technically, what happens is that libgomp.so looks for a couple of symbols (stderr and environ) in /lib64/libc.so.6 instead of looking for them in the python executable. For mysterious reasons, this leads to a segmentation fault when loading libgomp.so. Debugging this was not easy (I had to run with LD_DEBUG=all and follow the whole loading process).
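For anyone who needs to repeat this kind of debugging, here is a rough sketch of how the LD_DEBUG trace could be filtered for the two symbols. The helper names (`find_bindings`, `trace_command`) are made up for illustration, and the regular expression assumes the usual glibc "binding file ... to ...: normal symbol" line format, which may differ between glibc versions.

```python
import os
import re
import subprocess
import sys

# Assumed shape of a glibc LD_DEBUG binding line, e.g.
#   1234: binding file A [0] to B [0]: normal symbol `stderr' [GLIBC_2.2.5]
BINDING = re.compile(
    r"binding file (\S+) \[\d+\] to (\S+) \[\d+\]: normal symbol `([^']+)'"
)


def find_bindings(trace, symbols):
    """Return (requesting_file, providing_file, symbol) for the given symbols."""
    hits = []
    for line in trace.splitlines():
        match = BINDING.search(line)
        if match and match.group(3) in symbols:
            hits.append((match.group(1), match.group(2), match.group(3)))
    return hits


def trace_command(cmd):
    """Run cmd with LD_DEBUG=all and return the loader trace (sent to stderr)."""
    env = dict(os.environ, LD_DEBUG="all")
    # No check=True: the traced process may die with a segmentation fault.
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python trace.py python -c "import plumed"
    for src, dst, sym in find_bindings(trace_command(sys.argv[1:]), {"stderr", "environ"}):
        print(f"{sym}: {src} -> {dst}")
```

The trace produced by LD_DEBUG=all is very long, so filtering on the symbol names only narrows down where to look. The order in which the libraries are loaded still has to be read from the full trace.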

I just pushed a commit that fixes the build (3120675). To do so:

  • I added the possibility to switch off RTLD_DEEPBIND also when loading plumed from python. This is a change needed in the python wrappers.
  • I actually switch it off with an environment variable when loading from python (PLUMED_LOAD_NODEEPBIND=yes).
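To make the idea concrete, here is a minimal sketch of the flag selection, written in Python rather than in the actual C wrapper code. The function name `dlopen_mode` is hypothetical, and two things are assumptions: the base flags (RTLD_NOW | RTLD_LOCAL), and that any non-empty value of PLUMED_LOAD_NODEEPBIND disables deep binding.

```python
import os


def dlopen_mode(environ=os.environ):
    """Pick dlopen flags, dropping RTLD_DEEPBIND when the variable is set."""
    # Assumed base flags; the real wrapper may use a different combination.
    mode = os.RTLD_NOW | os.RTLD_LOCAL
    # RTLD_DEEPBIND is glibc-specific, so fall back to 0 where it is missing.
    deepbind = getattr(os, "RTLD_DEEPBIND", 0)
    # Assumption: any non-empty value of the variable switches deep binding off.
    if not environ.get("PLUMED_LOAD_NODEEPBIND"):
        mode |= deepbind
    return mode


# The resulting mode could then be passed to a loader call, for example
# ctypes.CDLL("/path/to/libplumedKernel.so", mode=dlopen_mode())
```

Keeping the decision in one place means that the C and python entry points can share the same rule, which is the reason the python wrappers also need the change.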

I suspect this is actually a bug in the conda build, so I prefer not to hard code this choice. We can then document that when using plumed from conda it is necessary to set the PLUMED_LOAD_NODEEPBIND environment variable.
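On the user side, the documentation could show something like the following. This is only a sketch: the helper name `disable_deepbind` is made up, and it assumes that setting the variable before `import plumed` is early enough for the wrapper to see it.

```python
import os


def disable_deepbind():
    """Set PLUMED_LOAD_NODEEPBIND unless the user already chose a value."""
    os.environ.setdefault("PLUMED_LOAD_NODEEPBIND", "yes")
    return os.environ["PLUMED_LOAD_NODEEPBIND"]


# Set the variable first, so that it is already in the environment
# when the plumed kernel is loaded.
disable_deepbind()

try:
    import plumed  # conda build of the python wrappers
except ImportError:
    plumed = None  # plumed is not installed in this environment
```

Equivalently, the variable can be exported in the shell before starting python, which avoids touching the script at all.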

However, for this environment variable to be effective, it is necessary to recompile the python wrapper. Assuming that people use conda and pip consistently (i.e., if they use conda for plumed, they also use it for the python wrapper), I would proceed as follows:

  • I will force a new build of the python wrappers on conda-forge, without a new release, just including this single change. I might even make that build set the environment variable automatically whenever the wrappers are used.

  • I will also trigger a new build of our own conda distribution, which will be necessary for the CECAM school.

  • The fix in the C code that allows disabling RTLD_DEEPBIND will appear in 2.8.4 and 2.9.1, later this year. No need for a special release.

I think I'll do it next week, so if there's any feedback before then, let me know.