eth-cscs/sarus

Avoiding ldconfig

haampie opened this issue · 4 comments

I've stumbled upon what is likely a long-standing issue in ldconfig that is blocking me from running it. Running ldconfig seems to be necessary for sarus run --mpi .. to work.

To reduce the size of my Docker image I'm running a bundling tool that collects all executables in a bin/ folder and, recursively, all dependent shared libraries in a lib/ folder. It then rewrites the RUNPATH of the binaries using the patchelf tool so that ldd can resolve them. Everything seems fine and I can run the binaries; however, ldconfig chokes on it in the following way:

root@afd0fbda1255:~# ldconfig ~/SIRIUS.AppDir/usr/lib/
/sbin/ldconfig.real: file /root/SIRIUS.AppDir/usr/lib/libgsl.so.23 is truncated
...
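
For context, the RUNPATH rewriting boils down to patchelf invocations roughly like these (a sketch; the bundling tool automates this per file, and the executable name app is just an example):

$ patchelf --set-rpath '$ORIGIN/../lib' SIRIUS.AppDir/usr/bin/app   # executables search the bundled ../lib
$ patchelf --set-rpath '$ORIGIN' SIRIUS.AppDir/usr/lib/libgsl.so.23 # libraries search their own directory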

The truncation error is apparently a 10-year-old issue (https://nix-dev.science.uu.narkive.com/q6Ww5fyO/ldconfig-problem-with-patchelf-and-64-bit-libs, NixOS/patchelf#44, mesonbuild/meson#4685, and more), and everybody seems to be working around it.

Can we somehow avoid relying on ldconfig in sarus so that we don't run into this problem?

Since my executables are configured to use RUNPATH, it should be enough to mount MPI into the container and set an environment variable LD_LIBRARY_PATH=/path/to/mpi/lib, since LD_LIBRARY_PATH takes precedence over RUNPATH.
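
To illustrate the precedence (a minimal sketch; the host MPI path and the executable name are assumptions):

# for binaries carrying DT_RUNPATH (not DT_RPATH), the dynamic linker
# searches LD_LIBRARY_PATH first, so a mounted host MPI shadows the bundled one:
$ LD_LIBRARY_PATH=/path/to/mpi/lib ./SIRIUS.AppDir/usr/bin/app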

Hello @haampie,
the motivation behind the use of ldconfig is that currently the functionality offered by sarus run --mpi is implemented via an OCI hook (https://sarus.readthedocs.io/en/stable/config/mpi-hook.html), which is equivalent to a standalone plugin program.

This program needs to find out autonomously whether there are MPI libraries in the container, where they are, and whether they are fit to be replaced by host equivalents.
The hook thus uses /etc/ld.so.cache (generated by running ldconfig when building the image) to look for the MPI libraries among all those generally available in the image.
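
For illustration, the cache the hook consults can be listed with ldconfig -p inside the container (output abbreviated; the library name and path below are just examples):

$ ldconfig -p | grep libmpi
        libmpi.so.40 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libmpi.so.40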

This solution has the following advantages over relying on ELF's RPATH/RUNPATH:

  • It does not rely on having a target executable to read the directories from: it is not trivial to detect, inside the container, which executable is the actual MPI application that links the libraries.
    Even if the container engine propagated the container command or the entrypoint to the hook, those could be (for example) scripts or driver programs which indirectly execute the MPI application; in that case the hook would be unable to access the correct RPATH/RUNPATH.

  • Using ldconfig in the Dockerfile is clearer and more flexible than instructing the linker to use specific paths (a sketch of such a step follows this list).
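
In practice this amounts to a Dockerfile step along these lines (a sketch; the library directory is an example, not taken from this issue):

# register the image's library directories in /etc/ld.so.cache
RUN echo "/opt/mylib/lib" > /etc/ld.so.conf.d/mylib.conf && ldconfig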

That said, if you wish to avoid the use of ldconfig in your Dockerfile (thereby giving up the MPI hook through --mpi), you have the following options:

  1. Running with the container's MPI libraries without leveraging the performance of a native interconnect (for some reference: https://sarus.readthedocs.io/en/latest/user/user_guide.html#running-mpi-applications-without-the-native-mpi-hook);

  2. Manually mounting the host MPI libraries and their dependencies through the --mount option of sarus run. You should look into the etc/sarus.json file of your Sarus installation to know which MPI libraries and MPI dependency libraries the system administrator has configured, and which you therefore need to mount (a sketch follows this list).
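
Such a manual mount could look roughly like this (a sketch assuming Sarus's Docker-style --mount syntax; the paths are examples to be replaced with the entries from etc/sarus.json):

sarus run \
  --mount=type=bind,source=/opt/mpi/lib/libmpi.so.12,destination=/usr/lib/libmpi.so.12 \
  my_image app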

Thanks, makes sense.

How about this approach: add a flag --mpi-path /path/to/libmpi.so.x that lets you append paths to the list that ldconfig returns (it would imply --mpi), and maybe allow passing it multiple times: --mpi-path /path-1/to/libmpi.so.x --mpi-path /path-2/to/libmpi.so.x.

Best!

One last comment, though: since I'm quite happy with my current approach of building small Docker images and relying on RUNPATH to do its job, maybe I should explain it a bit better.

It does not rely on having a target executable to read the directories from

I totally agree that this is the way to go for sure!

Using ldconfig in the Dockerfile is clearer and more flexible than instructing the linker to use specific paths.

True, but it still introduces a step in the Dockerfile that is specific to the Sarus runtime.

So, in my use case I have a multi-stage build where the builder image is terribly big, but the resulting image is just a bare ubuntu 18.04 base image with a single folder called ~/MyApp.AppDir, which has the following structure:

$ tree MyApp.AppDir/
MyApp.AppDir/
└── usr
    ├── bin
    │   └── app
    └── lib
        ├── libcublas.so
        ├── libmkl.so
        └── libmpi.so

Now the app executable has RUNPATH set to $ORIGIN/../lib and the binaries in lib/ have it set to $ORIGIN (this is automated by the bundling tool). This means that the shared libs in lib/ are the preferred libs unless overridden by LD_LIBRARY_PATH.
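
This can be verified with readelf (a sketch; output abbreviated):

$ readelf -d MyApp.AppDir/usr/bin/app | grep RUNPATH
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../lib]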

What I could do now, is to create the following structure instead:

$ tree MyApp.AppDir/
MyApp.AppDir/
└── usr
    ├── bin
    │   └── app
    ├── lib
    │   ├── libcublas.so
    │   ├── libmkl.so
    │   └── libmpi.so
    └── vendor

and create a Dockerfile like this:

FROM ubuntu:18.04

# Docker does not expand ~ in COPY or ENV, so spell out root's home
COPY --from=builder /root/MyApp.AppDir /root/MyApp.AppDir

ENV PATH /root/MyApp.AppDir/usr/bin:$PATH

# Allow vendor-specific libraries to be loaded here
ENV LD_LIBRARY_PATH /root/MyApp.AppDir/usr/vendor

and run sarus like

sarus run --mount-vendor-mpi-to=/root/MyApp.AppDir/usr/vendor/ my_image app

This approach has the same advantage of not having to infer the libmpi.so location from the executables, and I think it's very clear and flexible as well.
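
With the current CLI, this would correspond roughly to the following (a sketch assuming Sarus's Docker-style --mount syntax; the host MPI path is an example):

sarus run \
  --mount=type=bind,source=/opt/host-mpi/lib,destination=/root/MyApp.AppDir/usr/vendor \
  my_image app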


Lastly, if/when Sarus supports extending existing environment variables, I could even drop the ENV setting from my Dockerfile and have the hook set it for me; then my Dockerfile would not need to be aware of Sarus at all.

Closing for now, since I've found a way to avoid using patchelf such that ldconfig is no longer broken.