mstorsjo/llvm-mingw

DLLs should be available via --print-file-name

nolange opened this issue · 8 comments

Hello, I use a script to copy over dependent DLLs during the install step.

There are 2 places I look for them

  1. Search dirs for executables

    # /opt/llvm-mingw/bin/x86_64-w64-mingw32-clang++ -print-search-dirs
    programs: =/opt/llvm-mingw/bin
    libraries: =/opt/llvm-mingw/lib/clang/18::/opt/llvm-mingw/x86_64-w64-mingw32/lib:/opt/llvm-mingw/x86_64-w64-mingw32/mingw/lib
    
  2. Directly querying the compiler

    /opt/llvm-mingw/bin/x86_64-w64-mingw32-clang++ --print-file-name libwinpthread-1.dll

With the llvm-mingw toolchain, neither lookup locates the DLLs, because they are not in any of the listed library paths. It would work nicely if the DLLs were available in one of the designated library paths (e.g. /opt/llvm-mingw/x86_64-w64-mingw32/lib).

I understand that bin is the default path for installing, so that shared libraries you are building are picked up by applications. But for the toolchain this seems wrong.
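
For illustration, the first lookup could be sketched in Python roughly as follows. This is only a sketch: it assumes the `-print-search-dirs` output has the shape shown above (a `libraries: =` line with colon-separated entries, as on a Linux host), and the names `parse_library_dirs` and `find_in_dirs` are hypothetical helpers, not part of any toolchain.

```python
import os


def parse_library_dirs(search_dirs_output, sep=":"):
    """Return the directories listed on the 'libraries:' line."""
    for line in search_dirs_output.splitlines():
        if line.startswith("libraries:"):
            value = line[len("libraries:"):].strip()
            # The list is prefixed with '=' in the compiler output.
            if value.startswith("="):
                value = value[1:]
            # Drop empty entries such as the one produced by '::'.
            return [d for d in value.split(sep) if d]
    return []


def find_in_dirs(name, dirs):
    """Return the first existing path to `name` in `dirs`, or None."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Sample taken from the output quoted in this report.
sample = (
    "programs: =/opt/llvm-mingw/bin\n"
    "libraries: =/opt/llvm-mingw/lib/clang/18::"
    "/opt/llvm-mingw/x86_64-w64-mingw32/lib:"
    "/opt/llvm-mingw/x86_64-w64-mingw32/mingw/lib\n"
)
print(parse_library_dirs(sample))
```

With the current layout, `find_in_dirs("libwinpthread-1.dll", parse_library_dirs(sample))` would return `None`, since the DLL is not in any of the three directories.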

I see what you're getting at...

Picking up and bundling the toolchain runtimes with a distributed app is always a bit of a hassle, as they are installed in wildly different locations between toolchains.

I haven't run into the use of --print-file-name for this purpose before though - that does indeed seem like a kinda convenient solution, which also seems to work across both GCC and Clang.
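
A rough Python sketch of that query, for anyone wanting to try it: both GCC and Clang print the bare name back when they cannot find the file, so the result has to be checked before use. The helper names `interpret_print_file_name` and `query_compiler` are made up for this example, and the `-print-file-name=<file>` spelling is used here on the assumption that both drivers accept it.

```python
import os
import subprocess


def interpret_print_file_name(output, name):
    """Return the resolved path, or None if the compiler did not find `name`."""
    path = output.strip()
    # When the file is not found, the compiler echoes the bare name back.
    if not path or path == name or not os.path.isabs(path):
        return None
    return path


def query_compiler(compiler, name):
    """Run `<compiler> -print-file-name=<name>` and interpret the result."""
    result = subprocess.run(
        [compiler, "-print-file-name=" + name],
        capture_output=True,
        text=True,
        check=True,
    )
    return interpret_print_file_name(result.stdout, name)
```

A deployment script would call `query_compiler("/opt/llvm-mingw/bin/x86_64-w64-mingw32-clang++", "libwinpthread-1.dll")` and fall back to some other lookup when it returns `None`.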

My rationale for this layout is that libraries are installed into <root>/<arch>-w64-mingw32, just like any regular install with --prefix=<root>/<arch>-w64-mingw32 would do, without distinction between what is a platform library and what is a user library.

But indeed, nothing ever uses those DLLs directly from the bin subdirectory there, except for existing convention of llvm-mingw having them there. Changing it could potentially break existing deployment scripts for all existing users.

But for the toolchain this seems wrong.

I don't entirely agree. Generally, I don't think the DLL files generally belong in the lib directory - the linker shouldn't need to look at them. However, this does seem to be the current practice for e.g. Debian-shipped mingw cross compilers (since https://salsa.debian.org/mingw-w64-team/mingw-w64/-/commit/b47e81590d80367b312f437a1c839d434a96d3a3). It seems like GCC used to install its own runtimes in this fashion as well, but with recent versions, the GCC runtimes seem to be installed in the GCC resource directory (<root>/lib/gcc/<triple>/<version>).

I wonder if there are other ways of making clang find them in the bin subdirectory, for the --print-file-name case. That's probably possible, but it might also end up with Clang passing that directory to the linker (which would be harmless but unnecessary).

We could also duplicate the DLLs in both the bin and lib subdirectories. That works, doesn't break existing usage patterns, but also is quite ugly.

TL;DR, I'm not saying firmly no to moving them, but I'm not very keen on it. I would expect that it would break a number of existing users that rely on the established practice of where to find them in llvm-mingw (although fixing it would be quite simple in each case). But having clang --print-file-name able to locate them would definitely be good; I do see the value in that.

I'm open to hearing the opinions of others!

I don't entirely agree. Generally, I don't think the DLL files generally belong in the lib directory

That can be argued, but I don't see an argument for putting them into bin (other than not changing things). In terms of the linker, the DLLs don't need to exist AFAIK.

There is also --print-prog-name which works on Arch Linux because the DLLs reside next to the compiler AFAIR, but that is an even bigger mess.

We could also duplicate the DLLs in both the bin and lib subdirectories. That works, doesn't break existing usage patterns, but also is quite ugly.

symlinks/hardlinks would save some space.

I don't entirely agree. Generally, I don't think the DLL files generally belong in the lib directory

That can be argued, but I don't see an argument for putting them into bin (other than not changing things).

The main argument is that I keep the <root>/<arch>-w64-mingw32 directory with the same layout as any installation of a library.

There is also --print-prog-name which works on Arch Linux because the DLLs reside next to the compiler AFAIR, but that is an even bigger mess.

And I guess that only would work for the cases when running on Windows, where we do install the DLLs for the native arch, in <root>/bin? While when cross compiling, that's only for native host binaries (e.g. Linux) while the cross target runtimes are in <root>/<arch>-w64-mingw32/bin.
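
As a sketch of what existing deployment scripts effectively hardcode for the cross-compiling case, the runtime directory can be derived from the compiler location. This assumes the layout described in this thread (compiler in `<root>/bin`, target runtimes in `<root>/<arch>-w64-mingw32/bin`); `runtime_dll_dir` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def runtime_dll_dir(compiler_path, triple):
    """Return <root>/<triple>/bin for a compiler located in <root>/bin."""
    # <root>/bin/<triple>-clang++  ->  <root>
    root = PurePosixPath(compiler_path).parent.parent
    return root / triple / "bin"


print(runtime_dll_dir(
    "/opt/llvm-mingw/bin/x86_64-w64-mingw32-clang++",
    "x86_64-w64-mingw32",
))
# prints /opt/llvm-mingw/x86_64-w64-mingw32/bin
```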

We could also duplicate the DLLs in both the bin and lib subdirectories. That works, doesn't break existing usage patterns, but also is quite ugly.

symlinks/hardlinks would save some space.

That's true, that'd help somewhat. The Windows distributions of the toolchain don't use symlinks though (the zip distribution mechanism flattens them out), so it would be a little size increase for them, but perhaps not too bad.

And I guess that only would work for the cases when running on Windows, where we do install the DLLs for the native arch, in <root>/bin?

That's kinda the issue. For Windows (the target) you want DLLs to end up there; for toolchains running on Linux the lib path is more natural. Projects are typically configured for the former.

You also might have DLLs that are tied to the compiler version, while the rest should be usable by different compilers and versions.

My proposal would be to move the DLLs:

  • "internal" stuff gets into clang's internal library directory, ex: lib/clang/18/lib/windows/lib/libclang_rt.asan_dynamic-x86_64.dll
  • everything else to the lib directories, ex: x86_64-w64-mingw32/lib/libwinpthread-1.dll
  • Symlinks to the normal libs into bin
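
The last two points could be sketched like this in Python: the real files live in lib, and bin gets symlinks, with a plain copy as the fallback where symlinks are not available. The helper names `link_or_copy` and `mirror_dlls` are hypothetical, and the demo at the end only uses a throwaway temporary directory, not a real toolchain.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Make `dst` refer to `src`: symlink if possible, otherwise copy."""
    if os.path.lexists(dst):
        os.remove(dst)
    try:
        # A relative link keeps the toolchain directory relocatable.
        os.symlink(os.path.relpath(src, os.path.dirname(dst)), dst)
        return "symlink"
    except (OSError, NotImplementedError):
        shutil.copy2(src, dst)
        return "copy"


def mirror_dlls(lib_dir, bin_dir):
    """Expose every DLL found in `lib_dir` under `bin_dir`."""
    os.makedirs(bin_dir, exist_ok=True)
    results = {}
    for name in sorted(os.listdir(lib_dir)):
        if name.lower().endswith(".dll"):
            results[name] = link_or_copy(
                os.path.join(lib_dir, name), os.path.join(bin_dir, name)
            )
    return results


# Demo on a temporary tree that mimics <root>/<arch>-w64-mingw32/{lib,bin}.
with tempfile.TemporaryDirectory() as root:
    lib_dir = os.path.join(root, "x86_64-w64-mingw32", "lib")
    bin_dir = os.path.join(root, "x86_64-w64-mingw32", "bin")
    os.makedirs(lib_dir)
    for fake in ("libwinpthread-1.dll", "libwinpthread.dll.a"):
        with open(os.path.join(lib_dir, fake), "wb") as f:
            f.write(b"placeholder")
    demo_result = mirror_dlls(lib_dir, bin_dir)
    demo_bin_listing = sorted(os.listdir(bin_dir))
    with open(os.path.join(bin_dir, "libwinpthread-1.dll"), "rb") as f:
        demo_content = f.read()
```

Only files ending in `.dll` are mirrored, so import libraries such as `libwinpthread.dll.a` stay in lib only.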

I support the suggestion to make the path to the DLLs discoverable by querying the compiler (via -print-file-name and/or -print-search-dirs).

My cross-platform xPack LLVM clang distribution provides relocatable standalone binaries for Windows/macOS/Linux that can be installed in any user location, and by design multiple versions of the toolchain can be easily installed on the same machine in different versioned folders.

The challenge with this approach is to ensure that the compiled executables refer to the correct shared libraries at run-time.

Avoiding shared libraries entirely by linking them statically may work in some cases, but presents a different set of challenges on RedHat & Co, where -static is discouraged, and on macOS, where it is not supported by the linker. Partial static linking with -static-libgcc and -static-libstdc++ may be a solution for GCC, but I don't know of equivalents for compiler-rt and libc++ in the clang world. Plus, when compiling lots of executables (like clang itself, for example), having the C++ libraries in each executable is a waste of space.

Thus in my projects I query the compiler via -print-file-name or -print-search-dirs and after processing the result, I pass the library path(s) to the linker via -Wl,-rpath and -L on macOS/Linux, so each executable has a way to locate its shared libraries.
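
A minimal sketch of that last step, assuming the library directories have already been extracted from the compiler output; `linker_flags` is a hypothetical helper name, and the example directory is only illustrative.

```python
def linker_flags(lib_dirs):
    """Build -L and -Wl,-rpath flags for each distinct library directory."""
    flags = []
    seen = set()
    for d in lib_dirs:
        if d in seen:
            continue  # avoid passing the same directory twice
        seen.add(d)
        flags.append("-L" + d)            # where the linker looks
        flags.append("-Wl,-rpath," + d)   # where the loader looks at run time
    return flags


print(linker_flags(["/opt/toolchain/lib", "/opt/toolchain/lib"]))
# prints ['-L/opt/toolchain/lib', '-Wl,-rpath,/opt/toolchain/lib']
```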

On Windows this mechanism is not available, so I set the PATH (or WINEPATH when running on WineHQ).
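
Roughly, the Windows side could look like the sketch below. It assumes `;` as the separator for both PATH on Windows and WINEPATH, which is an assumption worth checking against the Wine documentation; `runtime_env` is a hypothetical helper name.

```python
import os


def runtime_env(dll_dirs, base_env=None, use_wine=False):
    """Return a copy of the environment with `dll_dirs` on the DLL search path."""
    env = dict(os.environ if base_env is None else base_env)
    var = "WINEPATH" if use_wine else "PATH"
    parts = list(dll_dirs)
    if env.get(var):
        # Keep the existing value, but let the toolchain DLLs win.
        parts.append(env[var])
    # Assumption: ';' separates entries in both variables.
    env[var] = ";".join(parts)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching a test executable, either directly on Windows or through `wine` with `use_wine=True`.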

To conclude, there are two paths that must be handled, the path to the libraries needed by the linker, and the path to the shared libraries needed by the loader. On macOS/Linux, the two paths are the same, usually in a lib folder, possibly architecture dependent in multilib cases, and discoverable by querying the compiler with the correct options (like -m32/-m64).

For consistency reasons it would be nice to have a similar approach on Windows too.

For simplicity, in my distribution, since file symlinks on Windows are not reliable, I just copy the DLLs from bin to lib. Having the DLLs in two places wastes some space, but solves the problem.

To me, it only makes sense for DLLs to be inside bin/. Putting them in lib/ just feels wrong; it also means mixing a handful of DLLs into a sea of .a import libs, as opposed to having them cleanly and visibly isolated in a separate dir.

Since the specific DLLs one needs to distribute are already tied to the specific toolchain one uses, personally I think it makes sense for the packager to already know where they are, and I don't mind hardcoding them at all...

I've never heard of using -print-search-dirs for this before, but from its documentation in GCC, it does not look like it is intended for the use case of finding DLLs to be distributed:

Print the name of the configured installation directory and a list of program and library directories gcc searches—and don’t do anything else.

This is useful when gcc prints the error message ‘installation problem, cannot exec cpp0: No such file or directory’. To resolve this you either need to put cpp0 and the other compiler components where gcc expects to find them, or you can set the environment variable GCC_EXEC_PREFIX to the directory where you installed them. Don’t forget the trailing ‘/’. See Environment Variables Affecting GCC.

To me this just seems to be wrongly applying Linux-specific logic to Windows. It may have been working for some mingw-w64 GCC distributions, but is it really right?

Just my two cents from a very Windows-centric viewpoint.

We could also duplicate the DLLs in both the bin and lib subdirectories.

If you ask me I would prefer not... I am already slightly bothered that the wrappers are copies of each other instead of symlinks. I know deduplicating them won't save much space, but still.

"internal" stuff gets into clang's internal library directory, ex: lib/clang/18/lib/windows/lib/libclang_rt.asan_dynamic-x86_64.dll

I really don't see the need for this... I thought llvm-mingw distributions are intended to be standalone -- I put each version in its own dir, so there is no chance of conflict. Putting a DLL so deep inside the tree serves no purpose other than to make the DLL exceptionally undiscoverable.

In my opinion the actual location of the DLLs is not relevant as long as the toolchain provides an introspection method that works for each supported variant (like -m32/-m64 for Intel toolchains).

And from a cross-platform perspective, it is preferable that this introspection mechanism works more or less the same on all platforms.

"internal" stuff gets into clang's internal library directory, ex: lib/clang/18/lib/windows/lib/libclang_rt.asan_dynamic-x86_64.dll

I really don't see the need for this... I thought llvm-mingw distributions are intended to be standalone -- I put each version in its own dir, so there is no chance of conflict. Putting a DLL so deep inside the tree serves no purpose other than to make the DLL exceptionally undiscoverable.

It's the consistent way to look up dependencies. To me, llvm-mingw is primarily a set of headers/libraries, usually referred to as a sysroot.

The big advantage of clang is that it's a native cross-compiler. That means I would want to use the system's clang, and I don't care where each header and library comes from. Having DLLs resolved with the same infrastructure, as the only source of truth, is the only thing that is reasonable and manageable, IMHO.

Running the compiled files is several steps down from that, but the toolchain should be able to link the files as well as locate the runtime dependencies, i.e. locate and install the DLLs from the toolchain paths (compiler-internal or llvm-mingw library paths) to the target path (next to the executables for Windows, the lib directory otherwise).

Using a Unix-like toolchain is arguably not Windows-centric. It's a way of targeting Windows just like anything else.

The build systems should handle most of that, configuring things to run on the target. The only issue is that this default is wrong for the sysroot and the toolchain itself.