Compile shared library with statically linked libc and libstdc++
Sorry for the long intro; it is only for context, so please skip the first paragraph if you are short on time.
We would like to ship the blosc shared library with Fiji to support reading and writing Zarr data that uses blosc compressors. Thanks to https://github.com/Blosc/JBlosc, this works flawlessly with a system-wide installation or an equivalent environment. On most platforms it also works with the compiled shared libraries placed in a dedicated subdirectory of Fiji when no system-wide installation is available. But here is the itch: the shared library of course does not statically link everything and their dog, and we are running into issues with incompatible C and C++ standard libraries on the various Linux platforms. For example, the shared library that I compiled on various flavors of Ubuntu does not work on the always out-of-date CentOS. So I believe we could maybe get around this by shipping shared libraries that include the relevant libc and libstdc++ components? I am not sure how this will collide with the rest of the mostly Java application, but it could... work.
My experience with CMake is very limited and I have so far failed to statically link these libraries into the shared library. The most relevant post about this may be this one: https://stackoverflow.com/questions/38694058/cmake-linking-statically-against-libgcc-and-libstdc-into-a-shared-library, but I failed to follow its instructions. For instance, where should I put this CMake script? I tried various places in CMakeLists.txt, but I am really lacking context here and have not been successful.
Can any of you help me out here? Any hints or advice are very much appreciated!
Thanks in advance,
Stephan
I would also love to hear your opinion about what could be the best way to distribute libblosc in this kind of self-contained Java application. I currently believe that a binary that statically includes about everything that it needs (including Snappy) may be best, but we could certainly ship a whole bunch of shared libraries. It would be great, though, if we could limit the number of platforms that we have to distinguish. Currently, we do Win32/64, MacOS, and Linux32/64 (I am ignoring the 32-bit variants right now, but that can be changed). And it would be great if we didn't have to break this down into several Linuxes, Windowses, and MacOSes. Again, thanks for any insights!
For our projects, the complexity of dealing with the blosc native library dependency has certainly come up numerous times (glencoesoftware/bioformats2raw#34 for example), and the default answer we currently have to give people is: "sorry, use raw compression for now".
Based on our experience wrapping a number of native libraries in Java (jxrlib and jpeg turbo in particular) I would strongly discourage going down the route of trying to statically link glibc; there are many gotchas involved in trying to do that. The best alternative I can propose is to target library builds to the lowest common denominator. At Glencoe we do this for jxrlib, since we still want to support CentOS 6, by using Ubuntu 14.04 as a build environment and being careful about symbol leakage:
With Docker now available on almost all the CI vendors this is easier than ever to do. The above work was done long before that was a thing.
We could also go to the extremes that the Python community goes to in order to distribute manylinux binary wheels if we so desire:
The PyPA makes containers available for the various manylinux build environments.
@axtimwalde: While I know your use case is Fiji and the distribution of binary artifacts via the Fiji update sites, our use case would very much benefit from shipping the relevant JAR dependencies. This is in keeping with what the SciJava community has done for jxrlib, hdf5, jpeg turbo, etc., with library loading via https://github.com/scijava/native-lib-loader.
@chris-allan thanks for the insight! Do you mean shipping binaries inside jars? I don't like this for how it requires a dedicated jar for every platform or includes every platform's binary inside a giant jar. Or am I missing something?
Also, on all Linuxes that I know of, it is super convenient to install blosc, e.g. sudo apt install libblosc1, so for all applications that have a deb or rpm package, or any dependency-aware package manager, there wouldn't be a real problem. Unfortunately, the Fiji app started off as this old-school 'download-and-run' thing, then realized that it ships packages, then aimed to become its own package manager... you know the history. The state it is in right now is somewhere significantly short of the established package management systems but it runs on a bunch of heterogeneous platforms and we have to find some ways to deal with it while it may get better. The least common denominator idea is certainly workable. I compiled libblosc-1.18.1 on Scientific Linux 7.3 yesterday and that seems to work on the CentOS something that we got complaints about as well as on my Ubuntu 20.04. So maybe we do not have a real problem at this time. I would still be interested in solutions that could be workable in such corner cases and would like to know if statically linking the questionable libraries would (a) work, and (b) not cause other problems.
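For question (a), one way to check whether a given build still depends on the system's libstdc++ or libgcc (for example after linking with GCC's -static-libgcc and -static-libstdc++ flags) is to list the NEEDED entries in the library's dynamic section. The following is only a minimal sketch, assuming binutils' readelf is installed; the function names are made up for illustration and are not part of blosc or JBlosc.

```python
import re
import subprocess

# Matches lines of `readelf -d` output such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libstdc++.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def dynamic_dependencies(readelf_output):
    """Return the shared libraries listed as NEEDED, in file order."""
    return NEEDED_RE.findall(readelf_output)


def still_dynamic(dependencies, prefixes=("libstdc++", "libgcc_s")):
    """Return the dependencies that were expected to be linked statically."""
    return [dep for dep in dependencies if dep.startswith(prefixes)]


def needed_libraries(path):
    """Run `readelf -d` on a library file (assumes readelf is on PATH)."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return dynamic_dependencies(result.stdout)
```

Calling still_dynamic(needed_libraries("libblosc.so")) on a build (the path is hypothetical) should return an empty list if the C++ runtime was actually linked in statically. This only answers whether the link step did what was intended; it says nothing about question (b).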
> Do you mean shipping binaries inside jars? I don't like this for how it requires a dedicated jar for every platform or includes every platform's binary inside a giant jar. Or am I missing something?
Yep, that's exactly what I mean. I know it's far from ideal but it is largely the established norm across the entire Java community.
> Unfortunately, the Fiji app started off as this old-school 'download-and-run' thing, then realized that it ships packages, then aimed to become its own package manager... you know the history. The state it is in right now is somewhere significantly short of the established package management systems but it runs on a bunch of heterogeneous platforms and we have to find some ways to deal with it while it may get better.
I'm certainly well aware of the history. :) It's unfortunate we have some difficulties, but I think the decisions we all made trading off user convenience for developer and system administrator convenience were worth it.
> The least common denominator idea is certainly workable. I compiled libblosc-1.18.1 on Scientific Linux 7.3 yesterday and that seems to work on the CentOS something that we got complaints about as well as on my Ubuntu 20.04.
For Linux at least, glibc backward ABI compatibility has been a guarantee for a very long time. This is primarily achieved by symbol versioning. GCC libstdc++ is a little trickier, but it has been backwards compatible with no ABI breaks for ~10 years (since the 3.3 to 3.4 migration). So, if you compile on the least common denominator Linux-wise, you should be good. Microsoft guarantees no such backwards compatibility for C++, which is why a lot of people have several (I've got about 15) Visual C++ Redistributable runtimes installed on their systems.
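Because the compatibility comes from symbol versioning, the newest versioned symbols a library imports tell you the oldest glibc and libstdc++ it can run against. Below is a rough sketch of how that could be checked by parsing the output of objdump -T; it assumes binutils' objdump is installed, and the helper names are hypothetical rather than anything from blosc.

```python
import re
import subprocess

# Versioned symbol tags as printed by `objdump -T`, e.g. GLIBC_2.14
# or GLIBCXX_3.4.21 (tags without a number, like GLIBC_PRIVATE, are skipped).
VERSION_RE = re.compile(r"\b(GLIBC|GLIBCXX|CXXABI)_(\d+(?:\.\d+)*)")


def required_versions(objdump_output):
    """Return the newest version per family among undefined (imported) symbols."""
    newest = {}
    for line in objdump_output.splitlines():
        # Only undefined symbols are requirements on the target system.
        if "*UND*" not in line:
            continue
        for family, version in VERSION_RE.findall(line):
            parsed = tuple(int(part) for part in version.split("."))
            # Compare numerically so that 2.14 ranks above 2.2.5.
            if parsed > newest.get(family, ()):
                newest[family] = parsed
    return {family: ".".join(map(str, v)) for family, v in newest.items()}


def library_requirements(path):
    """Run `objdump -T` on a library file (assumes objdump is on PATH)."""
    result = subprocess.run(
        ["objdump", "-T", path], check=True, capture_output=True, text=True
    )
    return required_versions(result.stdout)
```

If library_requirements("libblosc.so") (hypothetical path) reports a GLIBC version no higher than what the oldest target ships, for instance glibc 2.17 on CentOS 7, the library should load there as far as glibc is concerned; the same comparison applies to GLIBCXX and the target's libstdc++.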