COMBINE-lab/pufferfish

Pufferfish doesn't build within a conda environment (which has all the prerequisites) due to failed build of htslib from SeqLib

hermidalc opened this issue · 9 comments

During make of pufferfish I'm getting the following error. Just above the error, configure runs its presence and usability checks for zlib.h and both pass, so is something wrong with an include path, or is there something else that I can't see during the build?

It looks like some of the pufferfish dependencies cannot build within a conda environment that has the requisite dependencies such as cmake, cxx-compiler, zlib, and bzip2. I'm trying to figure out why, because being able to build within a conda environment is important for developers who use pufferfish as a submodule in a project that they want to make redistributable, and mamba/conda is a very popular framework for software dependency management.

make
...
[ 10%] Creating directories for 'libseqlib'
[ 11%] Performing download step (git clone) for 'libseqlib'
Cloning into 'seqlib'...
Already on 'master'
Your branch is up to date with 'origin/master'.
Submodule 'htslib' (https://github.com/samtools/htslib.git) registered for path 'htslib'
Cloning into '/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/htslib'...
Submodule path 'htslib': checked out 'be22a2a1082f6e570718439b9ace2db17a609eae'
[ 12%] No patch step for 'libseqlib'
[ 13%] Performing configure step for 'libseqlib'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/missing: Unknown `--is-lightweight' option
Try `/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++... gcc3
checking for x86_64-conda-linux-gnu-gcc... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc
checking whether we are using the GNU C compiler... yes
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc accepts -g... yes
checking for /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc option to accept ISO C89... none needed
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc understands -c and -o together... yes
checking dependency style of /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc... gcc3
checking for x86_64-conda-linux-gnu-ranlib... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-ranlib
checking how to run the C++ preprocessor... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++ -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for library containing gzopen... -lz
checking for library containing lzma_code... -llzma
checking for library containing BZ2_bzBuffToBuffDecompress... -lbz2
checking for library containing clock_gettime... -lrt
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating src/seqtools/Makefile
config.status: creating config.h
config.status: executing depfiles commands
[ 14%] Performing build step for 'libseqlib'
Making all in htslib
In file included from bgzf.c:43:
htslib/bgzf.h:35:10: fatal error: zlib.h: No such file or directory
   35 | #include <zlib.h>
      |          ^~~~~~~~
compilation terminated.
make[5]: *** [Makefile:121: bgzf.o] Error 1
make[4]: *** [Makefile:358: all-recursive] Error 1
make[3]: *** [Makefile:299: all] Error 2
make[2]: *** [CMakeFiles/libseqlib.dir/build.make:85: libseqlib-prefix/src/libseqlib-stamp/libseqlib-build] Error 2
make[1]: *** [CMakeFiles/Makefile2:235: CMakeFiles/libseqlib.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

I see that pufferfish is trying to install the dependency SeqLib, which itself is trying to install the dependency htslib, whose README shows that it requires zlib.

But in the conda environment where I'm building pufferfish I already have both the zlib and libzlib packages installed, and I can see zlib.h in the conda environment's include directory.

I deactivated the conda environment and tried to make using my Fedora 36 system libraries, making sure I had rpms for cmake, zlib-devel, and bzip2-devel. It builds successfully, but the wider problem remains, because pufferfish should be able to build within a conda environment that has these same dependencies.

I updated the issue title and OP a bit to highlight the conda issue.

With my conda environment active I see the following CFLAGS, CPPFLAGS, and CXXFLAGS:

$ echo $CFLAGS
-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include

$ echo $CPPFLAGS
-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include

$ echo $CXXFLAGS
-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include

These point to the right place: -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include is the directory that contains header files like zlib.h. So I'm really not sure what is going on here, since this should work in a conda environment, especially given that the build succeeds with my Fedora 36 system-wide libraries.
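A quick way to confirm that the exported flags really do expose the header is to walk the -I/-isystem directories they name and look for zlib.h in each one. The sketch below is a hypothetical helper (the function names include_dirs_from_flags and header_in_flags are made up for illustration and are not part of pufferfish or conda). It only checks what the environment variables say, not what the nested Makefiles actually pass to the compiler.

```shell
#!/bin/sh
# List every directory named by -I or -isystem in a flag string, one per line.
include_dirs_from_flags() {
    # Intentionally unquoted: split the flag string on whitespace.
    # Paths that contain spaces are not handled in this sketch.
    set -- $1
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -isystem|-I)
                # The directory is the next word, if there is one.
                if [ "$#" -ge 2 ]; then
                    printf '%s\n' "$2"
                    shift
                fi
                ;;
            -I?*)
                # The directory is glued to the flag, e.g. -I/some/dir.
                printf '%s\n' "${1#-I}"
                ;;
        esac
        shift
    done
}

# Print the full path of each copy of <header> found in those directories.
header_in_flags() {
    header=$1
    flags=$2
    include_dirs_from_flags "$flags" | while IFS= read -r dir; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
        fi
    done
}

# Usage: check the flags exported by the active environment, if any.
found=$(header_in_flags zlib.h "${CPPFLAGS:-} ${CFLAGS:-}")
if [ -n "$found" ]; then
    echo "zlib.h is reachable through the exported flags:"
    echo "$found"
else
    echo "zlib.h was not found in any -I/-isystem directory"
fi
```

If this reports the header as reachable while the build still fails, the flags are probably being dropped somewhere between the environment and the compiler invocation rather than being wrong.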

If the pufferfish developers want to help and reproduce this, here's how to do it. It's really easy to install and configure mamba (a C++ drop-in replacement for conda, which includes conda):

  1. Install Mambaforge for your architecture
  2. Do initial setup:
$ mamba init
$ mamba config --set auto_activate_base false
$ mamba config --set channel_priority strict
  3. Close your terminal and open a new one so it sources the updated .bashrc
  4. Update mamba base:
$ mamba update --all
  5. Create new environment and activate it:
$ mamba create -n pufferfish boost-cpp bzip2 cmake curl icu gcc gxx libjemalloc libblas=*[build=*mkl] libhwloc libopenblas jsoncpp make tbb wget xz zlib
$ mamba activate pufferfish
  6. You will see the following full environment:
(pufferfish) $ mamba list
# packages in environment at /home/hermidalc/soft/mambaforge/envs/pufferfish:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
binutils_impl_linux-64    2.36.1               h193b22a_2    conda-forge
boost-cpp                 1.80.0               h75c5d50_0    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.24            ha878542_0    conda-forge
cmake                     3.24.2               h5432695_0    conda-forge
curl                      7.85.0               h2283fc2_0    conda-forge
expat                     2.4.9                h27087fc_0    conda-forge
gcc                       12.1.0              h9ea6d83_10    conda-forge
gcc_impl_linux-64         12.1.0              hea43390_16    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gxx                       12.1.0              h9ea6d83_10    conda-forge
gxx_impl_linux-64         12.1.0              hea43390_16    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
jsoncpp                   1.9.5                h4bd325d_1    conda-forge
kernel-headers_linux-64   2.6.32              he073ed8_15    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.19.3               h08a2579_0    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libcurl                   7.85.0               h2283fc2_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libgcc-devel_linux-64     12.1.0              h1ec3361_16    conda-forge
libgcc-ng                 12.1.0              h8d9b700_16    conda-forge
libgfortran-ng            12.1.0              h69a702a_16    conda-forge
libgfortran5              12.1.0              hdcd56e2_16    conda-forge
libgomp                   12.1.0              h8d9b700_16    conda-forge
libhwloc                  2.8.0                h32351e8_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libidn2                   2.3.3                h166bdaf_0    conda-forge
libjemalloc               5.2.1                h9c3ff4c_6    conda-forge
libnghttp2                1.47.0               hff17c54_1    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libsanitizer              12.1.0              ha89aaad_16    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-devel_linux-64  12.1.0              h1ec3361_16    conda-forge
libstdcxx-ng              12.1.0              ha89aaad_16    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuv                     1.44.2               h166bdaf_0    conda-forge
libxml2                   2.10.2               h7463322_2    conda-forge
libzlib                   1.2.12               h166bdaf_4    conda-forge
llvm-openmp               14.0.4               he0ac6c6_0    conda-forge
make                      4.3                  hd18ef5c_1    conda-forge
mkl                       2022.1.0           h84fe81f_915    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
openssl                   3.0.5                h166bdaf_2    conda-forge
rhash                     1.4.3                h166bdaf_0    conda-forge
sysroot_linux-64          2.12                he073ed8_15    conda-forge
tbb                       2021.6.0             h924138e_0    conda-forge
wget                      1.20.3               ha35d2d1_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.12               h166bdaf_4    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge
  7. With the environment activated in your shell, download and try to build pufferfish to reproduce this error.
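Before starting the build in that environment, it can help to confirm that the compression headers are actually present under the environment prefix. This is a minimal sketch under my own assumptions: missing_headers is a hypothetical helper, and the header names zlib.h, bzlib.h, and lzma.h are the usual ones shipped by zlib, bzip2, and xz, matching the libraries the SeqLib configure step probes for in the log above.

```shell
#!/bin/sh
# Print each header from the argument list that is absent under <prefix>/include.
missing_headers() {
    prefix=$1
    shift
    for h in "$@"; do
        if [ ! -f "$prefix/include/$h" ]; then
            printf '%s\n' "$h"
        fi
    done
}

# Usage: report on the active conda environment, if there is one.
if [ -n "${CONDA_PREFIX:-}" ]; then
    missing=$(missing_headers "$CONDA_PREFIX" zlib.h bzlib.h lzma.h)
    if [ -n "$missing" ]; then
        echo "headers missing under $CONDA_PREFIX/include:"
        echo "$missing"
    else
        echo "all three headers found under $CONDA_PREFIX/include"
    fi
else
    echo "no conda environment is active (CONDA_PREFIX is unset)"
fi
```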

You say in the How to Use section of your README that you also have the SeqLib dependencies lzma, bz2, and z (I'm assuming zlib here) packaged with pufferfish, so I'm wondering why that doesn't also make the zlib.h header available during the build.

I see the following in your top-level CMakeLists.txt file:

ExternalProject_Add(libseqlib
GIT_REPOSITORY https://github.com/COMBINE-lab/SeqLib.git
GIT_TAG        master
UPDATE_COMMAND ""
UPDATE_DISCONNECTED 1
BUILD_IN_SOURCE TRUE
DOWNLOAD_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/seqlib
SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/seqlib
INSTALL_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/install
CONFIGURE_COMMAND ./configure 
BUILD_COMMAND     make CXXFLAGS='-std=c++14' ${LIBSTADEN_LDFLAGS} ${LIBSTADEN_CFLAGS}
INSTALL_COMMAND   mkdir -p <INSTALL_DIR>/lib && mkdir -p <INSTALL_DIR>/include && cp src/libseqlib.a <INSTALL_DIR>/lib && 
                  cp htslib/libhts.a <INSTALL_DIR>/lib &&
                  cp -r SeqLib <INSTALL_DIR>/include &&
                  cp -r json <INSTALL_DIR>/include &&
                  cp -r htslib <INSTALL_DIR>/include
)

Specifically:

BUILD_COMMAND     make CXXFLAGS='-std=c++14' ${LIBSTADEN_LDFLAGS} ${LIBSTADEN_CFLAGS}

If you look further up in the file, LIBSTADEN_LDFLAGS and LIBSTADEN_CFLAGS add paths from external/install/lib and external/install/include. But while I see that you search for and, if needed, install lzma (xz) and bz2, I'm not seeing anywhere that you try to download and install zlib. So for that reason the SeqLib build, with the CXXFLAGS passed to it, fails to find zlib.h.

I see the culprit here is htslib being built for SeqLib. For some reason htslib won't compile in a conda environment: it says it cannot locate zlib.h even though the conda zlib include files are in the include path and I can see them in the zlib Makefile after configure. Yet it compiles outside of a conda environment using my Fedora 36 system-wide, dnf-installed libraries.
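One thing worth keeping in mind when reading that BUILD_COMMAND is how make chooses between competing values of the same variable: a value given on the make command line beats an assignment inside the Makefile, which in turn beats the environment unless make is run with -e. The sketch below demonstrates only that general rule with a throwaway Makefile. Whether any Makefile in the SeqLib or htslib tree actually resets CPPFLAGS this way is an assumption that this thread does not confirm, and /opt/example/include is a placeholder path.

```shell
#!/bin/sh
# Demo of make variable precedence using a throwaway Makefile.
# The include path /opt/example/include is a placeholder, not a real location.
mk=$(mktemp)
# The Makefile assigns CPPFLAGS itself, as a nested project's Makefile might.
# printf turns \t into the tab that must start a recipe line.
printf 'CPPFLAGS =\n.PHONY: show\nshow:\n\t@echo "CPPFLAGS=[$(CPPFLAGS)]"\n' > "$mk"

if command -v make >/dev/null 2>&1; then
    # 1. Environment only: the Makefile's own assignment wins.
    from_env=$(CPPFLAGS=-I/opt/example/include make -s -f "$mk" show)
    # 2. Command line: overrides the Makefile's assignment.
    from_cli=$(make -s -f "$mk" show CPPFLAGS=-I/opt/example/include)
    # 3. Environment plus -e: the environment overrides the Makefile.
    from_env_e=$(CPPFLAGS=-I/opt/example/include make -s -e -f "$mk" show)
    printf '%s\n' "env:    $from_env" "cli:    $from_cli" "env -e: $from_env_e"
else
    echo "make not found; skipping demo"
fi
rm -f "$mk"
```

By the same rule, the CXXFLAGS='-std=c++14' given on the make command line replaces the CXXFLAGS exported by the conda environment for that build, including its -isystem entry.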

I figured out the source of the problem, but still can't get it to build properly.

If I manually clone SeqLib, then go into the htslib submodule inside it and build it following their instructions, running autoreconf -i before running ./configure, then htslib builds correctly. I'm not sure, though, how to get the SeqLib submodule inside pufferfish to do this properly before configure.
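The manual procedure can be written down as a small script. This is only a sketch: run and build_htslib_with_prefix are hypothetical helpers, and passing CPPFLAGS and LDFLAGS explicitly to ./configure is my own addition on top of the autoreconf -i step described above (autoconf-generated configure scripts accept VAR=VALUE arguments). By default it only prints the commands, so nothing is built until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print the command (dry run, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Build htslib from a checkout, pointing configure at an environment prefix.
# The body runs in a subshell so that cd does not affect the caller.
build_htslib_with_prefix() (
    prefix=$1   # e.g. "$CONDA_PREFIX"
    src=$2      # path to the htslib checkout
    run cd "$src" || exit 1
    run autoreconf -i || exit 1
    run ./configure "CPPFLAGS=-I$prefix/include" "LDFLAGS=-L$prefix/lib" || exit 1
    run make || exit 1
)

# Usage: show the plan for the active environment (placeholder if none is active).
build_htslib_with_prefix "${CONDA_PREFIX:-/path/to/env}" external/seqlib/htslib
```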

I tried to preface the SeqLib ./configure, in the pufferfish CMakeLists.txt file, with cd htslib && autoreconf -i && cd ../ && ./configure but that still doesn't work, probably because important *FLAGS environment variables are not passed through to it. This is more pain than it's worth, though I believe it's important to have software work in a conda environment, because people do not always have sudo rights to install missing system-wide packages on certain computers like HPC clusters, and it makes the software you write much easier to redistribute.

For others coming here, I see now that there is some useful info on how to compile in issue #27. I will have to try it on my Linux box to see if it works.