Pufferfish doesn't build within a conda environment (which has all the prerequisites) due to failed build of htslib from SeqLib
hermidalc opened this issue · 9 comments
During make of pufferfish I'm getting the following error. Just above the error you can see that the presence and sanity checks for zlib.h pass, so is something wrong with an include path, or is it something else that I can't see during the build?
It looks like some of the pufferfish dependencies cannot build within a conda environment that has the requisite dependencies such as cmake, cxx-compiler, zlib, and bzip2. I'm trying to figure out why, because being able to build within a conda environment is important for developers who use pufferfish as a submodule in a project that they want to make redistributable, and mamba/conda is a very popular framework for software dependency management.
make
...
[ 10%] Creating directories for 'libseqlib'
[ 11%] Performing download step (git clone) for 'libseqlib'
Cloning into 'seqlib'...
Already on 'master'
Your branch is up to date with 'origin/master'.
Submodule 'htslib' (https://github.com/samtools/htslib.git) registered for path 'htslib'
Cloning into '/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/htslib'...
Submodule path 'htslib': checked out 'be22a2a1082f6e570718439b9ace2db17a609eae'
[ 12%] No patch step for 'libseqlib'
[ 13%] Performing configure step for 'libseqlib'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/missing: Unknown `--is-lightweight' option
Try `/home/hermidalc/projects/github/hermidalc/functional-profiler/pufferfish/external/seqlib/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++... gcc3
checking for x86_64-conda-linux-gnu-gcc... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc
checking whether we are using the GNU C compiler... yes
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc accepts -g... yes
checking for /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc option to accept ISO C89... none needed
checking whether /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc understands -c and -o together... yes
checking dependency style of /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-cc... gcc3
checking for x86_64-conda-linux-gnu-ranlib... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-ranlib
checking how to run the C++ preprocessor... /home/hermidalc/soft/mambaforge/envs/functional-profiler/bin/x86_64-conda-linux-gnu-c++ -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for library containing gzopen... -lz
checking for library containing lzma_code... -llzma
checking for library containing BZ2_bzBuffToBuffDecompress... -lbz2
checking for library containing clock_gettime... -lrt
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating src/seqtools/Makefile
config.status: creating config.h
config.status: executing depfiles commands
[ 14%] Performing build step for 'libseqlib'
Making all in htslib
In file included from bgzf.c:43:
htslib/bgzf.h:35:10: fatal error: zlib.h: No such file or directory
35 | #include <zlib.h>
| ^~~~~~~~
compilation terminated.
make[5]: *** [Makefile:121: bgzf.o] Error 1
make[4]: *** [Makefile:358: all-recursive] Error 1
make[3]: *** [Makefile:299: all] Error 2
make[2]: *** [CMakeFiles/libseqlib.dir/build.make:85: libseqlib-prefix/src/libseqlib-stamp/libseqlib-build] Error 2
make[1]: *** [CMakeFiles/Makefile2:235: CMakeFiles/libseqlib.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
I see that pufferfish is trying to build the dependency SeqLib, which itself is trying to build the dependency htslib, which shows in its README that it requires zlib.
But in the conda environment where I'm building pufferfish I already have both the zlib and libzlib packages installed, and I can see zlib.h in the conda environment's include directory.
I deactivated the conda environment and tried to make using my Fedora 36 system libraries, making sure I had the rpms for cmake, zlib-devel, and bzip2-devel. It builds successfully, but the wider problem remains, because pufferfish should be able to build within a conda environment that has these same dependencies.
I updated the issue title and OP a bit to highlight the conda issue.
With my conda environment active I see the following CFLAGS, CPPFLAGS, and CXXFLAGS:
$ echo $CFLAGS
-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include
$ echo $CPPFLAGS
-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include
$ echo $CXXFLAGS
-fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include
These point to the right place: -isystem /home/hermidalc/soft/mambaforge/envs/functional-profiler/include is the directory that holds header files like zlib.h. So I'm really not sure what is going on here, since this should work with a conda environment, especially given that the build succeeds with my Fedora 36 system-wide libraries.
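One way to narrow this down is to ask the compiler's preprocessor directly whether it can resolve zlib.h, once with the conda flags and once without them. The following is a minimal sketch, assuming a bash shell; can_find_header is a hypothetical helper name, not something provided by pufferfish, SeqLib, or conda.

```shell
# Hypothetical helper: succeeds if the compiler's preprocessor can resolve
# <header> when given the extra flags that follow the first two arguments.
can_find_header() {
    local compiler="$1" header="$2"
    shift 2
    # -E stops after preprocessing; "-x c -" reads C source from stdin.
    printf '#include <%s>\n' "$header" | "$compiler" "$@" -E -x c - >/dev/null 2>&1
}

# With the flags exported in the active environment
# ($CPPFLAGS is left unquoted on purpose so that it word-splits).
if can_find_header "${CC:-cc}" zlib.h $CPPFLAGS; then
    echo "zlib.h found with CPPFLAGS"
else
    echo "zlib.h NOT found with CPPFLAGS"
fi

# With no extra flags at all, which is roughly what a build step sees
# if the environment flags never reach the compiler command line.
if can_find_header "${CC:-cc}" zlib.h; then
    echo "zlib.h found without extra flags"
else
    echo "zlib.h NOT found without extra flags"
fi
```

If the first check succeeds and the second fails, that would suggest the header is only visible through the -isystem flag, so any build step that drops those flags would hit the error above.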
If the pufferfish developers want to help and reproduce this, here's how to do it. It's really easy to install and configure mamba (a C++ drop-in replacement for conda, which also includes conda):
- Install Mambaforge for your architecture
- Do initial setup:
$ mamba init
$ mamba config --set auto_activate_base false
$ mamba config --set channel_priority strict
- Close your terminal and open a new one so it sources the updated .bashrc
- Update mamba base:
$ mamba update --all
- Create new environment and activate it:
$ mamba create -n pufferfish boost-cpp bzip2 cmake curl icu gcc gxx libjemalloc libblas=*[build=*mkl] libhwloc libopenblas jsoncpp make tbb wget xz zlib
$ mamba activate pufferfish
- You will see the following full environment:
(pufferfish) $ mamba list
# packages in environment at /home/hermidalc/soft/mambaforge/envs/pufferfish:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
binutils_impl_linux-64 2.36.1 h193b22a_2 conda-forge
boost-cpp 1.80.0 h75c5d50_0 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2022.9.24 ha878542_0 conda-forge
cmake 3.24.2 h5432695_0 conda-forge
curl 7.85.0 h2283fc2_0 conda-forge
expat 2.4.9 h27087fc_0 conda-forge
gcc 12.1.0 h9ea6d83_10 conda-forge
gcc_impl_linux-64 12.1.0 hea43390_16 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
gxx 12.1.0 h9ea6d83_10 conda-forge
gxx_impl_linux-64 12.1.0 hea43390_16 conda-forge
icu 70.1 h27087fc_0 conda-forge
jsoncpp 1.9.5 h4bd325d_1 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_15 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.19.3 h08a2579_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libblas 3.9.0 16_linux64_mkl conda-forge
libcurl 7.85.0 h2283fc2_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libgcc-devel_linux-64 12.1.0 h1ec3361_16 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
libhwloc 2.8.0 h32351e8_1 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libidn2 2.3.3 h166bdaf_0 conda-forge
libjemalloc 5.2.1 h9c3ff4c_6 conda-forge
libnghttp2 1.47.0 hff17c54_1 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libsanitizer 12.1.0 ha89aaad_16 conda-forge
libssh2 1.10.0 hf14f497_3 conda-forge
libstdcxx-devel_linux-64 12.1.0 h1ec3361_16 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuv 1.44.2 h166bdaf_0 conda-forge
libxml2 2.10.2 h7463322_2 conda-forge
libzlib 1.2.12 h166bdaf_4 conda-forge
llvm-openmp 14.0.4 he0ac6c6_0 conda-forge
make 4.3 hd18ef5c_1 conda-forge
mkl 2022.1.0 h84fe81f_915 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
openssl 3.0.5 h166bdaf_2 conda-forge
rhash 1.4.3 h166bdaf_0 conda-forge
sysroot_linux-64 2.12 he073ed8_15 conda-forge
tbb 2021.6.0 h924138e_0 conda-forge
wget 1.20.3 ha35d2d1_1 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.12 h166bdaf_4 conda-forge
zstd 1.5.2 h6239696_4 conda-forge
- With the environment activated in your shell, download and try to build pufferfish to reproduce this error.
You say in the How to Use section of your README that the SeqLib dependencies lzma, bz2, and z (I'm assuming zlib here) are also packaged with pufferfish, so I'm wondering why that doesn't make the zlib.h header file available during the build.
I see the following in your top-level CMakeLists.txt file:
ExternalProject_Add(libseqlib
GIT_REPOSITORY https://github.com/COMBINE-lab/SeqLib.git
GIT_TAG master
UPDATE_COMMAND ""
UPDATE_DISCONNECTED 1
BUILD_IN_SOURCE TRUE
DOWNLOAD_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/seqlib
SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/seqlib
INSTALL_DIR ${CMAKE_CURRENT_SOURCE_DIR}/external/install
CONFIGURE_COMMAND ./configure
BUILD_COMMAND make CXXFLAGS='-std=c++14' ${LIBSTADEN_LDFLAGS} ${LIBSTADEN_CFLAGS}
INSTALL_COMMAND mkdir -p <INSTALL_DIR>/lib && mkdir -p <INSTALL_DIR>/include && cp src/libseqlib.a <INSTALL_DIR>/lib &&
cp htslib/libhts.a <INSTALL_DIR>/lib &&
cp -r SeqLib <INSTALL_DIR>/include &&
cp -r json <INSTALL_DIR>/include &&
cp -r htslib <INSTALL_DIR>/include
)
Specifically:
BUILD_COMMAND make CXXFLAGS='-std=c++14' ${LIBSTADEN_LDFLAGS} ${LIBSTADEN_CFLAGS}
If you look further up in the file, you can see that LIBSTADEN_LDFLAGS and LIBSTADEN_CFLAGS include paths from external/install/lib and external/install/include. But while I see that you search for and, if needed, install lzma (xz) and bz2, I'm not seeing anywhere that you try to download and install zlib. So for that reason the flags passed to the SeqLib build fail to find zlib.h.
I see the culprit here is htslib being built for SeqLib. For some reason htslib won't compile in a conda environment because it says it cannot locate zlib.h, even though the conda zlib include files are in the include path and I can see them in the zlib Makefile after configure. Yet it will compile outside of a conda environment using my Fedora 36 system-wide, dnf-installed libraries.
I figured out the source of the problem, but still can't get it to build properly.
If I manually clone SeqLib, then go into the htslib submodule inside it and build it using their instructions, by running autoreconf -i before running ./configure, then htslib builds correctly. I'm not sure, though, how to get the SeqLib submodule inside pufferfish to do this properly before configure.
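The manual steps described above can be sketched as a small shell function. This assumes git, autoconf (for autoreconf), and make are on PATH inside the activated environment, and that a recursive clone pulls in the htslib submodule as it does in the build log. The function name build_htslib_in_seqlib and its workdir argument are hypothetical; the repository URL is the one from the CMakeLists.txt excerpt.

```shell
# Hypothetical helper: clone SeqLib with its htslib submodule, regenerate
# htslib's configure script, then configure and build htslib on its own.
build_htslib_in_seqlib() {
    local workdir="${1:-.}"   # directory to clone into (illustrative argument)
    (
        # Chain with && so that any failing step stops the sequence;
        # the subshell keeps the caller's working directory unchanged.
        cd "$workdir" &&
        git clone --recursive https://github.com/COMBINE-lab/SeqLib.git &&
        cd SeqLib/htslib &&
        autoreconf -i &&
        ./configure &&
        make
    )
}
```

Calling build_htslib_in_seqlib /some/scratch/dir with the conda environment active would run the sequence there. This only covers the standalone htslib build, not the SeqLib build that pufferfish drives through ExternalProject_Add.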
I tried prefixing the SeqLib ./configure in the pufferfish CMakeLists.txt file, making it cd htslib && autoreconf -i && cd ../ && ./configure, but that still doesn't work, probably because important *FLAGS environment variables are not being passed to it. This is more pain than it's worth, though I believe it's important to have software work in a conda environment, because people do not always have sudo rights to install missing system-wide packages on certain computers like HPC clusters, and it makes the software you write much easier to redistribute.
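On the point about *FLAGS not being passed: one possible mechanism, offered as an assumption rather than a confirmed diagnosis, is GNU make's variable precedence. An assignment inside a Makefile takes precedence over an environment variable of the same name (unless make is run with -e), while an assignment on the make command line takes precedence over both. The sketch below uses a throwaway Makefile, not htslib's real one, and the -I/opt/example/include path is made up.

```shell
# Create a throwaway Makefile that resets CPPFLAGS, as some hand-written
# Makefiles do. Recipe lines must start with a tab, hence printf with \t.
demo_dir="$(mktemp -d)"
printf 'CPPFLAGS =\nshow:\n\t@echo "CPPFLAGS=[$(CPPFLAGS)]"\n' > "$demo_dir/Makefile"

# 1) Value supplied only through the environment: the assignment inside
#    the Makefile wins, so the flag is lost.
CPPFLAGS="-I/opt/example/include" make -s -f "$demo_dir/Makefile" show
# prints: CPPFLAGS=[]

# 2) Value supplied on the make command line: it overrides the Makefile.
make -s -f "$demo_dir/Makefile" show CPPFLAGS="-I/opt/example/include"
# prints: CPPFLAGS=[-I/opt/example/include]
```

If something like this is happening in the htslib step, forwarding the environment's values explicitly on the make command line in BUILD_COMMAND (for example CPPFLAGS="$CPPFLAGS") might be worth trying. This is untested against pufferfish.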