muellan/metacache

Compiling with uint64_t

ohickl opened this issue · 5 comments

Hi, I am trying to compile MetaCache for use with a large reference data set.
I am running it like this:

git clone https://github.com/muellan/metacache.git
cd metacache
mamba activate compile

LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH

CPLUS_INCLUDE_PATH=${CONDA_PREFIX}/include:${CPLUS_INCLUDE_PATH}
export CPLUS_INCLUDE_PATH

make MACROS="-DMC_TARGET_ID_TYPE=uint64_t -DMC_WINDOW_ID_TYPE=uint64_t -DMC_KMER_TYPE=uint64_t"

I get the following error:

make release_dummy DIR=build_release ARTIFACT=metacache MACROS="-DMC_TARGET_ID_TYPE=uint64_t -DMC_WINDOW_ID_TYPE=uint64_t -DMC_KMER_TYPE=uint64_t"
make[1]: Entering directory '/mnt/data/local_tools/metacache'
.../miniconda3/envs/compile/bin/x86_64-conda-linux-gnu-c++  -DMC_TARGET_ID_TYPE=uint64_t -DMC_WINDOW_ID_TYPE=uint64_t -DMC_KMER_TYPE=uint64_t -std=c++14 -Wall -Wextra -Wpedantic -I/include -O3 -c src/building.cpp -o build_release/building.o
In file included from src/options.h:31,
                 from src/candidate_structs.h:28,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
src/taxonomy.h: In member function 'void mc::ranked_lineages_of_targets::update(mc::target_id)':
src/taxonomy.h:980:45: error: no matching function for call to 'min(unsigned int, long unsigned int)'
  980 |         const unsigned numThreads = std::min(4U, numNewTargets / (1U << 10));
      |                                     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/char_traits.h:39,
                 from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/string:40,
                 from src/bitmanip.h:29,
                 from src/dna_encoding.h:27,
                 from src/hash_dna.h:27,
                 from src/config.h:34,
                 from src/candidate_structs.h:27,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algobase.h:230:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::min(const _Tp&, const _Tp&)'
  230 |     min(const _Tp& __a, const _Tp& __b)
      |     ^~~
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algobase.h:230:5: note:   template argument deduction/substitution failed:
In file included from src/options.h:31,
                 from src/candidate_structs.h:28,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
src/taxonomy.h:980:45: note:   deduced conflicting types for parameter 'const _Tp' ('unsigned int' and 'long unsigned int')
  980 |         const unsigned numThreads = std::min(4U, numNewTargets / (1U << 10));
      |                                     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/char_traits.h:39,
                 from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/string:40,
                 from src/bitmanip.h:29,
                 from src/dna_encoding.h:27,
                 from src/hash_dna.h:27,
                 from src/config.h:34,
                 from src/candidate_structs.h:27,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algobase.h:278:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::min(const _Tp&, const _Tp&, _Compare)'
  278 |     min(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algobase.h:278:5: note:   template argument deduction/substitution failed:
In file included from src/options.h:31,
                 from src/candidate_structs.h:28,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
src/taxonomy.h:980:45: note:   deduced conflicting types for parameter 'const _Tp' ('unsigned int' and 'long unsigned int')
  980 |         const unsigned numThreads = std::min(4U, numNewTargets / (1U << 10));
      |                                     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/algorithm:62,
                 from src/../dep/hpc_helpers/include/cuda_helpers.cuh:7,
                 from src/dna_encoding.h:30,
                 from src/hash_dna.h:27,
                 from src/config.h:34,
                 from src/candidate_structs.h:27,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algo.h:3449:5: note: candidate: 'template<class _Tp> constexpr _Tp std::min(std::initializer_list<_Tp>)'
 3449 |     min(initializer_list<_Tp> __l)
      |     ^~~
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algo.h:3449:5: note:   template argument deduction/substitution failed:
In file included from src/options.h:31,
                 from src/candidate_structs.h:28,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
src/taxonomy.h:980:45: note:   mismatched types 'std::initializer_list<_Tp>' and 'unsigned int'
  980 |         const unsigned numThreads = std::min(4U, numNewTargets / (1U << 10));
      |                                     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from .../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/algorithm:62,
                 from src/../dep/hpc_helpers/include/cuda_helpers.cuh:7,
                 from src/dna_encoding.h:30,
                 from src/hash_dna.h:27,
                 from src/config.h:34,
                 from src/candidate_structs.h:27,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algo.h:3455:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::min(std::initializer_list<_Tp>, _Compare)'
 3455 |     min(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
.../miniconda3/envs/compile/x86_64-conda-linux-gnu/include/c++/11.3.0/bits/stl_algo.h:3455:5: note:   template argument deduction/substitution failed:
In file included from src/options.h:31,
                 from src/candidate_structs.h:28,
                 from src/candidate_generation.h:27,
                 from src/database.h:27,
                 from src/building.h:27,
                 from src/building.cpp:24:
src/taxonomy.h:980:45: note:   mismatched types 'std::initializer_list<_Tp>' and 'unsigned int'
  980 |         const unsigned numThreads = std::min(4U, numNewTargets / (1U << 10));
      |                                     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[1]: *** [Makefile:215: build_release/building.o] Error 1
make[1]: Leaving directory '/mnt/data/local_tools/metacache'
make: *** [Makefile:137: release] Error 2

Plain make, or e.g. make MACROS="-DMC_TARGET_ID_TYPE=uint32_t -DMC_WINDOW_ID_TYPE=uint32_t", works.
Am I invoking it incorrectly? make MACROS="-DMC_KMER_TYPE=uint64_t" from the example also fails in the same way.

Best

Oskar

Hi,
that was a bug, which was luckily easy to fix.
If you update to the latest release, everything should work fine.
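
For anyone hitting the same error on an older checkout: the message comes from std::min deducing a single type from both arguments, which fails once the target id type is 64 bits wide (unsigned int vs. long unsigned int at src/taxonomy.h:980). Below is a minimal sketch of the problem and one possible way around it. It is not necessarily the change that went into the release; target_id here is a stand-in alias and num_threads_for is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for mc::target_id as built with -DMC_TARGET_ID_TYPE=uint64_t
// (assumption for illustration; the real alias lives in the MetaCache sources).
using target_id = std::uint64_t;

// The failing expression was:
//     std::min(4U, numNewTargets / (1U << 10))
// 4U is unsigned int, while the quotient has type target_id (64 bits),
// so template argument deduction sees two conflicting types.
//
// Naming the template argument explicitly converts both operands to
// target_id before the comparison. num_threads_for is a hypothetical name.
constexpr unsigned num_threads_for(target_id numNewTargets) {
    return static_cast<unsigned>(
        std::min<target_id>(4U, numNewTargets / (1U << 10)));
}

// Compile-time checks: the narrowing cast is safe because the result
// never exceeds 4.
static_assert(num_threads_for(5000) == 4, "5000 / 1024 == 4, capped at 4");
static_assert(num_threads_for(1024) == 1, "1024 / 1024 == 1");
```

Spelling out the template argument keeps the comparison in the wider type, so the same line compiles for both 32-bit and 64-bit id types.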

Thanks, compilation works now!
However, a test database build still reports the window id type as unsigned int 32 bits,
even after compiling with make MACROS="-DMC_TARGET_ID_TYPE=uint64_t -DMC_WINDOW_ID_TYPE=uint64_t -DMC_KMER_TYPE=uint64_t":

Building new database '.../databases/metacache/mc_build_test/mc_build_test' from reference sequences.
Max locations per feature set to 254
Reading taxon names ... done.
Reading taxonomic node mergers ... done.
Reading taxonomic tree ... 2564271 taxa read.
Taxonomy applied to database.
------------------------------------------------
MetaCache version    2.3.1 (20230309)
database version     20200820
------------------------------------------------
sequence type        mc::char_sequence
target id type       unsigned long int 64 bits
target limit         18446744073709551615
------------------------------------------------
window id type       unsigned int 32 bits
window limit         4294967295
window length        127
window stride        108
------------------------------------------------
sketcher type        mc::single_function_unique_min_hasher<unsigned long, mc::same_size_hash<unsigned long> >
feature type         unsigned long int 64 bits
feature hash         mc::same_size_hash<unsigned long>
kmer size            20
kmer limit           32
sketch size          16
------------------------------------------------
bucket size type     unsigned char 8 bits
max. locations       254
location limit       254
------------------------------------------------
Reading sequence to taxon mappings from .../mc_build_test/assembly_summary.txt
Reading sequence to taxon mappings from .../ncbi_taxonomy/assembly_summary_refseq.txt
Reading sequence to taxon mappings from .../ncbi_taxonomy/assembly_summary_refseq_historical.txt
Reading sequence to taxon mappings from .../ncbi_taxonomy/assembly_summary_genbank.txt
Reading sequence to taxon mappings from .../ncbi_taxonomy/assembly_summary_genbank_historical.txt
Processing reference sequences.
Added 29652 reference sequences in 415.539 s
targets              29652
ranked targets       29652
taxa in tree         2564271
------------------------------------------------
buckets              964032481
bucket size          max: 254 mean: 1.31966 +/- 3.95271 <> 42.6348
features             705592365
dead features        0
locations            931144669
------------------------------------------------
All targets are ranked.
Writing database to file ... Writing database metadata to file '.../databases/metacache/mc_build_test/mc_build_test.meta' ... done.
Writing database part to file '.../databases/metacache/mc_build_test/mc_build_test.cache0' ... done.
done.
Total build time: 625.363 s

Build command:

${p2mc}/metacache build ${p2d}/metacache/mc_build_test/mc_build_test \
                        ${p2d}/mc_build_test \
                        -taxonomy ${p2taxdb} \
                        -kmerlen 20

Yeah, there was another bug, probably introduced with the GPU version. It is also fixed now.
I didn't make another release for this; just git pull the latest changes from the repository.
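
As a side note for readers, here is a rough sketch of how a -D macro such as MC_WINDOW_ID_TYPE typically selects a type at compile time, and how the resulting width can be checked. This is only an illustration and says nothing about what the actual bug or its fix looked like. The 32-bit fallback is an assumption based on the log above, and window_id_bits and window_limit are hypothetical helper names.

```cpp
#include <cstdint>
#include <limits>

// Fall back to a 32-bit type when the macro is not passed on the command
// line (assumed default, inferred from the "window id type" line in the log).
#ifndef MC_WINDOW_ID_TYPE
#define MC_WINDOW_ID_TYPE std::uint32_t
#endif

using window_id = MC_WINDOW_ID_TYPE;

// Number of value bits in the selected type (32 or 64 for the usual choices).
constexpr int window_id_bits() {
    return std::numeric_limits<window_id>::digits;
}

// Largest representable window id, comparable to the "window limit" log line.
constexpr window_id window_limit() {
    return std::numeric_limits<window_id>::max();
}
```

In a build that is meant to be 64-bit, a line such as static_assert(window_id_bits() == 64, "...") would make the compiler reject a binary in which the macro did not take effect, instead of the mismatch only showing up in the database summary.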

Thanks, works!
I do have a few questions regarding database partitioning and parameters for very large reference data sets. Should I open a new issue for that?

Yes, a new issue would be better.