mccortex unitigs: "Fatal Error: Not enough kmers in hash" or "Fatal Error: Hash table is full"
karel-brinda opened this issue · 2 comments
karel-brinda commented
- OS: OS X
- Version:
mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
Preparation:
$ wget http://ftp.ebi.ac.uk/pub/software/bigsi/nat_biotech_2018/ctx/ERR189/ERR189737/cleaned/ERR189737.ctx.bz2
$ bzip2 -d -k ERR189737.ctx.bz2
Failure mode 1:
$ mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:08-DAF][cmd] mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:08-DAF][cwd] /private/tmp/~20200910223252
[10 Sep 2020 23:21:08-DAF][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[10 Sep 2020 23:21:08-DAF][memory] 73 bits per kmer
[10 Sep 2020 23:21:08-DAF][cmd_mem.c:98] Fatal Error: Not enough kmers in hash: require at least 70,540,096 kmers (min memory: 624.5MB)
$ mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:18-fOD][cmd] mccortex31 unitigs ERR189737.ctx
[10 Sep 2020 23:21:18-fOD][cwd] /private/tmp/~20200910223252
[10 Sep 2020 23:21:18-fOD][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[10 Sep 2020 23:21:18-fOD][memory] 73 bits per kmer
[10 Sep 2020 23:21:18-fOD][cmd_mem.c:98] Fatal Error: Not enough kmers in hash: require at least 70,540,096 kmers (min memory: 624.5MB)
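The error above gives both the per-kmer cost (73 bits) and the minimum number of kmers, so the memory needed can be roughly estimated by multiplying the two. Below is a minimal bash sketch of that arithmetic. min_graph_mib is a hypothetical helper, not a mccortex command. It only gives a lower bound: the 624.5MB that mccortex reports is somewhat higher, presumably because the hash table is rounded up to its bucket layout. That is an assumption and has not been checked against the mccortex source.

```shell
#!/usr/bin/env bash
# Rough lower bound on graph memory: kmers * bits_per_kmer / 8 bytes.
# min_graph_mib is a hypothetical helper, not part of mccortex.
min_graph_mib() {
  local kmers=$1 bits_per_kmer=$2
  local bytes=$(( kmers * bits_per_kmer / 8 ))
  echo $(( bytes / 1024 / 1024 ))   # integer MiB, rounded down
}

# Figures taken from the "Not enough kmers in hash" error above.
min_graph_mib 70540096 73   # prints 613
```

Integer division rounds down, so the result is a floor, not mccortex's exact figure.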
Failure mode 2:
$ bzcat -f ERR189737.ctx.bz2 | mccortex31 unitigs -
[11 Sep 2020 12:28:09-fIt][cmd] mccortex31 unitigs -
[11 Sep 2020 12:28:09-fIt][cwd] /private/tmp/~20200910223252
[11 Sep 2020 12:28:09-fIt][version] mccortex=v0.0.3-610-g400c0e3 zlib=1.2.11 htslib=1.8-17-g699ed53 ASSERTS=ON hash=Lookup3 CHECKS=ON k=3..31
[11 Sep 2020 12:28:09-fIt][memory] 73 bits per kmer
[11 Sep 2020 12:28:09-fIt][memory] graph: 496.8MB
[11 Sep 2020 12:28:09-fIt][memory] total: 496.8MB of 40GB RAM
[11 Sep 2020 12:28:09-fIt] Output in FASTA format to STDOUT
[11 Sep 2020 12:28:09-fIt][hasht] Allocating table with 56,623,104 entries, using 436MB
[11 Sep 2020 12:28:09-fIt][hasht] number of buckets: 2,097,152, bucket size: 27
[11 Sep 2020 12:28:09-fIt][graph] kmer-size: 31; colours: 1; capacity: 56,623,104
[11 Sep 2020 12:28:09-fIt][FileFilter] Reading file - [1 src colour]
[11 Sep 2020 12:28:09-fIt][GReader] 18,446,744,073,709,551,615 kmers, 16EB filesize
[11 Sep 2020 12:28:50-fIt][hasht] buckets: 2,097,152 [2^21]; bucket size: 27;
[11 Sep 2020 12:28:50-fIt][hasht] memory: 436MB; filled: 51,626,922 / 56,623,104 (91.18%)
[11 Sep 2020 12:28:50-fIt][hasht] collisions 0: 49009867
[11 Sep 2020 12:28:50-fIt][hasht] collisions 1: 1927184
[11 Sep 2020 12:28:50-fIt][hasht] collisions 2: 462390
[11 Sep 2020 12:28:50-fIt][hasht] collisions 3: 144851
[11 Sep 2020 12:28:50-fIt][hasht] collisions 4: 50724
[11 Sep 2020 12:28:50-fIt][hasht] collisions 5: 19183
[11 Sep 2020 12:28:50-fIt][hasht] collisions 6: 7551
[11 Sep 2020 12:28:50-fIt][hasht] collisions 7: 2960
[11 Sep 2020 12:28:50-fIt][hasht] collisions 8: 1266
[11 Sep 2020 12:28:50-fIt][hasht] collisions 9: 497
[11 Sep 2020 12:28:50-fIt][hasht] collisions 10: 276
[11 Sep 2020 12:28:50-fIt][hasht] collisions 11: 102
[11 Sep 2020 12:28:50-fIt][hasht] collisions 12: 38
[11 Sep 2020 12:28:50-fIt][hasht] collisions 13: 21
[11 Sep 2020 12:28:50-fIt][hasht] collisions 14: 9
[11 Sep 2020 12:28:50-fIt][hasht] collisions 15: 2
[11 Sep 2020 12:28:50-fIt][hasht] collisions 16: 1
[11 Sep 2020 12:28:50-fIt][hash_table.c:247] Fatal Error: Hash table is full
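The numbers in this log can be cross-checked with plain shell arithmetic. The bash sketch below does the following:

- confirms that the table capacity equals buckets × bucket size
- reproduces the 91.18% fill figure
- shows that this capacity is below the 70,540,096 kmers mccortex demanded for the same file in failure mode 1
- shows that the GReader kmer count is 2^64 - 1, which is -1 printed as an unsigned 64-bit integer

That last value looks like an "unknown size" placeholder for a stream read from STDIN rather than a real count. This is an interpretation of the log, not something confirmed from the mccortex source, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Cross-check figures printed in the failure-mode-2 log (64-bit shell arithmetic).
# Variable names are illustrative; they are not mccortex options.

buckets=2097152      # "number of buckets: 2,097,152"
bucket_size=27       # "bucket size: 27"
filled=51626922      # "filled: 51,626,922 / 56,623,104"
required=70540096    # minimum kmers demanded in failure mode 1

capacity=$(( buckets * bucket_size ))
echo "capacity: $capacity"                     # prints: capacity: 56623104

# Fill percentage to two decimals, rounded to nearest, using integers only.
pct_x100=$(( (filled * 10000 + capacity / 2) / capacity ))
printf 'filled: %d.%02d%%\n' $(( pct_x100 / 100 )) $(( pct_x100 % 100 ))   # prints: filled: 91.18%

# The piped run allocated fewer entries than the file-based run required.
if [ "$capacity" -lt "$required" ]; then
  echo "capacity is $(( required - capacity )) kmers short of the requirement"
fi

# The GReader "kmers" value is -1 reinterpreted as an unsigned 64-bit integer.
printf 'unknown-size value: %u\n' -1           # prints 18446744073709551615
```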
karel-brinda commented
It might be related to #89.
karel-brinda commented
Further experiments showed that adding -m 20G helps. I hadn't realized that this parameter is also needed for the unitigs subcommand.
Perhaps the error message "Fatal Error: Hash table is full" could be made more informative, for example by suggesting a larger -m value?
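To make the suggestion concrete, here is a hypothetical sketch of what a more informative message could print. The function name and wording are illustrative only and are not taken from mccortex. The hint about STDIN rests on an assumption drawn from the GReader line (18,446,744,073,709,551,615 kmers, 16EB filesize): that the input size is unknown when the graph is read from a pipe.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a more informative "hash table is full" message.
# Nothing here is taken from mccortex; the name and wording are illustrative.
hash_full_message() {
  local filled=$1 capacity=$2 mem_used=$3
  printf 'Fatal Error: Hash table is full (%d of %d entries used, memory: %s).\n' \
    "$filled" "$capacity" "$mem_used"
  printf 'Increase the memory limit with -m (e.g. -m 20G). When reading from STDIN,\n'
  printf 'the number of kmers cannot be estimated in advance.\n'
}

# Figures from the failure-mode-2 log above.
hash_full_message 51626922 56623104 436MB
```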