terminate called recursively -- Aborted (core dumped)
0xaf1f opened this issue · 8 comments
I'm trying to infer a tree using concatenated SNPs and consistently get this core dump when multithreading. I do not have this problem with the original raxmlHPC
, but it does not complete within the 2 day maximum walltime of the cluster and doesn't do checkpointing either (as far as i know), so I can't restart it from where it left off.
So I'm trying raxml-ng (latest version 1.1.0):
RAxML-NG v. 1.1.0 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
System: Intel(R) Xeon(R) Platinum 8260M CPU @ 2.40GHz, 96 cores, 4031 GB RAM
RAxML-NG was called at 09-Dec-2022 20:16:42 as follows:
raxml-ng --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{4087123} --prefix SNP --threads 24 --bs-trees 100
Analysis options:
run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
start tree(s): random (10) + parsimony (10)
bootstrap replicates: 100
random seed: 1670635002
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX2
parallelization: coarse-grained (auto), PTHREADS (24 threads), thread pinning: OFF
WARNING: The model you specified on the command line (GTR+G+ASC_FELS{4087123}) will be ignored
since the binary MSA file already contains a model definition.
If you want to change the model, please re-run RAxML-NG
with the original PHYLIP/FASTA alignment and --redo option.
[00:00:00] Loading binary alignment from file: SNP.parse.raxml.rba
[00:00:00] Alignment comprises 4832 taxa, 1 partitions and 82354 patterns
Partition 0: noname
Model: GTR+FO+G4m+ASC_FELS{4087123}
Alignment sites / patterns: 324409 / 82354
Gaps: 0.00 %
Invariant sites: 0.00 %
Parallelization scheme autoconfig: 3 worker(s) x 8 thread(s)
NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.
Parallel reduction/worker buffer size: 1 KB / 0 KB
[00:00:00] Generating 10 random starting tree(s) with 4832 taxa
[00:00:01] Generating 10 parsimony starting tree(s) with 4832 taxa
[02:02:53] Data distribution: max. partitions/sites/weight per thread: 1 / 10295 / 164720
[02:02:55] Data distribution: max. searches per worker: 41
Starting ML tree search with 20 distinct starting trees
terminate called recursively
terminate called recursively
/var/spool/slurm/d/job13600690/slurm_script: line 19: 75715 Aborted (core dumped) raxml-ng --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{$invar_sites} --prefix SNP --threads 24 --bs-trees 100
I have also tried with --threads 1 --workers 1
and it timed out at the 2 day maximum, but the log doesn't indicate that it got anywhere past
Starting ML tree search with 20 distinct starting trees
so even if it worked with 1 thread, it looks like it'd be painstakingly slow.
Could you please try to re-run with --workers 1 --log debug
and also check the memory usage?
Ok, it ran for a while, then there was an unrelated node failure (a few of my other unrelated jobs on other nodes also failed at the same time, so I think there was a cluster issue at the time). I restarted the run and about a day later raxml-ng fails again with the same error. Here is the second log:
RAxML-NG v. 1.1.0 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
System: AMD EPYC 7742 64-Core Processor, 128 cores, 251 GB RAM
RAxML-NG was called at 16-Dec-2022 13:42:14 as follows:
raxml-ng --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{4087123} --prefix SNP --threads 24 --workers 1 --bs-trees 100 --log debug
Analysis options:
run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
start tree(s): random (10) + parsimony (10)
bootstrap replicates: 100
random seed: 1671216134
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX2
parallelization: PTHREADS (24 threads), thread pinning: OFF
RBA partial loading: OFF
WARNING: The model you specified on the command line (GTR+G+ASC_FELS{4087123}) will be ignored
since the binary MSA file already contains a model definition.
If you want to change the model, please re-run RAxML-NG
with the original PHYLIP/FASTA alignment and --redo option.
[00:00:00] Loading binary alignment from file: SNP.parse.raxml.rba
[00:00:02] Alignment comprises 4832 taxa, 1 partitions and 82354 patterns
Partition 0: noname
Model: GTR+FO+G4m+ASC_FELS{4087123}
Alignment sites / patterns: 324409 / 82354
Gaps: 0.00 %
Invariant sites: 0.00 %
|noname| |GTR+FO+G4m+ASC_FELS{4087123}| |1-324409|
NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.
Generating a random starting tree with 4832 taxa
Parallel reduction/worker buffer size: 1 KB / 0 KB
[00:00:02] NOTE: Resuming execution from checkpoint (logLH: -10351404.46, ML trees: 0, bootstraps: 0)
Initial model parameters:
Partition: noname
Rate heterogeneity: GAMMA (4 cats, mean), alpha: 1.000000 (ML), weights&rates: (0.250000,0.136954) (0.250000,0.476752) (0.250000,1.000000) (0.250000,2.386294)
Base frequencies (ML): 0.250000 0.250000 0.250000 0.250000
Substitution rates (ML): 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
[00:00:02] Data distribution: max. partitions/sites/weight per thread: 1 / 3432 / 54912
thread# part# start length weight
0 0 78936 3418 54688
1 0 75504 3432 54912
2 0 72072 3432 54912
3 0 68640 3432 54912
4 0 65208 3432 54912
5 0 61776 3432 54912
6 0 58344 3432 54912
7 0 54912 3432 54912
8 0 51480 3432 54912
9 0 48048 3432 54912
10 0 44616 3432 54912
11 0 41184 3432 54912
12 0 37752 3432 54912
13 0 34320 3432 54912
14 0 30888 3432 54912
15 0 27456 3432 54912
16 0 24024 3432 54912
17 0 20592 3432 54912
18 0 17160 3432 54912
19 0 13728 3432 54912
20 0 10296 3432 54912
21 0 6864 3432 54912
22 0 3432 3432 54912
23 0 0 3432 54912
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
[00:00:03] Data distribution: max. searches per worker: 120
BARRIER time: 0.003971 ms
Starting ML tree search with 20 distinct starting trees
[00:00:06 -10351404.457014] SPR radius for FAST iterations: 25 (autodetect)
[00:00:06 -10351404.457014] FAST spr round 6 (radius: 25)
- after brlen: logLH = -10351387.624913
[00:20:24 -10351387.624913] FAST spr round 7 (radius: 25)
- after brlen: logLH = -10351373.124242
[00:38:26 -10351373.124242] FAST spr round 8 (radius: 25)
- after brlen: logLH = -10351362.911341
[00:55:22 -10351362.911341] FAST spr round 9 (radius: 25)
- after brlen: logLH = -10351356.834441
[01:11:42 -10351356.834441] FAST spr round 10 (radius: 25)
- after brlen: logLH = -10351351.867060
[01:27:42 -10351351.867060] FAST spr round 11 (radius: 25)
- after brlen: logLH = -10351347.864267
[01:43:43 -10351347.864267] FAST spr round 12 (radius: 25)
- after brlen: logLH = -10351344.716538
[01:59:22 -10351344.716538] FAST spr round 13 (radius: 25)
- after brlen: logLH = -10351341.645818
[02:14:59 -10351341.645818] FAST spr round 14 (radius: 25)
- after brlen: logLH = -10351338.892258
[02:30:15 -10351338.892258] FAST spr round 15 (radius: 25)
- after brlen: logLH = -10351335.542908
[02:45:41 -10351335.542908] FAST spr round 16 (radius: 25)
- after brlen: logLH = -10351333.301802
[03:01:20 -10351333.301802] FAST spr round 17 (radius: 25)
- after brlen: logLH = -10351331.151827
[03:16:44 -10351331.151827] FAST spr round 18 (radius: 25)
- after brlen: logLH = -10351329.341439
[03:32:10 -10351329.341439] FAST spr round 19 (radius: 25)
- after brlen: logLH = -10351327.535476
[03:47:30 -10351327.535476] FAST spr round 20 (radius: 25)
- after brlen: logLH = -10351322.997378
[04:02:53 -10351322.997378] FAST spr round 21 (radius: 25)
- after brlen: logLH = -10351313.092002
[04:18:16 -10351313.092002] FAST spr round 22 (radius: 25)
- after brlen: logLH = -10351311.292615
[04:33:31 -10351311.292615] FAST spr round 23 (radius: 25)
- after brlen: logLH = -10351309.493888
[04:48:49 -10351309.493888] FAST spr round 24 (radius: 25)
- after brlen: logLH = -10351307.726005
[05:04:22 -10351307.726005] FAST spr round 25 (radius: 25)
- after brlen: logLH = -10351306.007903
[05:19:51 -10351306.007903] FAST spr round 26 (radius: 25)
- after brlen: logLH = -10351304.635382
[05:35:13 -10351304.635382] FAST spr round 27 (radius: 25)
- after brlen: logLH = -10351299.854038
[05:50:37 -10351299.854038] FAST spr round 28 (radius: 25)
- after brlen: logLH = -10351298.535978
[06:05:53 -10351298.535978] FAST spr round 29 (radius: 25)
- after brlen: logLH = -10351297.304712
[06:21:07 -10351297.304712] FAST spr round 30 (radius: 25)
- after brlen: logLH = -10351296.251681
[06:36:27 -10351296.251681] FAST spr round 31 (radius: 25)
- after brlen: logLH = -10351292.036071
[06:51:44 -10351292.036071] FAST spr round 32 (radius: 25)
- after brlen: logLH = -10351291.153234
[07:07:18 -10351291.153234] FAST spr round 33 (radius: 25)
- after brlen: logLH = -10351290.077628
[07:22:43 -10351290.077628] FAST spr round 34 (radius: 25)
- after brlen: logLH = -10351289.614201
[07:37:57 -10351289.614201] FAST spr round 35 (radius: 25)
- after brlen: logLH = -10351279.200180
[07:53:15 -10351279.200180] FAST spr round 36 (radius: 25)
- after brlen: logLH = -10351276.008304
[08:08:31 -10351276.008304] FAST spr round 37 (radius: 25)
- after brlen: logLH = -10351275.696520
[08:24:02 -10351275.696520] FAST spr round 38 (radius: 25)
- after brlen: logLH = -10351275.476474
[08:39:26 -10351275.476474] FAST spr round 39 (radius: 25)
- after brlen: logLH = -10351260.626399
[08:55:03 -10351260.626399] FAST spr round 40 (radius: 25)
- after brlen: logLH = -10351258.278455
[09:10:30 -10351258.278455] FAST spr round 41 (radius: 25)
- after brlen: logLH = -10351257.413502
[09:25:46 -10351257.413502] FAST spr round 42 (radius: 25)
- after brlen: logLH = -10351257.122438
[09:41:02 -10351257.122438] FAST spr round 43 (radius: 25)
- after brlen: logLH = -10351257.017756
[09:56:22 -10351257.017756] FAST spr round 44 (radius: 25)
- after brlen: logLH = -10351257.017756
[10:11:41 -10351257.017756] Model parameter optimization (eps = 1.000000)
- after rates: logLH = -10349776.244735
- after freqs: logLH = -10349682.139791
- after alpha: logLH = -10349574.462679
- after p-inv: logLH = -10349574.462679
- after freeR: logLH = -10349574.462679
- after brlen: logLH = -10349534.314637
Iteration 1: logLH = -10349534.314637
- after rates: logLH = -10349380.046801
- after freqs: logLH = -10349299.910029
- after alpha: logLH = -10349299.639095
- after p-inv: logLH = -10349299.639095
- after freeR: logLH = -10349299.639095
- after brlen: logLH = -10349233.591349
Iteration 2: logLH = -10349233.591349
- after rates: logLH = -10349220.929805
- after freqs: logLH = -10349197.990022
- after alpha: logLH = -10349188.612289
- after p-inv: logLH = -10349188.612289
- after freeR: logLH = -10349188.612289
- after brlen: logLH = -10349157.719741
Iteration 3: logLH = -10349157.719741
- after rates: logLH = -10349154.688967
- after freqs: logLH = -10349152.227538
- after alpha: logLH = -10349148.468388
- after p-inv: logLH = -10349148.468388
- after freeR: logLH = -10349148.468388
- after brlen: logLH = -10349140.074094
Iteration 4: logLH = -10349140.074094
- after rates: logLH = -10349079.391843
- after freqs: logLH = -10349062.172494
- after alpha: logLH = -10349062.172494
- after p-inv: logLH = -10349062.172494
- after freeR: logLH = -10349062.172494
- after brlen: logLH = -10349055.367669
Iteration 5: logLH = -10349055.367669
- after rates: logLH = -10349051.747483
- after freqs: logLH = -10349049.803986
- after alpha: logLH = -10349049.803986
- after p-inv: logLH = -10349049.803986
- after freeR: logLH = -10349049.803986
- after brlen: logLH = -10349047.915509
Iteration 6: logLH = -10349047.915509
- after rates: logLH = -10349001.372825
- after freqs: logLH = -10348996.709057
- after alpha: logLH = -10348995.707157
- after p-inv: logLH = -10348995.707157
- after freeR: logLH = -10348995.707157
- after brlen: logLH = -10348993.074776
Iteration 7: logLH = -10348993.074776
- after rates: logLH = -10348989.464290
- after freqs: logLH = -10348988.581751
- after alpha: logLH = -10348987.916689
- after p-inv: logLH = -10348987.916689
- after freeR: logLH = -10348987.916689
- after brlen: logLH = -10348986.053077
Iteration 8: logLH = -10348986.053077
- after rates: logLH = -10348985.145344
- after freqs: logLH = -10348984.964685
- after alpha: logLH = -10348984.952975
- after p-inv: logLH = -10348984.952975
- after freeR: logLH = -10348984.952975
- after brlen: logLH = -10348983.522732
Iteration 9: logLH = -10348983.522732
- after rates: logLH = -10348983.200599
- after freqs: logLH = -10348983.157370
- after alpha: logLH = -10348983.157370
- after p-inv: logLH = -10348983.157370
- after freeR: logLH = -10348983.157370
- after brlen: logLH = -10348983.006569
Iteration 10: logLH = -10348983.006569
[10:14:26 -10348983.006569] SLOW spr round 1 (radius: 5)
- after brlen: logLH = -10347710.411143
[10:39:31 -10347710.411143] SLOW spr round 2 (radius: 5)
- after brlen: logLH = -10347562.616680
[11:04:07 -10347562.616680] SLOW spr round 3 (radius: 5)
- after brlen: logLH = -10347438.191367
[11:28:24 -10347438.191367] SLOW spr round 4 (radius: 5)
- after brlen: logLH = -10347363.209677
[11:52:23 -10347363.209677] SLOW spr round 5 (radius: 5)
- after brlen: logLH = -10347352.031868
[12:15:43 -10347352.031868] SLOW spr round 6 (radius: 5)
- after brlen: logLH = -10347350.366705
[12:38:54 -10347350.366705] SLOW spr round 7 (radius: 5)
- after brlen: logLH = -10347345.307373
[13:02:06 -10347345.307373] SLOW spr round 8 (radius: 5)
- after brlen: logLH = -10347343.897947
[13:24:47 -10347343.897947] SLOW spr round 9 (radius: 5)
- after brlen: logLH = -10347342.808666
[13:47:44 -10347342.808666] SLOW spr round 10 (radius: 5)
- after brlen: logLH = -10347338.980500
[14:11:03 -10347338.980500] SLOW spr round 11 (radius: 5)
- after brlen: logLH = -10347338.980500
[14:34:15 -10347338.980500] SLOW spr round 12 (radius: 10)
- after brlen: logLH = -10347050.078766
[14:59:52 -10347050.078766] SLOW spr round 13 (radius: 5)
- after brlen: logLH = -10346990.991364
[15:29:43 -10346990.991364] SLOW spr round 14 (radius: 5)
- after brlen: logLH = -10346904.424879
[15:56:10 -10346904.424879] SLOW spr round 15 (radius: 5)
- after brlen: logLH = -10346883.152208
[16:20:15 -10346883.152208] SLOW spr round 16 (radius: 5)
- after brlen: logLH = -10346875.243318
[16:43:17 -10346875.243318] SLOW spr round 17 (radius: 5)
- after brlen: logLH = -10346874.840574
[17:06:05 -10346874.840574] SLOW spr round 18 (radius: 5)
- after brlen: logLH = -10346872.440384
[17:29:03 -10346872.440384] SLOW spr round 19 (radius: 5)
- after brlen: logLH = -10346872.440384
[17:51:37 -10346872.440384] SLOW spr round 20 (radius: 10)
- after brlen: logLH = -10346858.054793
[18:15:44 -10346858.054793] SLOW spr round 21 (radius: 5)
- after brlen: logLH = -10346804.247108
[18:44:54 -10346804.247108] SLOW spr round 22 (radius: 5)
- after brlen: logLH = -10346804.247108
[19:10:39 -10346804.247108] SLOW spr round 23 (radius: 10)
- after brlen: logLH = -10346761.657173
[19:36:35 -10346761.657173] SLOW spr round 24 (radius: 5)
- after brlen: logLH = -10346701.226583
[20:05:56 -10346701.226583] SLOW spr round 25 (radius: 5)
- after brlen: logLH = -10346700.950547
[20:31:23 -10346700.950547] SLOW spr round 26 (radius: 5)
- after brlen: logLH = -10346700.950547
[20:54:48 -10346700.950547] SLOW spr round 27 (radius: 10)
- after brlen: logLH = -10346663.314801
[21:20:00 -10346663.314801] SLOW spr round 28 (radius: 5)
- after brlen: logLH = -10346617.223015
[21:49:52 -10346617.223015] SLOW spr round 29 (radius: 5)
- after brlen: logLH = -10346617.223015
[22:15:01 -10346617.223015] SLOW spr round 30 (radius: 10)
- after brlen: logLH = -10346614.904506
[22:40:49 -10346614.904506] SLOW spr round 31 (radius: 5)
- after brlen: logLH = -10346524.197975
[23:10:32 -10346524.197975] SLOW spr round 32 (radius: 5)
- after brlen: logLH = -10346522.361354
[23:35:44 -10346522.361354] SLOW spr round 33 (radius: 5)
- after brlen: logLH = -10346522.361354
[23:59:23 -10346522.361354] SLOW spr round 34 (radius: 10)
- after brlen: logLH = -10346522.361354
[24:24:20 -10346522.361354] SLOW spr round 35 (radius: 15)
- after brlen: logLH = -10346440.916789
[25:10:31 -10346440.916789] SLOW spr round 36 (radius: 5)
- after brlen: logLH = -10346417.940650
[25:41:11 -10346417.940650] SLOW spr round 37 (radius: 5)
- after brlen: logLH = -10346417.940650
[26:07:39 -10346417.940650] SLOW spr round 38 (radius: 10)
- after brlen: logLH = -10346417.940650
[26:34:23 -10346417.940650] SLOW spr round 39 (radius: 15)
- after brlen: logLH = -10346383.511388
[27:18:57 -10346383.511388] SLOW spr round 40 (radius: 5)
- after brlen: logLH = -10346363.066663
[27:49:52 -10346363.066663] SLOW spr round 41 (radius: 5)
- after brlen: logLH = -10346363.066663
[28:16:19 -10346363.066663] SLOW spr round 42 (radius: 10)
- after brlen: logLH = -10346363.066663
[28:43:06 -10346363.066663] SLOW spr round 43 (radius: 15)
- after brlen: logLH = -10346353.271026
[29:27:54 -10346353.271026] SLOW spr round 44 (radius: 5)
- after brlen: logLH = -10346350.970242
[29:58:42 -10346350.970242] SLOW spr round 45 (radius: 5)
- after brlen: logLH = -10346324.073651
[30:25:39 -10346324.073651] SLOW spr round 46 (radius: 5)
- after brlen: logLH = -10346300.343150
[30:50:35 -10346300.343150] SLOW spr round 47 (radius: 5)
- after brlen: logLH = -10346299.867960
[31:13:45 -10346299.867960] SLOW spr round 48 (radius: 5)
- after brlen: logLH = -10346299.867960
[31:36:29 -10346299.867960] SLOW spr round 49 (radius: 10)
- after brlen: logLH = -10346299.867960
[32:00:48 -10346299.867960] SLOW spr round 50 (radius: 15)
- after brlen: logLH = -10346299.867960
[32:48:09 -10346299.867960] SLOW spr round 51 (radius: 20)
- after brlen: logLH = -10346299.867960
[34:04:05 -10346299.867960] SLOW spr round 52 (radius: 25)
- after brlen: logLH = -10346299.867960
[35:42:37 -10346299.867960] Model parameter optimization (eps = 0.100000)
- after rates: logLH = -10346299.698036
- after freqs: logLH = -10346299.396250
- after alpha: logLH = -10346299.396250
- after p-inv: logLH = -10346299.396250
- after freeR: logLH = -10346299.396250
- after brlen: logLH = -10346299.331205
Iteration 1: logLH = -10346299.331205
- after rates: logLH = -10346299.304296
- after freqs: logLH = -10346299.290322
- after alpha: logLH = -10346299.290322
- after p-inv: logLH = -10346299.290322
- after freeR: logLH = -10346299.290322
- after brlen: logLH = -10346299.268211
Iteration 2: logLH = -10346299.268211
[35:42:51] ML tree search #1, logLikelihood: -10346299.268211
[35:42:52 -358718771.056143] Initial branch length optimization
- after brlen: logLH = -0.000000
terminate called recursively
terminate called recursively
/var/spool/slurm/d/job13733182/slurm_script: line 29: 63733 Aborted (core dumped) raxml-ng --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{$invar_sites} --prefix SNP --threads 24 --workers 1 --bs-trees 100 --log debug
I didn't set up a way of tracking the memory usage throughout the job, but at least in the beginning, it was using around a quarter of the available memory (~200G) on the system.
(The very first run I did had access to 1 TB of memory, but I ran out my cpu hour allocation on that system)
Hi, I know this overlapped with the holiday season, but I wanted to make sure it isn't forgotten. My allocation on the cluster expired and I'll see if we can get a new one, but I did manage to get that debug output for you before that happened.
many thanks for all your help
Thank you, this debug output was very helpful.
Here are a couple of things you can try:
- Add
--redo
to ignore checkpoint and start a fresh run, - Add
--lh-epsilon 10
to accelerate the tree search (convergence is very slow here, which is not uncommon on datasets with thousands of taxa) - Add
--blopt nr_safe
option (this enables additional checks which makes inference slower but more robust)
As a side note. according to the log file, your system has 128 cores and 250GB of memory. This means that: a) 200GB would be pretty close to the limit, b) you can probably use more than 24 threads (if despite the above, it turns out memory is not the constraint).
sorry for the delay, but I've gotten renewed access to the cluster and just run it with these three settings. Now I've gotten a new error right before the core dump:
RAxML-NG v. 1.1.0 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml
System: AMD EPYC 7742 64-Core Processor, 128 cores, 251 GB RAM
RAxML-NG was called at 01-Feb-2023 13:05:50 as follows:
raxml-ng --redo --lh-epsilon 10 --blopt nr_safe --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{4087123} --prefix SNP --threads 24 --workers 1 --bs-trees 100 --log debug
Analysis options:
run mode: ML tree search + bootstrapping (Felsenstein Bootstrap)
start tree(s): random (10) + parsimony (10)
bootstrap replicates: 100
random seed: 1675274750
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
branch lengths: proportional (ML estimate, algorithm: NR-SAFE)
SIMD kernels: AVX2
parallelization: PTHREADS (24 threads), thread pinning: OFF
RBA partial loading: OFF
WARNING: Running in REDO mode: existing checkpoints are ignored, and all result files will be overwritten!
WARNING: The model you specified on the command line (GTR+G+ASC_FELS{4087123}) will be ignored
since the binary MSA file already contains a model definition.
If you want to change the model, please re-run RAxML-NG
with the original PHYLIP/FASTA alignment and --redo option.
[00:00:00] Loading binary alignment from file: SNP.parse.raxml.rba
[00:00:04] Alignment comprises 4832 taxa, 1 partitions and 82354 patterns
Partition 0: noname
Model: GTR+FO+G4m+ASC_FELS{4087123}
Alignment sites / patterns: 324409 / 82354
Gaps: 0.00 %
Invariant sites: 0.00 %
|noname| |GTR+FO+G4m+ASC_FELS{4087123}| |1-324409|
NOTE: Per-rate scalers were automatically enabled to prevent numerical issues on taxa-rich alignments.
NOTE: You can use --force switch to skip this check and fall back to per-site scalers.
Generating a random starting tree with 4832 taxa
Parallel reduction/worker buffer size: 1 KB / 0 KB
[00:00:04] Generating 10 random starting tree(s) with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
[00:00:04] Generating 10 parsimony starting tree(s) with 4832 taxa
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 1558950843
Parsimony score of the starting tree: 940193
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 848384151
Parsimony score of the starting tree: 939019
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 1052997793
Parsimony score of the starting tree: 939156
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 400527839
Parsimony score of the starting tree: 938570
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 1112046720
Parsimony score of the starting tree: 938814
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 222282738
Parsimony score of the starting tree: 939227
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 297541634
Parsimony score of the starting tree: 940208
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 1044300830
Parsimony score of the starting tree: 938393
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 1611091263
Parsimony score of the starting tree: 939272
Generating a parsimony starting tree with 4832 taxa
Parsimony seed: 570470538
Parsimony score of the starting tree: 940064
Initial model parameters:
Partition: noname
Rate heterogeneity: GAMMA (4 cats, mean), alpha: 1.000000 (ML), weights&rates: (0.250000,0.136954) (0.250000,0.476752) (0.250000,1.000000) (0.250000,2.386294)
Base frequencies (ML): 0.250000 0.250000 0.250000 0.250000
Substitution rates (ML): 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
[00:47:05] Data distribution: max. partitions/sites/weight per thread: 1 / 3432 / 54912
thread# part# start length weight
0 0 78936 3418 54688
1 0 75504 3432 54912
2 0 72072 3432 54912
3 0 68640 3432 54912
4 0 65208 3432 54912
5 0 61776 3432 54912
6 0 58344 3432 54912
7 0 54912 3432 54912
8 0 51480 3432 54912
9 0 48048 3432 54912
10 0 44616 3432 54912
11 0 41184 3432 54912
12 0 37752 3432 54912
13 0 34320 3432 54912
14 0 30888 3432 54912
15 0 27456 3432 54912
16 0 24024 3432 54912
17 0 20592 3432 54912
18 0 17160 3432 54912
19 0 13728 3432 54912
20 0 10296 3432 54912
21 0 6864 3432 54912
22 0 3432 3432 54912
23 0 0 3432 54912
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
Generating a random starting tree with 4832 taxa
[00:47:05] Data distribution: max. searches per worker: 120
BARRIER time: 0.007015 ms
Starting ML tree search with 20 distinct starting trees
[00:47:09 -614172163.674568] Initial branch length optimization
- after brlen: logLH = -0.000000
ERROR: ERROR in branch length optimization (LIBPLL-2210): Wrong likelihood derivatives
terminate called recursively
terminate called recursively
/var/spool/slurm/d/job14250675/slurm_script: line 32: 93657 Aborted (core dumped) raxml-ng --redo --lh-epsilon 10 --blopt nr_safe --all --msa SNP.parse.raxml.rba --model GTR+G+ASC_FELS{$invar_sites} --prefix SNP --threads 24 --workers 1 --bs-trees 100 --log debug
Sorry for the delay! Since I can't see any obvious reasons for this error, could you please send me you input file?
(Even better if you could reproduce this error on a smaller alignment)
Hi, I also see the same problem, Did you solve this problem?
Please send me your input files and full raxml log, and I will have a look.