[BUG] MPI errors when running NEB with multiple cores per replica

Question

jcappola opened this issue 4 months ago · 5 comments

Summary

Running NEB with multiple cores per replica, for example with mpirun -np 8 lmp -partition 4x2 -in in.neb.sivac, leads to a few different MPI-related crashes.
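
For readers less familiar with the -partition switch: 4x2 requests 4 partitions (one per replica) of 2 MPI processes each, so 8 ranks in total. The arithmetic can be sketched as below. This assumes that ranks are assigned to partitions in contiguous blocks, and partition_of is a hypothetical helper written for illustration, not a LAMMPS function.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of LAMMPS): maps a global MPI rank to
// {partition index, rank within that partition} for an MxN layout.
// Assumption: ranks are handed out to partitions in contiguous blocks.
std::pair<int, int> partition_of(int world_rank, int nparts, int procs_per_part)
{
  assert(world_rank >= 0 && world_rank < nparts * procs_per_part);
  return {world_rank / procs_per_part, world_rank % procs_per_part};
}
```

Under that assumption, partition_of(4, 4, 2) gives partition 2 and local rank 0, i.e. global rank 4 would be the first process of the third replica.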

LAMMPS Version and Platform

LAMMPS (2 Aug 2023) - Update 2, running on Ubuntu 20.04.6 LTS, compiled with Open-MPI 4.0.3 and including the MANYBODY and REPLICA packages

Expected Behavior

The free-end NEB end options should (ideally) all support running with multiple cores per replica. At the moment, we are limited to a single core per replica for end last/efirst and end last/efirst/middle.

Actual Behavior

When running NEB with multiple cores per replica, e.g.,

mpirun -np 8 lmp -partition 4x2 -in in.neb.sivac

LAMMPS will return an error if no verbosity argument is provided to the neb command:

 *** An error occurred in MPI_Allgather
 *** reported by process [2652307457,4]
 *** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
 *** MPI_ERR_TRUNCATE: message truncated
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)

This issue is immediately fixed by adding any verbosity argument to the neb command.

Separately, using free-end NEB (specifically end last/efirst and end last/efirst/middle) with multiple cores per replica returns a different MPI error:

 *** An error occurred in MPI_Bcast
 *** reported by process [2639265793,2]
 *** on communicator MPI_COMM_WORLD
 *** MPI_ERR_COMM: invalid communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)

which completely prevents the use of multiple cores per replica for more complicated free-end NEB calculations. Note that this issue does not occur for end first or end last. This has been a consistent issue across multiple platforms and installations.

Steps to Reproduce

After a clean install of LAMMPS (2 Aug 2023) - Update 2 including the MANYBODY and REPLICA packages, you can run the examples/neb case in.neb.sivac with the partition command shown above to replicate these issues.

Further Information, Files, and Links

The first error seems to be fixed by just adding the verbosity keyword to the neb command, although it is not clear to me why. With the default print_mode, a NEB script should encounter only a single MPI_Allgather, which is on line 642 of neb.cpp.

The second error seems to be fixed by changing the communicator in line 302 of fix_neb.cpp from:

 if (me == 0) MPI_Bcast(&vIni, 1, MPI_DOUBLE, 0, rootworld);

to:

 if (me == 0) MPI_Bcast(&vIni, 1, MPI_DOUBLE, 0, uworld);

although this is my first time poking around the LAMMPS source code so I don't know if this is the correct/desired fix.

Answer 1 · 2024-01-20T14:46:11.000Z

There is no "end" keyword in the "neb" command: https://docs.lammps.org/neb.html

So please provide the exact input files that cause the issue and the corresponding log or screen output files for replica 0.
I have no problems with the provided example using either OpenMPI 4.1.5 or MPICH 4.1.2 with a 4x2 partition.
I've tested the last stable and the current development version.

Answer 2 · 2024-01-20T17:30:54.000Z

Hi @akohlmey,

I am aware that the "end" keyword is under the "fix neb" command, sorry for any confusion.

On a clean install of LAMMPS (2 Aug 2023) - Update 2, I run the "examples/neb/in.neb.sivac" case with:

mpirun -np 8 lmp -partition 4x2 -in in.neb.sivac

where the NEB region of that original example script looks like this:

fix             1 all neb 1.0

thermo          100

# run NEB for 2000 steps or to force tolerance

timestep        0.01
min_style       quickmin

neb             0.0 0.01 100 100 10 final final.sivac

and I get the following error printed to the screen:

[jcappola-Precision-Tower-7910:07055] *** An error occurred in MPI_Allgather
[jcappola-Precision-Tower-7910:07055] *** reported by process [4100849665,4]
[jcappola-Precision-Tower-7910:07055] *** on communicator MPI COMMUNICATOR 4 SPLIT FROM 0
[jcappola-Precision-Tower-7910:07055] *** MPI_ERR_TRUNCATE: message truncated
[jcappola-Precision-Tower-7910:07055] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[jcappola-Precision-Tower-7910:07055] ***    and potentially your MPI job)

All of the screen.* files are blank and the log.lammps.* file for partition 0 looks like this:

 LAMMPS (2 Aug 2023)
Processor partition = 0
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
# NEB simulation of vacancy hopping in silicon crystal

units           metal

atom_style      atomic
atom_modify     map array
boundary        p p p
atom_modify     sort 0 0.0

# coordination number cutoff

variable r equal 2.835

# diamond unit cell

variable a equal 5.431
lattice         custom $a                               a1 1.0 0.0 0.0                          a2 0.0 1.0 0.0                          a3 0.0 0.0 1.0                          basis 0.0 0.0 0.0                       basis 0.0 0.5 0.5                       basis 0.5 0.0 0.5                       basis 0.5 0.5 0.0                       basis 0.25 0.25 0.25                    basis 0.25 0.75 0.75                    basis 0.75 0.25 0.75                    basis 0.75 0.75 0.25
lattice         custom 5.431                               a1 1.0 0.0 0.0                          a2 0.0 1.0 0.0                          a3 0.0 0.0 1.0                          basis 0.0 0.0 0.0                       basis 0.0 0.5 0.5                       basis 0.5 0.0 0.5                       basis 0.5 0.5 0.0                       basis 0.25 0.25 0.25                    basis 0.25 0.75 0.75                    basis 0.75 0.25 0.75                    basis 0.75 0.75 0.25
Lattice spacing in x,y,z = 5.431 5.431 5.431

region          myreg block     0 4                                 0 4                                 0 4

#create_box      1 myreg
#create_atoms    1 region myreg
#mass            1       28.06
#write_data      initial.sivac

read_data       initial.sivac
Reading data file ...
  orthogonal box = (0 0 0) to (21.724 21.724 21.724)
  1 by 1 by 2 MPI processor grid
  reading atoms ...
  512 atoms
  reading velocities ...
  512 velocities
  read_data CPU = 0.010 seconds

# make a vacancy

group Si type 1
512 atoms in group Si

group del id 300
1 atoms in group del
delete_atoms group del compress no
Deleted 1 atoms, new total = 511
group vacneigh id 174 175 301 304 306 331 337
7 atoms in group vacneigh

# choose potential

pair_style      sw
pair_coeff * * Si.sw Si
Reading sw potential file Si.sw with DATE: 2007-06-11

# set up neb run

variable        u uloop 20

# initial minimization to relax vacancy

displace_atoms all random 0.1 0.1 0.1 123456
Displacing atoms ...
minimize        1.0e-6 1.0e-4 1000 10000
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 5.77118
  ghost atom cutoff = 5.77118
  binsize = 2.88559, bins = 8 8 8
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair sw, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Per MPI rank memory allocation (min/avg/max) = 4.113 | 4.114 | 4.114 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   0             -593.40319      0             -593.40319      355294.02    
        33   0             -2213.3343      0             -2213.3343     -3383.2606    
Loop time of 0.042342 on 2 procs for 33 steps with 511 atoms

97.7% CPU use with 2 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = energy tolerance
  Energy initial, next-to-last, final = 
     -593.403188091472  -2213.33209897182  -2213.33426537417
  Force two-norm initial, final = 1101.8254 0.16683659
  Force max component initial, final = 334.49264 0.014961353
  Final line search alpha, max atom move = 1 0.014961353
  Iterations, force evaluations = 33 44

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.037702   | 0.037902   | 0.038102   |   0.1 | 89.51
Neigh   | 0.00096726 | 0.00098613 | 0.001005   |   0.0 |  2.33
Comm    | 0.0021386  | 0.0023585  | 0.0025785  |   0.5 |  5.57
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 0.001095   |            |       |  2.59

Nlocal:          255.5 ave         258 max         253 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:         1080.5 ave        1083 max        1078 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
FullNghs:         8689 ave        8777 max        8601 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 17378
Ave neighs/atom = 34.007828
Neighbor list builds = 1
Dangerous builds = 0

All the other log.lammps.* files are identical except for the timings on the CPUs. Note that 2 MPI tasks were reported on each partition here.

When I then make the modification to the "in.neb.sivac" input script as below:

fix             1 all neb 1.0

thermo          100

# run NEB for 2000 steps or to force tolerance

timestep        0.01
min_style       quickmin

neb             0.0 0.01 100 100 10 final final.sivac verbosity default

I get the expected output to the screen:

LAMMPS (2 Aug 2023)
Running on 4 partitions of processors
Reading NEB coordinate file(s) ...
Setting up regular NEB ...
    Step     MaxReplicaForce  MaxAtomForce      GradV0         GradV1         GradVc          EBF            EBR            RDT            RD1            PE1            RD2            PE2            RD3            PE3            RD4            PE4       
         0   7.5525391        2.671788       0.16683659     7.5525391      7.5525391      1.5383951      0              1.6207355      0              -2213.3343     0.33333333     -2212.7428     0.66666667     -2212.2247     1              -2211.7959     
        10   0.24005275       0.0013324036   0.036483049    0.24005275     0.68351722     0.42916118     0.41794425     1.6989349      0              -2213.3365     0.32909183     -2212.9587     0.65386736     -2212.9073     1              -2213.3253     
        20   0.07940898       0.00026889621  0.024706844    0.07940898     0.71637784     0.41387872     0.41157886     1.7343662      0              -2213.3369     0.32478734     -2212.9621     0.65348766     -2212.923      1              -2213.3346     
        30   0.094973708      6.9942581e-05  0.015145947    0.035267404    0.7535772      0.40072717     0.40024605     1.7504612      0              -2213.3372     0.32705584     -2212.9584     0.65894506     -2212.9365     1              -2213.3367     
        40   0.027727472      1.9827556e-05  0.011618173    0.022562656    0.76133752     0.39614635     0.39591731     1.7547519      0              -2213.3373     0.32873163     -2212.9562     0.66124255     -2212.9411     1              -2213.337      
        50   0.01942934       9.0662902e-06  0.0087135565   0.015391975    0.7695268      0.39274846     0.3926388      1.7578616      0              -2213.3373     0.33022595     -2212.9543     0.66307279     -2212.9446     1              -2213.3372     
        60   0.019056184      2.6462904e-06  0.0053426943   0.0086167383   0.77759662     0.38936868     0.38933371     1.7610433      0              -2213.3374     0.33187545     -2212.9523     0.66497614     -2212.948      1              -2213.3373     
        63   0.0097002883     1.6300083e-06  0.0047744861   0.0076067229   0.77865612     0.38888517     0.38885789     1.7615322      0              -2213.3374     0.3321224      -2212.952      0.66525533     -2212.9485     1              -2213.3373     
Setting up climbing ...
Climbing replica = 3
    Step     MaxReplicaForce  MaxAtomForce      GradV0         GradV1         GradVc          EBF            EBR            RDT            RD1            PE1            RD2            PE2            RD3            PE3            RD4            PE4       
        63   0.77865612       0.09663051     0.0047744861   0.0076067229   0.77865612     0.38888517     0.38885789     1.7615322      0              -2213.3374     0.3321224      -2212.952      0.66525533     -2212.9485     1              -2213.3373     
        73   0.098996478      0.0011347237   0.0027942177   0.0042838869   0.038660264    0.51024721     0.51023862     1.7607154      0              -2213.3374     0.27601803     -2213.0412     0.50460604     -2212.8271     1              -2213.3374     
        83   0.032624413      0.00016414467  0.0020866205   0.0031623348   0.01014434     0.5101467      0.51014204     1.7602604      0              -2213.3374     0.26052902     -2213.0671     0.50359805     -2212.8272     1              -2213.3374     
        93   0.011192577      1.2896354e-05  0.0014872247   0.0022320888   0.0057972393   0.5101135      0.51011119     1.7601271      0              -2213.3374     0.25446194     -2213.0775     0.50382575     -2212.8273     1              -2213.3374     
        96   0.0085623418     6.17654e-06    0.0013462454   0.0020157414   0.0050121139   0.51010903     0.51010716     1.7601216      0              -2213.3374     0.25374901     -2213.0787     0.50391708     -2212.8273     1              -2213.3374

and the corresponding log file for partition 0 looks like this:

LAMMPS (2 Aug 2023)
Processor partition = 0
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
# NEB simulation of vacancy hopping in silicon crystal

units           metal

atom_style      atomic
atom_modify     map array
boundary        p p p
atom_modify     sort 0 0.0

# coordination number cutoff

variable r equal 2.835

# diamond unit cell

variable a equal 5.431
lattice         custom $a                               a1 1.0 0.0 0.0                          a2 0.0 1.0 0.0                          a3 0.0 0.0 1.0                          basis 0.0 0.0 0.0                       basis 0.0 0.5 0.5                       basis 0.5 0.0 0.5                       basis 0.5 0.5 0.0                       basis 0.25 0.25 0.25                    basis 0.25 0.75 0.75                    basis 0.75 0.25 0.75                    basis 0.75 0.75 0.25
lattice         custom 5.431                               a1 1.0 0.0 0.0                          a2 0.0 1.0 0.0                          a3 0.0 0.0 1.0                          basis 0.0 0.0 0.0                       basis 0.0 0.5 0.5                       basis 0.5 0.0 0.5                       basis 0.5 0.5 0.0                       basis 0.25 0.25 0.25                    basis 0.25 0.75 0.75                    basis 0.75 0.25 0.75                    basis 0.75 0.75 0.25
Lattice spacing in x,y,z = 5.431 5.431 5.431

region          myreg block     0 4                                 0 4                                 0 4

#create_box      1 myreg
#create_atoms    1 region myreg
#mass            1       28.06
#write_data      initial.sivac

read_data       initial.sivac
Reading data file ...
  orthogonal box = (0 0 0) to (21.724 21.724 21.724)
  1 by 1 by 2 MPI processor grid
  reading atoms ...
  512 atoms
  reading velocities ...
  512 velocities
  read_data CPU = 0.010 seconds

# make a vacancy

group Si type 1
512 atoms in group Si

group del id 300
1 atoms in group del
delete_atoms group del compress no
Deleted 1 atoms, new total = 511
group vacneigh id 174 175 301 304 306 331 337
7 atoms in group vacneigh

# choose potential

pair_style      sw
pair_coeff * * Si.sw Si
Reading sw potential file Si.sw with DATE: 2007-06-11

# set up neb run

variable        u uloop 20

# initial minimization to relax vacancy

displace_atoms all random 0.1 0.1 0.1 123456
Displacing atoms ...
minimize        1.0e-6 1.0e-4 1000 10000
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 5.77118
  ghost atom cutoff = 5.77118
  binsize = 2.88559, bins = 8 8 8
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair sw, perpetual
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Per MPI rank memory allocation (min/avg/max) = 4.113 | 4.114 | 4.114 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   0             -593.40319      0             -593.40319      355294.02    
        33   0             -2213.3343      0             -2213.3343     -3383.2606    
Loop time of 0.0423295 on 2 procs for 33 steps with 511 atoms

97.9% CPU use with 2 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = energy tolerance
  Energy initial, next-to-last, final = 
     -593.403188091472  -2213.33209897182  -2213.33426537417
  Force two-norm initial, final = 1101.8254 0.16683659
  Force max component initial, final = 334.49264 0.014961353
  Final line search alpha, max atom move = 1 0.014961353
  Iterations, force evaluations = 33 44

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.023435   | 0.030794   | 0.038153   |   4.2 | 72.75
Neigh   | 0.00067653 | 0.00082372 | 0.00097091 |   0.0 |  1.95
Comm    | 0.0021367  | 0.0096178  | 0.017099   |   7.6 | 22.72
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0          | 0          | 0          |   0.0 |  0.00
Other   |            | 0.001094   |            |       |  2.59

Nlocal:          255.5 ave         258 max         253 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:         1080.5 ave        1083 max        1078 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
FullNghs:         8689 ave        8777 max        8601 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 17378
Ave neighs/atom = 34.007828
Neighbor list builds = 1
Dangerous builds = 0

reset_timestep  0

# only output atoms near vacancy

#dump events vacneigh custom 1000 dump.neb.sivac.$u id type x y z

fix             1 all neb 1.0

thermo          100

# run NEB for 2000 steps or to force tolerance

timestep        0.01
min_style       quickmin

neb             0.0 0.01 100 100 10 final final.sivac verbosity default
Per MPI rank memory allocation (min/avg/max) = 2.989 | 2.989 | 2.989 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   0             -2213.3343      0             -2213.3343     -3383.2606    
        63   0.00010153187 -2213.3374      0             -2213.3374     -3383.3953    
Loop time of 0.0424063 on 2 procs for 63 steps with 511 atoms

99.9% CPU use with 2 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = force tolerance
  Energy initial, next-to-last, final = 
     -2213.33426537417  -2213.33736711116   -2213.3373695926
  Force two-norm initial, final = 0.16683659 0.0047744861
  Force max component initial, final = 0.014961353 0.00029110627
  Final line search alpha, max atom move = 0 0
  Iterations, force evaluations = 63 63

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.019724   | 0.024372   | 0.029021   |   3.0 | 57.47
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0029144  | 0.0075694  | 0.012224   |   5.4 | 17.85
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0.0073798  | 0.007426   | 0.0074723  |   0.1 | 17.51
Other   |            | 0.003038   |            |       |  7.17

Nlocal:          255.5 ave         260 max         251 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:         1080.5 ave        1085 max        1076 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
FullNghs:         8682 ave        8837 max        8527 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 17364
Ave neighs/atom = 33.980431
Neighbor list builds = 0
Dangerous builds = 0
Per MPI rank memory allocation (min/avg/max) = 2.989 | 2.989 | 2.989 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
        63   0             -2213.3374      0             -2213.3374     -3383.396     
        96   1.1818516e-06 -2213.3374      0             -2213.3374     -3383.3948    
Loop time of 0.0229079 on 2 procs for 33 steps with 511 atoms

91.6% CPU use with 2 MPI tasks x 1 OpenMP threads

Minimization stats:
  Stopping criterion = force tolerance
  Energy initial, next-to-last, final = 
      -2213.3373695926  -2213.33738750829    -2213.337387544
  Force two-norm initial, final = 0.0047744861 0.0013462454
  Force max component initial, final = 0.00029110627 7.8360749e-05
  Final line search alpha, max atom move = 0 0
  Iterations, force evaluations = 33 33

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.010375   | 0.012454   | 0.014534   |   1.9 | 54.37
Neigh   | 0          | 0          | 0          |   0.0 |  0.00
Comm    | 0.0015189  | 0.0036039  | 0.0056889  |   3.5 | 15.73
Output  | 0          | 0          | 0          |   0.0 |  0.00
Modify  | 0.0049347  | 0.004958   | 0.0049814  |   0.0 | 21.64
Other   |            | 0.001891   |            |       |  8.26

Nlocal:          255.5 ave         260 max         251 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost:         1080.5 ave        1085 max        1076 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 2 0 0 0 0 0 0 0 0 0
FullNghs:         8682 ave        8837 max        8527 min
Histogram: 1 0 0 0 0 0 0 0 0 1

Total # of neighbors = 17364
Ave neighs/atom = 33.980431
Neighbor list builds = 0
Dangerous builds = 0
Total wall time: 0:00:00

Since adding the "verbosity" keyword to the "neb" command seems to fix one issue, I next add the "end" keyword to the "fix neb" command:

fix             1 all neb 1.0 end last/efirst 1.0

thermo          100

# run NEB for 2000 steps or to force tolerance

timestep        0.01
min_style       quickmin

neb             0.0 0.01 100 100 10 final final.sivac verbosity default

and when run, I get this error printed to the screen:

LAMMPS (2 Aug 2023)
Running on 4 partitions of processors
Reading NEB coordinate file(s) ...
Setting up regular NEB ...
[jcappola-Precision-Tower-7910:07399] *** An error occurred in MPI_Bcast
[jcappola-Precision-Tower-7910:07399] *** reported by process [4080271361,6]
[jcappola-Precision-Tower-7910:07399] *** on communicator MPI_COMM_WORLD
[jcappola-Precision-Tower-7910:07399] *** MPI_ERR_COMM: invalid communicator
[jcappola-Precision-Tower-7910:07399] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[jcappola-Precision-Tower-7910:07399] ***    and potentially your MPI job)
[jcappola-Precision-Tower-7910:07389] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[jcappola-Precision-Tower-7910:07389] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Again, all the screen.* files are blank and the log.lammps.* files are identical to the ones shown previously (where the previous error occurred).

I've pulled down your latest bug fixes to "fix_neb.cpp" and recompiled an otherwise clean version of LAMMPS (2 Aug 2023) - Update 2 including the MANYBODY and REPLICA packages. This seems to have fixed the issue with the "end" keyword when run with multiple cores per replica, but omitting the "verbosity" keyword from the "neb" command still gives the same MPI_Allgather error.

Answer 3 · 2024-01-20T19:59:44.000Z

@jcappola thanks for the update. You can force unbuffered output (and thus avoid empty screen/log files) by adding the -nonbuf command line flag, or its short form -nb. This was added specifically for debugging multi-replica jobs. 😉

I've identified an uninitialized data access in the neb.cpp code and refactored the flow of control a little bit to allow more code sharing (now that we require C++11 we can use a delegating constructor, which wasn't allowed when NEB was originally implemented).

The fix is added to the code. BTW, you do not have to manually apply the patch, but can just check out the "maintenance" branch (or download it as a snapshot from github) since these kinds of bugfixes are also backported and thus will show up in the third update to the stable version. Also see PR #4044

The verbosity bug does not show on my Linux box except when I explicitly test for uninitialized data with valgrind. By default, allocated memory there, and thus the variable controlling the output verbosity, is initialized to 0.
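
To illustrate the kind of fix described above, here is a minimal sketch of a C++11 delegating constructor that keeps member defaults in one place. NebLike and its members are hypothetical and only loosely modeled on this discussion (print_mode is the name mentioned earlier in the thread). This is not the actual neb.cpp code, and the real patch may well look different.

```cpp
#include <cassert>

// Hypothetical class for illustration only; not the LAMMPS NEB class.
class NebLike {
 public:
  enum PrintMode { DEFAULT, TERSE, VERBOSE };

  // Primary constructor: every member receives a defined value here.
  NebLike() : print_mode(DEFAULT), nreplica(0) {}

  // Secondary constructor delegates to the primary one (C++11), so the
  // defaults cannot be forgotten on this second code path.
  explicit NebLike(int nrep) : NebLike()
  {
    assert(nrep > 0);
    nreplica = nrep;
  }

  PrintMode mode() const { return print_mode; }
  int replicas() const { return nreplica; }

 private:
  PrintMode print_mode;  // would be indeterminate if a constructor skipped it
  int nreplica;
};
```

With this layout, NebLike(4) reports mode() == NebLike::DEFAULT regardless of what the allocated memory happened to contain, which is the property that was missing when the verbosity setting was only assigned while parsing the optional keyword.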

Answer 4 · 2024-01-21T21:34:59.000Z

@akohlmey,

Good to know about -nb!

I've checked out the maintenance branch and compiled. Both of the bugs I reported are now fixed, thank you! This issue can now be closed.

Answer 5 · 2024-01-21T23:29:30.000Z

@jcappola thanks for your feedback. While we do a lot of (automated) testing, there is nothing like bug reports from users to point out the oversights that arise from the many, many variants and ways in which the different parts of LAMMPS can be used. While it is cool to have software that can be combined in many different ways, like a set of Lego bricks, and is thus primarily limited by the creativity of its users, it also poses the unique challenge that there are far too many combinations of features and settings to test everything thoroughly. This is where people like you come in, and this is what has made LAMMPS a much better and more reliable software over the years. This has been particularly true since we moved the development process to GitHub and thus made it public. I hope cases like this keep you (and others who read this) motivated to keep reporting and suggesting.

The issue will be automatically closed once the pull request #4051 is merged into the develop branch.