hannorein/rebound

TRACE crashes after adding 2 particles after a merger event (on ARM-based Mac; [edit:] on Linux, too)

Closed this issue · 3 comments

Environment
Which version of REBOUND are you using and on what operating system?

  • REBOUND Version: 4.4.1
  • API interface: Python
  • Operating System (including version): MacOS 14.5

Describe the bug
When running REBOUND with the TRACE integrator on my Mac, the integration crashes after a merger event followed by adding 2 particles. The error message is:

Python(92085,0x201658c00) malloc: *** error for object 0x6000012c27d0: pointer being realloc'd was not allocated
Python(92085,0x201658c00) malloc: *** set a breakpoint in malloc_error_break to debug

That said, when running it on Linux, the program finishes fine. @tigerchenlu98 suggested that it is still worth reporting.

To Reproduce
Here is a simple example:

import numpy as np
import rebound

rng = np.random.default_rng(seed=42)
def uniform_draw(_range):
    return rng.random() * (_range[1]-_range[0]) + _range[0]

def create_one_system():
    Np = 3
    qs = np.array([0.00001, 0.00001, 0.00001, ])
    Ps = np.array([4, 5.01, 6.02, ]) / 365.25 * (2*np.pi)
    es = np.array([0.05, 0.1, 0.05, ])
    rps = np.array([2, 2, 2, ])  # R_earth

    sim = rebound.Simulation()
    sim.add(m=1.)
    for idp in range(Np):
        sim.add(m=qs[idp], P=Ps[idp], e=es[idp], r=rps[idp]*4.2635e-5,
                M=uniform_draw([0, 2*np.pi]), omega=uniform_draw([-np.pi, np.pi]))
    sim.move_to_com()

    sim.collision = 'direct'
    sim.collision_resolve = 'merge'
    sim.integrator = "trace"
    sim.N_active = sim.N
    sim.testparticle_type = 1
    sim.dt = 0.0025
    return sim

sim = create_one_system()  # create an unstable system so merge happens fast
print(f"Initial state: N_active = {sim.N_active}, N_bodies = {sim.N}")
sim.integrate(100*2*np.pi)
print(f"t={sim.t/6.283:.3f}yr: N_active = {sim.N_active}, N_bodies = {sim.N}")

_coll_i = 2
ps = sim.particles
displacement = 100 * ps[_coll_i].r
v_esc = np.sqrt(2 * ps[_coll_i].m / ps[_coll_i].r)
# create a pair to maintain COM
phi = uniform_draw([-np.pi, np.pi])
for _phi in [phi, phi+np.pi]:
    sim.add(m = ps[_coll_i].m/10, 
            x = ps[_coll_i].x + displacement * np.cos(_phi), 
            y = ps[_coll_i].y + displacement * np.sin(_phi),
            vx = ps[_coll_i].vx + v_esc * np.cos(_phi),
            vy = ps[_coll_i].vy + v_esc * np.sin(_phi),
            z = 0, vz = 0, r = 0)
print(f"added two particles: N_active = {sim.N_active}, N_bodies = {sim.N}"+f"\nNow continue...")
sim.integrate(200*2*np.pi)
print(f"End @ t={sim.t/6.283:.3f}yr: N_active = {sim.N_active}, N_bodies = {sim.N}")

Output:

Initial state: N_active = 4, N_bodies = 4
t=100.003yr: N_active = 3, N_bodies = 3
added two particles: N_active = 3, N_bodies = 5
Now continue...
Python(92085,0x201658c00) malloc: *** error for object 0x60000184cc30: pointer being realloc'd was not allocated
Python(92085,0x201658c00) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    92085 abort      python -u test.py

Additional context

The program runs fine on Linux. The output is:

Initial state: N_active = 4, N_bodies = 4
t=100.003yr: N_active = 3, N_bodies = 3
added two particles: N_active = 3, N_bodies = 5
Now continue...
End @ t=200.006yr: N_active = 3, N_bodies = 3

Here is the library info for librebound on Mac, in case that's useful:

~: otool -L /Users/me/venvs/asf/lib/python3.11/site-packages/librebound.cpython-311-darwin.so
/Users/me/venvs/asf/lib/python3.11/site-packages/librebound.cpython-311-darwin.so:
	@rpath/librebound.cpython-311-darwin.so (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

and the library info for librebound on Linux:

~: ldd /lustre/me/venvs/asf/lib/python3.11/site-packages/librebound.cpython-311-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007fffd03d8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa10f571000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa10f200000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa10f62f000)

Thanks in advance.

Thanks for submitting, @astroboylrx. I'll look into it.

Since I recalled seeing crashes on Linux as well, I did more testing.

Since some pointers are misused (as indicated by the testing on Mac), the program should eventually run into issues on Linux as well; it is just a matter of time.

For the test code above, if we add sim.status() at the end, REBOUND crashes on Linux as well, though the output may depend on the exact OS, compiler, etc. One example:

Initial state: N_active = 4, N_bodies = 4
t=100.003yr: N_active = 3, N_bodies = 3
added two particles: N_active = 3, N_bodies = 5
Now continue...
End @ t=200.006yr: N_active = 3, N_bodies = 3
---------------------------------
REBOUND version:        4.4.1
REBOUND built on:       May  7 2024 19:22:50
Number of particles:    3
Selected integrator:    trace
Simulation time:        1.2566370614359173e+03
Current timestep:       0.002500
---------------------------------
<rebound.particle.Particle object at 0x7fde819adac0, m=1.0 x=-0.006960752601347207 y=-0.007419040562407109 z=0.0 vx=0.0001150676144799597 vy=3.78947000657919e-05 vz=0.0>
<rebound.particle.Particle object at 0x7fde819ae0f0, m=1e-05 x=-0.0472939720759477 y=0.024427087346706663 z=0.0 vx=-2.5428260380212966 vy=-3.490337000793281 vz=0.0>
<rebound.particle.Particle object at 0x7fde819adac0, m=2.4000000000000004e-05 x=-0.016680909412693956 y=0.049814973529017446 z=0.0 vx=-4.1965868691818065 vy=-0.6165110264754172 vz=0.0>
---------------------------------
The following fields have non-default values:
malloc(): unaligned tcache chunk detected
Aborted (core dumped)

That said, on some machines this test may still finish fine.
However, if we add more particles:

# change this line
for _phi in [phi, phi+np.pi]:
# to
for _phi in [phi, phi+np.pi, phi/2, phi/2+np.pi]:
# also change the print statement to "added four particles"

then the program seems to crash reliably across different Linux machines. On some (the same Linux machine as above), it crashes when the integration continues:

Initial state: N_active = 4, N_bodies = 4
t=100.003yr: N_active = 3, N_bodies = 3
added four particles: N_active = 3, N_bodies = 7
Now continue...
free(): double free detected in tcache 2
Aborted (core dumped)

On others (a different Linux machine), it crashes again at the status report:

Initial state: N_active = 4, N_bodies = 4
t=100.003yr: N_active = 3, N_bodies = 3
added two particles: N_active = 3, N_bodies = 7
Now continue...
End @ t=200.006yr: N_active = 3, N_bodies = 3
---------------------------------
REBOUND version:        4.4.1
REBOUND built on:       May 22 2024 17:52:13
Number of particles:    3
Selected integrator:    trace
Simulation time:        1.2566370614359173e+03
Current timestep:       0.002500
---------------------------------
<rebound.particle.Particle object at 0x7f37b9ccee40, m=1.0 x=-0.013926541815684102 y=-0.014841179374463744 z=0.0 vx=2.8370459765993097e-05 vy=0.00010303820536165573 vz=0.0>
<rebound.particle.Particle object at 0x7f37b9cce940, m=1.2e-05 x=-0.035736470653559 y=0.03289261994692842 z=0.0 vx=-3.9198249861702266 vy=-1.668698783945042 vz=0.0>
<rebound.particle.Particle object at 0x7f37b9ccee40, m=2.6000000000000005e-05 x=-0.0742794358857067 y=-0.016819926482250603 z=0.0 vx=-0.1342315943577592 vy=-4.100911119594687 vz=0.0>
---------------------------------
The following fields have non-default values:
[1]    237777 segmentation fault  python -u test.py
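
Because the failure mode differs from machine to machine (abort on realloc, double free, segmentation fault, or a clean exit), it can help to record how the test process ended rather than reading the shell message by eye. The following is a minimal sketch using only the Python standard library; the helper name classify_run is hypothetical and not part of REBOUND, and the aborting child process is a stand-in for the real test script. It assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

def classify_run(argv):
    """Run a command and describe how it ended.

    On POSIX, subprocess reports death by signal as a negative return code.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode < 0:
        return "killed by " + signal.Signals(-proc.returncode).name
    return f"exit code {proc.returncode}"

# Stand-in child that aborts, similar to what malloc does on detected heap corruption.
# To check the actual reproduction, pass [sys.executable, "-u", "test.py"] instead.
print(classify_run([sys.executable, "-c", "import os; os.abort()"]))
print(classify_run([sys.executable, "-c", "print('ok')"]))
```

On Linux or macOS the first call should report SIGABRT and the second a zero exit code. Running the real script through such a helper several times per machine would show whether a given setup crashes every time or only occasionally.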

I've just pushed an update to the main branch which should fix this bug.