hannorein/rebound

How to read the binary file output?

kbryant25 opened this issue · 8 comments

Environment

  • REBOUND Version: 4.4.1
  • API interface: Python 3.11
  • Operating System: Windows 10

Physics
I am trying to simulate a supermassive black hole recoil. Right now I'm working with 1000 type 0 test particles (representing gas clouds around the SMBH) and a central mass of 10**6 solar masses. I am using the IAS15 integrator and the default code units. After initializing the simulation and adding the test particles in a random distribution around the central mass I am giving the central mass a kick velocity of 400, again in code units. Right now I'm unsure what the full time scale for my simulation will be but my question is regarding something else.

Goal
My goal is to create this simulation and then create emission line spectra based on the position and velocity vectors for the full simulation output by the code. I'm guessing the best way to get this data is by using the binary file output function that is built into rebound. I know I will need to append the binary file and add the data for each timestep first and then output the entire file so I can use it for analysis. I know rebound is able to easily read and interpret the binary files but I will need to use something other than rebound for this and in order to create a code to interpret that data I need to know how the files are formatted/ how to read them. With my relatively limited code knowledge my best guess for how to do this would be to feed the analysis code a TSV/CSV file but that requires converting the binary file into a TSV or CSV first.

Progress
I've found a program that is able to translate the binary file into various other formats but when I translate it to ASCII or hexadecimal this is what I see:

image

I have no idea how to interpret this data or if I'm even using the correct format. My question is, how do I translate the binary file into something a human can read/ how do I tell the computer how to read this file correctly? Or, is there a better way to do this that I just haven't learned yet?

Just my 2 cents as a community member and (relatively) long time rebound user. I'd suggest parsing the binary files (simulationArchives) in python (presumably some place like a notebook). See the simulationArchive documentation for info on that. Now that you have access to all of the simulation data, consider what outputs you'd need to accomplish your goal (particle state vectors, orbital elements) and save those into some other python data structure like numpy arrays or a pandas dataframe and then save that reduced data into the csv that you're looking for.

This could look something as simple as:

import rebound
import numpy as np

sa = rebound.Simulationarchive("archive.bin")
N_archives = len(sa) # number of saves
# Initialize data arrays
save_times = np.zeros(N_archives)
particles_x = np.zeros((N_archives, 1000)) # 2 dimensional  array of size N saves and M particles (1000 in this case)
particles_y = np.zeros((N_archives, 1000))
... # so on and so forth
particles_vz = np.zeros((N_archives, 1000))

# Now loop through the saves and pull out all of the values of interest
for i, sim in enumerate(sa):
    save_time[i] = sim.t
    for j in range(1, sim.N):
        particles_x[i, j] = sim.particles[j].x
        particles_y[i, j] = sim.particles[j].y
        ... # and so on
        particles_vz[i, j] = sim.particles[j].vx

# finally save your data into the preferred format
all_data = np.concatenate([[save_times], particles_x, particles_y, ..., particles_vz])
all_data.savetxt('path/to/save.csv', delimiter=', ', header="# times, 1000 particles x, 1000 particles y, ... 1000 particles vz, this is optional")

Granted the above solution for organizing and saving the data is not the prettiest... I doubt you want all thousand particles x values followed by all thousand particles y values and so on in a row. You'd have to do a bit more fancy data organizing to get each particles state vector (x,y,z,vx,vy,vz) cycling one after another but that's the beauty of playing in notebooks where you can easily tweak your code.

Hope this helps!

This seems really promising thank you so much! I'll let you know if I have any further questions but this is a great starting point I really appreciate it

@Rmelikyan already recommends what I would recommend too. Use REBOUND to read simulation archive files. It's a custom file format so you can't easily read it in with other software. Note that if you're sure you just want to store the positions of all particles at regular intervals, then you can use your own output routine with a binary or text format of your own choice. The simulation archive on the other hand includes everything, including all particle data and the state of the simulation itself. With that you can for example stop the simulation and continue running it later very easily. I think it's good advice to start with the simulation archive because you don't need to know a priori what outputs you need later. If you use your own custom format and output the positions, but later notice you also need the semi-major axis, then you'll need to rerun everything. In @Rmelikyan's code all you'd need to do is change sim.particles[j].x to sim.particles[j].a.

Now, if you're really keen to understand the simulation archive format itself, it is documented here: https://rebound.readthedocs.io/en/latest/binaryformat/

Everything seems to be working fine except for one issue at the very end. I've followed the code structure @Rmelikyan replied with but it's giving an error with the concatenate function. It gives the following error:

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)

With this line in the code:

all_data = np.concatenate([[save_times], [particles_x, particles_y, particles_z, particles_vx, particles_vy, particles_vz]])

In this case I know I am trying to save 201 snapshots of the position and velocity data for 1001 particles (1 massive particle and 1000 test particles) but I don't understand why it thinks time has 2 dimensions. I'm unfamiliar with the concatenate function because I've never used it before and after looking at online documentation for it I still can't figure out why the error is occurring.

save_times has two dimensions in this case because of the extra brackets wrapped around it. I think if you remove the brackets around the particles_x,y,z... it should work. Try:
all_data = np.concatenate([[save_times], particles_x, particles_y, particles_z, particles_vx, particles_vy, particles_vz])
numpys concatenate function can be tricky which is why I didn't go into much detail before, but basically you need to pass the function a list of arrays that are all similar dimensions (which is why you need to make time 2d rather than 1d with the extra brackets). Let me know if this works.

It gives me a new error on the same line:

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 201 and the array at index 1 has size 1001

Like I said, playing around with numpy broadcasting can be a headache. Try this:
np.concatenate([[save_times], particles_x.T, particles_y.T, particles_z.T, particles_vx.T, particles_vy.T, particles_vz.T]) and if that still doesn't work I'd start by printing the shapes (e.g. print(particles_x.shape)) of each of my data arrays and then looking into numpy broadcasting to try to figure out how to make your data all fit together.

I tried your fix and it was still giving an error but replacing all_data.savetext(...) with np.savetext(path, all_data, delimiter=', ', header="# times, x, y, z, vx, vy, vz") fixed the issue. The csv file is a bit of a mess and I can't exactly tell how it's organized but it seems to be saving all of the data which is what is important. Thank you so much for your help I really appreciate it!