tub-rip/events_h52bag

Code aborted/killed on h5 files with bigger sizes

Closed this issue · 3 comments

I tried the code with an h5 file of 264 MB and it worked perfectly fine. When I tried an h5 file of 925 MB, the code got killed. I think the failure is related to the size of the input h5 file. I have attached the error image.

[error screenshot attached]

Do you have any suggestions for breaking larger h5 files into smaller ones and then merging the bags of the individual chunks into one? Could you please share an example if you have already done this? Thanks in advance.

As you pointed out, you are likely running out of RAM when you load the full length (391,111,416 events) of all the datasets (events/t, p, x, y) of your 925 MB h5 file. Because of the way these are stored as events in a ROSbag, you need enough memory to hold all of the datasets in uncompressed form simultaneously.

To get around the memory limitation, you need to modify the readH5Datasets() function so that each execution reads only a subset of events from all datasets of the h5 file and produces a smaller ROSbag. To read data partially from an h5 dataset, you need to select hyperslabs in the file dataspace. An example can be found here; see specifically the following lines:

  /*
   * Define hyperslab in the dataset; implicitly giving stride and
   * block NULL.
   */
  hsize_t offset[2];  // hyperslab offset in the file
  hsize_t count[2];   // size of the hyperslab in the file
  offset[0] = 1;
  offset[1] = 2;
  count[0]  = NX_SUB;
  count[1]  = NY_SUB;
  dataspace.selectHyperslab(H5S_SELECT_SET, count, offset);

Let's say you want to load only 10,000,000 events at a time. You need to select a hyperslab in the file dataspace with the corresponding values of offset and count; each dataset is a 1-D array in our case. Then, you need to define the memspace here with dimension 10,000,000 by setting dims_out[0] = 10000000. Finally, you also need to adjust the memory size of your output array here accordingly.
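A rough sketch of the chunking logic described above, with a hypothetical helper (`computeChunks`, not part of events_h52bag) that splits a 1-D dataset of `total` events into consecutive (offset, count) ranges of at most `chunkSize` events; the HDF5 calls that would consume each range are shown in comments:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split a 1-D dataset of `total` events into
// consecutive (offset, count) hyperslab ranges of at most `chunkSize`
// events each. The last range holds whatever remains.
std::vector<std::pair<uint64_t, uint64_t>>
computeChunks(uint64_t total, uint64_t chunkSize) {
    std::vector<std::pair<uint64_t, uint64_t>> chunks;
    for (uint64_t offset = 0; offset < total; offset += chunkSize) {
        chunks.emplace_back(offset, std::min(chunkSize, total - offset));
    }
    return chunks;
}

// Inside a modified readH5Datasets(), each range would then be read with
// a 1-D hyperslab selection (HDF5 C++ API), roughly:
//
//   hsize_t offset[1] = { chunk.first };   // start index in the file
//   hsize_t count[1]  = { chunk.second };  // events to read this pass
//   dataspace.selectHyperslab(H5S_SELECT_SET, count, offset);
//   H5::DataSpace memspace(1, count);      // dims_out[0] = chunk.second
//   buffer.resize(chunk.second);           // output array sized to match
//   dataset.read(buffer.data(), H5::PredType::NATIVE_DOUBLE,
//                memspace, dataspace);
```

For the file in question (391,111,416 events, 10,000,000 events per pass), this yields 39 full chunks plus a final chunk of 1,111,416 events, each of which can be written out as its own ROSbag.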

For more details about reading data partially using hyperslabs, you can check the tutorial here: https://support.hdfgroup.org/HDF5/Tutor/selectsimple.html.

Thanks, @ghoshsuman, for the response.

New feature added with commit b07087e.
For large data files, multiple ROSbags are now generated by setting a limit on the number of events per bag. Please check the updated README for usage instructions.