cnerg/RadClass

Numpy not saving all timestamp digits

stompsjo opened this issue · 3 comments

An integral part of RadClass is its ability to use the timestamps vector to index data by keeping track of the working_time. These timestamps are epoch times in seconds, so they are very large floats. For example, I could have an epoch timestamp of 1.583047103120086E9 seconds. If I want to use np.where(self.processor.timestamps == self.working_time), I need almost all digits to be stored (at least the whole number of seconds). This is relevant for #8.

However, if I try to write a numpy array of timestamps to an HDF5 file (via h5py), not all digits are stored. For example, the above timestamp would be stored in the .h5 file as 1.5830471E9, making it impossible to compare timestamps. This seems to be an underlying issue with the printing precision of numpy. I can work around it by saving each timestamp in the vector as a string using a numpy formatting function (np.format_float_scientific), but that changes downstream functions (data i/o would have to flip-flop between dtypes). Is there a different way to ensure all floats are saved properly in h5py?
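To make the failure mode concrete, here is a minimal sketch of what happens to an exact-match lookup when epoch timestamps pass through single precision. Treating float32 as the cause is an assumption for illustration only; nothing above confirms that this is what h5py did. The array, `working_time`, and the other variable names are hypothetical stand-ins for `self.processor.timestamps` and `self.working_time`.

```python
import numpy as np

# Hypothetical data: 1000 consecutive one-second epoch timestamps (float64).
start = 1.583047103120086e9
timestamps = start + np.arange(1000, dtype=np.float64)

# Exact-match lookup, in the style of np.where(timestamps == working_time).
working_time = timestamps[500]
matches64 = np.where(timestamps == working_time)[0]

# Assumption for illustration: the values were stored as float32 and read back.
timestamps32 = timestamps.astype(np.float32).astype(np.float64)
matches32 = np.where(timestamps32 == working_time)[0]

# Count how many distinct values survive the float32 round trip.
n_unique32 = np.unique(timestamps32).size

print("float64 matches:", matches64)
print("float32 matches:", matches32)
print("distinct float32 values:", n_unique32)
```

At this magnitude, adjacent float32 values are 128 seconds apart, so many one-second timestamps collapse onto the same stored value and the equality test can no longer find the working time.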

Are you sure it's saved in the .h5 with that precision? HDF5 should use the full precision to store values. Could it be that you can't print it to screen/output with better precision? Do you have a code snippet that demonstrates this?

Yes, I've tested this by saving the numpy vector (which has the right precision) to an .h5 file and independently inspecting that file. Fewer digits are saved than expected (1.5830471E9 vs. 1.583047103120086E9, for example). #8 should be up-to-date with this issue. Here is an example snippet:


from tests.create_file import create_file
from RadClass.analysis import RadClass
import numpy as np
import time
from datetime import datetime, timedelta

filename = 'testfile.h5'
datapath = '/uppergroup/lowergroup/'
labels = {'live': '2x4x16LiveTimes',
          'timestamps': '2x4x16Times',
          'spectra': '2x4x16Spectra'}

start_date = datetime(2019, 2, 2)
#end_date = datetime(2019, 1, 2)
delta = timedelta(seconds=1)

timestamps = np.array([])
for i in range(1000):
    #timestamps = np.append(timestamps, np.format_float_scientific(start_date.timestamp(),precision=15))
    timestamps = np.append(timestamps, start_date.timestamp())
    start_date += delta
np.savetxt('timestamps.csv',timestamps, delimiter=',')
livetime = 0.9
# constant live time and flat 1000-channel spectrum for every timestamp
live = np.full((len(timestamps),), livetime)
spectra = np.full((len(timestamps), 1000), 10.0)

create_file(filename, datapath, labels, live, timestamps, spectra)

This can be run in the repo directory. Two files are created: a .csv written by numpy and a .h5 written by h5py. The .csv contains the proper number of digits but the .h5 does not, so there's something about h5py not saving all the digits from the numpy vector to file. Note that the commented line in the for loop is a way of forcing a specified precision, but it converts the vector to dtype string (and therefore the h5py dtype must also be updated). This is a workaround, but it requires a lot of back and forth between string and float types that I'd rather avoid.

As far as I can tell, this issue is no longer occurring. test_integration is now passing in PR #8, which utilizes the create_file function. The fix seems to be specifying the dtype when saving arrays to a dataset in h5py. By specifying dtype='float64', all necessary digits seem to be saved. Apparently the dtype was being inferred incorrectly before.
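For reference, a minimal sketch of the kind of fix described above, assuming a plain h5py dataset write. This is not the actual create_file implementation; the file name and dataset names are placeholders, and the in-memory core driver is used only so the sketch leaves nothing on disk. The float32 dataset is included purely for contrast.

```python
import h5py
import numpy as np

# Hypothetical epoch timestamps, one second apart (float64).
timestamps = 1.583047103120086e9 + np.arange(5, dtype=np.float64)

# In-memory HDF5 file: the core driver with backing_store=False writes nothing to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Explicit dtype, as in the fix: values are stored in double precision.
    dset = f.create_dataset("timestamps", data=timestamps, dtype="float64")
    stored_dtype = dset.dtype
    restored = dset[...]

    # For contrast only: a single-precision dataset loses the low-order digits.
    lossy = f.create_dataset("timestamps_f32", data=timestamps, dtype="float32")[...]

roundtrip_exact = np.array_equal(restored, timestamps)
lossy_exact = np.array_equal(lossy.astype(np.float64), timestamps)

print(stored_dtype, roundtrip_exact, lossy_exact)
```

Checking `dset.dtype` on an existing file is also a quick way to confirm how a dataset was actually stored, independent of how a viewer prints the values.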

Closing now.