Binary output

Question


rikigigi opened this issue 4 years ago · 1 comments

@lorisercole
Right now, the default binary output is a pickled blob, which I think is difficult for a first-time user to understand. Its content is:

['KAPPA_SCALE',
 'TEMPERATURE',
 'TSKIP',
 'UNITS',
 'VOLUME',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'cepstral_log',
 'j_DT_FS',
 'j_Nyquist_f_THz',
 'j_PSD_FILTER_W_THz',
 'j_cospectrum',
 'j_fcospectrum',
 'j_flogpsd',
 'j_fpsd',
 'j_freqs_THz',
 'j_logpsd',
 'j_psd',
 'jf_DT_FS',
 'jf_Nyquist_f_THz',
 'jf_dct_Kmin_corrfactor',
 'jf_dct_aic_Kmin',
 'jf_dct_kappa',
 'jf_dct_kappa_THEORY_std',
 'jf_dct_logpsd',
 'jf_dct_logpsdK',
 'jf_dct_logpsdK_THEORY_std',
 'jf_dct_logtau',
 'jf_dct_logtau_THEORY_std',
 'jf_dct_psd',
 'jf_flogpsd',
 'jf_fpsd',
 'jf_freqs_THz',
 'jf_logpsd',
 'jf_psd',
 'jf_resample_log',
 'kappa_Kmin',
 'kappa_Kmin_std',
 'units',
 'write_old_binary']

Is it used by anyone, or anywhere in the code? Would it be safe to change the default binary output to the equivalent of the human-readable one, but stored as numpy arrays?

Answer 1 · 2020-10-02T13:58:46.000Z

The content of the default bin format is simply an object with those attributes.
However, I would also avoid splitting the binary output into many files: it does not make sense.

I think we can simplify this by saving many arrays/variables in a numpy or json file (we need to test this). Like this:

tc_dict = {
    'j': {
        'DT_FS': j.DT_FS,
        'KAPPA_SCALE': j.KAPPA_SCALE,
        'psd': j.psd,
         ...
    },
    'jf': {
        'DT_FS': jf.DT_FS,
        'KAPPA_SCALE': jf.KAPPA_SCALE,
        'psd': jf.psd,
         ...
    },
    ...
}

Or with less-readable code:

tc_dict = {
    'j': {},
    'jf': {},
    ...
}
attrs_to_save = ['DT_FS', 'KAPPA_SCALE', 'psd', ...]
for key in tc_dict.keys():
    for attr in attrs_to_save:
        tc_dict[key][attr] = getattr(locals()[key], attr)

(We should find a smarter solution if the dictionary is more deeply nested.)
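One way to avoid the locals() lookup is to pass the objects in explicitly, keyed by name. The sketch below is only an illustration: j and jf are SimpleNamespace stand-ins with made-up values (plain lists instead of numpy arrays), and collect_attrs is a hypothetical helper, not something that exists in the code base.

```python
from types import SimpleNamespace

# Stand-ins for the real current objects (hypothetical values, illustration only).
j = SimpleNamespace(DT_FS=1.0, KAPPA_SCALE=2.5, psd=[1.0, 2.0, 3.0])
jf = SimpleNamespace(DT_FS=4.0, KAPPA_SCALE=2.5, psd=[1.5, 2.5])


def collect_attrs(objects, attrs):
    """Build {name: {attr: value}}, skipping attributes an object lacks."""
    return {
        name: {attr: getattr(obj, attr) for attr in attrs if hasattr(obj, attr)}
        for name, obj in objects.items()
    }


attrs_to_save = ['DT_FS', 'KAPPA_SCALE', 'psd']
tc_dict = collect_attrs({'j': j, 'jf': jf}, attrs_to_save)
```

Passing the mapping explicitly keeps the helper independent of the scope it is called from, which locals() does not guarantee.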

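Then save it using numpy.savez('binary_output.npz', **tc_dict) or json.dump(tc_dict, open('binary_output.json', 'w')). Note that numpy.save takes a single array and no keyword arguments, so numpy.savez is the call that accepts **tc_dict; with a nested dictionary, each sub-dictionary would be stored as a pickled object array unless the keys are flattened first. For JSON, the numpy arrays must first be converted to lists (e.g. with .tolist()), since json cannot serialize them directly.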
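As a rough sketch of the numpy route, the nested dictionary can be flattened into 'j/psd'-style keys before handing it to numpy.savez, so that every entry is stored as a plain array. The flatten helper, the '/' separator and the sample values are assumptions made for illustration; an in-memory buffer stands in for the output file.

```python
import io

import numpy as np


def flatten(nested, sep='/'):
    """Turn {'j': {'psd': arr}} into {'j/psd': arr}, recursing into sub-dicts."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f'{key}{sep}{sub_key}'] = sub_value
        else:
            flat[key] = value
    return flat


# Made-up sample data with the same shape as tc_dict.
tc_dict = {
    'j': {'DT_FS': 1.0, 'psd': np.array([1.0, 2.0, 3.0])},
    'jf': {'DT_FS': 4.0, 'psd': np.array([1.5, 2.5])},
}

buffer = io.BytesIO()  # stands in for a file such as 'binary_output.npz'
np.savez(buffer, **flatten(tc_dict))
buffer.seek(0)

# Read everything back; scalars come back as 0-d arrays.
with np.load(buffer) as archive:
    loaded = {name: archive[name] for name in archive.files}
```

Because no object arrays are involved, the file can be read back without allow_pickle=True.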

We will then need functions to reconstruct the Currents objects, etc., from this binary file...
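A minimal sketch of the reading side, assuming the file stores flat keys such as 'j/psd' (a hypothetical convention, not an existing format): the keys are regrouped into a nested dictionary, and each group is wrapped in a SimpleNamespace as a placeholder for whatever constructor the real objects would need.

```python
from types import SimpleNamespace


def unflatten(flat, sep='/'):
    """Rebuild {'j': {'psd': ...}} from {'j/psd': ...}."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


# Made-up flat content, as it might come out of the loaded file.
flat = {'j/DT_FS': 1.0, 'j/psd': [1.0, 2.0], 'jf/DT_FS': 4.0}
nested = unflatten(flat)

# Placeholder objects: attribute access like currents['jf'].DT_FS.
currents = {name: SimpleNamespace(**fields) for name, fields in nested.items()}
```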

What do you think?