AxFoundation/strax

dtype changes when returning records

Closed this issue · 7 comments

Describe the bug
Passing any time-range argument when loading records causes the returned array to have the raw_records dtype instead of the records dtype. I have no idea how this could happen.

To Reproduce

rec1 = st.get_array(any_run_id, 'records')
print(rec1.dtype.names)
rec2 = st.get_array(any_run_id, 'records', seconds_range=(0, 1))
print(rec2.dtype.names)

Btw, the behavior does not show up if we do:

rec1 = st.get_array(any_run_id, 'records', _chunk_number=0)
print(rec1.dtype.names)

instead. Hence I think processing is not affected. We can also load a single chunk and check:

gen = st.get_iter(any_run_id, 'records', seconds_range=(0, 1))
chunk = next(gen)
print(chunk.data.dtype)
print(chunk.dtype)

which gives two different dtypes.

I cannot reproduce (on two different hosts). Could you include your run_id and installations?

(strax) joran@DESKTOP-F4PI41P:/mnt/d$ python -c "import straxen; straxen.print_versions() ; st = straxen.contexts.xenonnt_online() ; any_run_id = '009104' ; rec1 = st.get_array(any_run_id, 'records') ; print(rec1.dtype.names) ; rec2 = st.get_array(any_run_id, 'records', seconds_range=(0, 1)) ; print(rec2.dtype.names); print('bye bye')"
/mnt/d/Google_Drive/PhD-master/ubuntu-storage/ubuntu-windows/software/straxen/straxen/rucio.py:29: UserWarning: No installation of rucio-clients found. Can't use rucio remote backend.
  warnings.warn("No installation of rucio-clients found. Can't use rucio remote backend.")
Working on DESKTOP-F4PI41P.localdomain with the following versions and installation paths:
python  v3.8.5  (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
strax   v0.16.1 /mnt/d/Google_Drive/PhD-master/ubuntu-storage/ubuntu-windows/software/strax/strax
straxen v0.19.3 /mnt/d/Google_Drive/PhD-master/ubuntu-storage/ubuntu-windows/software/straxen/straxen
cutax   v0.1.1  /mnt/d/Google_Drive/PhD-master/ubuntu-storage/ubuntu-windows/software/cutax/cutax
Loading records: |██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.00 % [00:08<00:00], #3 (1.61 s). 261.2 MB/s
('time', 'length', 'dt', 'channel', 'pulse_length', 'record_i', 'area', 'reduction_level', 'baseline', 'baseline_rms', 'amplitude_bit_shift', 'data')
Loading records: |████████████▎                                                                                                                                                                | 7.14 % [00:04<00:53], #1 (4.11 s). 39.5 MB/s
('time', 'length', 'dt', 'channel', 'pulse_length', 'record_i', 'area', 'reduction_level', 'baseline', 'baseline_rms', 'amplitude_bit_shift', 'data')
bye bye

(strax) [angevaare@dali-login1 favorite_run]$ python -c "import straxen; straxen.print_versions() ; st = straxen.contexts.xenonnt_online() ; any_run_id = '009104' ; rec1 = st.get_array(any_run_id, 'records') ; print(rec1.dtype.names) ; rec2 = st.get_array(any_run_id, 'records', seconds_range=(0, 1)) ; print(rec2.dtype.names); print('bye bye')"
/home/angevaare/software/straxen/straxen/rucio.py:29: UserWarning: No installation of rucio-clients found. Can't use rucio remote backend.
  warnings.warn("No installation of rucio-clients found. Can't use rucio remote backend.")
Working on dali-login1.rcc.local with the following versions and installation paths:
python  v3.8.5  (default, Sep  4 2020, 07:30:14) [GCC 7.3.0]
strax   v0.16.1 /home/angevaare/software/strax/strax
straxen v0.19.3 /home/angevaare/software/straxen/straxen
cutax   v0.1.1  /home/angevaare/software/dev_strax/cutax/cutax
Loading records: |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.00 % [00:31<00:00], #3 (4.64 s). 87.0 MB/s
('time', 'length', 'dt', 'channel', 'pulse_length', 'record_i', 'area', 'reduction_level', 'baseline', 'baseline_rms', 'amplitude_bit_shift', 'data')
Loading records: |████████████▎                                                                                                                                                                | 7.14 % [00:01<00:24], #1 (1.92 s). 84.8 MB/s
('time', 'length', 'dt', 'channel', 'pulse_length', 'record_i', 'area', 'reduction_level', 'baseline', 'baseline_rms', 'amplitude_bit_shift', 'data')
bye bye

Funny, neither can I. But I am happy that it did not show up again.

While running our test suite, I hit the same issue again by chance. This time, however, no time range was involved.

I was running the tests of the latest master (06.08.2021).

 def test_several():
        """
        Test several other functions in straxen. Is kind of messy but saves
        time as we won't load data many times
        :return:
        """
        with tempfile.TemporaryDirectory() as temp_dir:
            try:
                print("Temporary directory is ", temp_dir)
                os.chdir(temp_dir)

                print("Downloading test data (if needed)")
                st = straxen.contexts.demo()
>               st.make(test_run_id_1T, 'records')

End of the traceback:

    strax.copy_to_buffer(raw_records, records, '_copy_raw_records')
/home/dwenz/mymodules/strax/strax/dtypes.py:248: in copy_to_buffer
    globals()[func_name](source, buffer)
/home/dwenz/scratch-midway2/programs/anaconda3/envs/strax_main_py38/lib/python3.8/site-packages/numba/core/dispatcher.py:414: in _compile_for_args
    error_rewrite(e, 'typing')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

e = TypingError('Failed in nopython mode pipeline (step: nopython frontend)\nInternal error at <numba.core.typeinfer.Stati... (14)\nEnable logging at debug level for details.\n\nFile "<string>", line 14:\n<source missing, REPL/exec in use?>\n')
issue_type = 'typing'

    def error_rewrite(e, issue_type):
        """
        Rewrite and raise Exception `e` with help supplied based on the
        specified issue_type.
        """
        if config.SHOW_HELP:
            help_msg = errors.error_extras[issue_type]
            e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
        if config.FULL_TRACEBACKS:
            raise e
        else:
>           raise e.with_traceback(None)
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           Internal error at <numba.core.typeinfer.StaticGetItemConstraint object at 0x7f1a884e2100>.
E           "Field 'baseline' was not found in record with fields ('Start time since unix epoch [ns]', 'time', 'Length of the interval in samples', 'length', 'Width of one sample [ns]', 'dt', 'Channel/PMT number', 'channel', 'Length of pulse to which the record belongs (without zero-padding)', 'pulse_length', 'Fragment number in the pulse', 'record_i', 'Waveform data in raw ADC counts', 'data')"
E           During: typing of static-get-item at <string> (14)
E           Enable logging at debug level for details.
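
To make the mismatch in the error message easier to see, here is a minimal sketch that compares the field names printed for records earlier in this thread with the field list quoted in the TypingError. The helper `missing_fields` is hypothetical and only for illustration; it is not part of strax. The error's field list interleaves descriptive titles with the short field names, so both are kept as-is.

```python
# Field names printed for `records` in the outputs earlier in this thread.
records_fields = (
    'time', 'length', 'dt', 'channel', 'pulse_length', 'record_i', 'area',
    'reduction_level', 'baseline', 'baseline_rms', 'amplitude_bit_shift', 'data',
)

# Field list copied from the TypingError message (titles and names interleaved).
error_fields = (
    'Start time since unix epoch [ns]', 'time',
    'Length of the interval in samples', 'length',
    'Width of one sample [ns]', 'dt',
    'Channel/PMT number', 'channel',
    'Length of pulse to which the record belongs (without zero-padding)', 'pulse_length',
    'Fragment number in the pulse', 'record_i',
    'Waveform data in raw ADC counts', 'data',
)


def missing_fields(expected, available):
    """Hypothetical helper: expected names absent from `available`, in order."""
    available = set(available)
    return [name for name in expected if name not in available]


missing = missing_fields(records_fields, error_fields)
print(missing)
```

The names it reports are the ones the compiled copy function could not find in the record type it was given, which is consistent with that type having the raw_records layout described at the top of this issue.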

@WenzDaniel, this seems to be related to your installation. I cannot reproduce it, and our tests cannot either; otherwise GitHub Actions would be failing many times.

Perhaps check your __pycache__ and/or software installation?

Since this can only be reproduced in your environment (it seems), I think there is little we can do.

Clearing the cache helped.
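
The thread does not say which cache was cleared or how. As a rough sketch, assuming the stale state lived in `__pycache__` directories (where Python keeps bytecode and where Numba, by default, places the on-disk cache of functions compiled with `cache=True`), something like the following would remove them below a given directory. The function name `clear_pycache` is hypothetical; the demo runs on a throwaway temporary tree rather than a real installation.

```python
import shutil
import tempfile
from pathlib import Path


def clear_pycache(root):
    """Delete every __pycache__ directory below `root`; return the removed paths."""
    removed = []
    # sorted() materialises the matches first, so nothing is deleted mid-walk.
    for cache_dir in sorted(Path(root).rglob('__pycache__')):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed


# Demo on a temporary tree that mimics a package with a stale cache directory.
with tempfile.TemporaryDirectory() as tmp:
    stale = Path(tmp) / 'pkg' / '__pycache__'
    stale.mkdir(parents=True)
    (stale / 'dtypes.cpython-38.pyc').write_bytes(b'')
    print(len(clear_pycache(tmp)), 'cache directories removed')
```

For a real environment one would point `clear_pycache` at the source checkout in question; the caches are rebuilt on the next import or first call, at the cost of a one-off recompilation.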

> I cannot reproduce it, and our tests cannot either; otherwise GitHub Actions would be failing many times.

But my tests did not fail either. So I am not sure whether this just pops up by chance.

But I agree: as long as we cannot reproduce it, there is not much we can do.