UCBerkeleySETI/turbo_seti

search_coarse_channel() causes find_events() to miss events


To Reproduce

Source Code

from shutil import rmtree
from os import mkdir
import logging
from turbo_seti.find_doppler.find_doppler import FindDoppler
from turbo_seti.find_event.find_event_pipeline import find_event_pipeline

H5DIR = '/seti_data/voyager_2020/'
OUTDIR = H5DIR + 'outdir/'
PATH_DAT_LIST_FILE = OUTDIR + 'new_dat_files.lst'
PATH_CSVF = OUTDIR + 'found_event_table.csv'

voyager_list = ['single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5',
                'single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5',
                'single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5',
                'single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5',
                'single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5',
                'single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5']

def make_dat_files():
    ii = 0
    with open(PATH_DAT_LIST_FILE, 'w') as file_handle:
        for filename in voyager_list:
            path_h5 = H5DIR + filename
            doppler = FindDoppler(path_h5,
                                  max_drift=4,
                                  snr=10,
                                  log_level_int=logging.WARNING,
                                  out_dir=H5DIR)
            doppler.search()
            ii += 1
            path_dat = H5DIR + filename.replace('.h5', '.dat')
            file_handle.write('{}\n'.format(path_dat))
            print("make_dat_files: {} - finished making DAT file for {}".format(ii, path_h5))


def beginning():
    # Initialize output directory
    rmtree(OUTDIR, ignore_errors=True)
    mkdir(OUTDIR)
    
    # Make the DAT files
    make_dat_files()
    
#=================================================================

beginning()

# Generate CSV file from find_event_pipeline()
num_in_cadence = len(voyager_list)
find_event_pipeline(PATH_DAT_LIST_FILE,
                    filter_threshold=3,
                    number_in_cadence=num_in_cadence,
                    user_validation=False,
                    saving=True,
                    csv_name=PATH_CSVF)
print("Produced {}".format(PATH_CSVF))

Console Output

make_dat_files: 1 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5
make_dat_files: 2 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5
make_dat_files: 3 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5
make_dat_files: 4 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5
make_dat_files: 5 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5
make_dat_files: 6 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5

************   BEGINNING FIND_EVENT PIPELINE   **************

Assuming the first observation is an ON
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
There are 6 total files in the filelist /seti_data/voyager_2020/outdir/new_dat_files.lst
therefore, looking for events in 1 on-off set(s)
with a minimum SNR of 10
Present in all A sources with RFI rejection from the off-sources
not including signals with zero drift
saving the output files

***       59046       ***

------   o   -------
Loading data...
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.dat (OFF)
All data loaded!

Finding events in this cadence...
Found a total of 9 hits above the SNR cut in this cadence!
Length of off_table = 0
Found a total of 9 hits in only the on observations in this cadence!
NOTE: Found no events across this cadence :(
Search time: 0.04 sec
------   o   -------
Sorry, no potential candidates with your given parameters :(
*** find_event_output_dataframe is complete ***
Sorry, no events to save :(
Produced /seti_data/voyager_2020/outdir/found_event_table.csv

The issue is the "Sorry, no events to save" message: version 2.0.4.1 finds no events where 1.3.0 did.

To re-run only the find_event_pipeline()/find_events() portion after the DAT files have been produced once, comment out the beginning() call.
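For example, a minimal re-run sketch, assuming the DAT files and new_dat_files.lst from the first run are still on disk:

# beginning()   # commented out: the DAT files and the .lst file already exist

find_event_pipeline(PATH_DAT_LIST_FILE,
                    filter_threshold=3,
                    number_in_cadence=len(voyager_list),
                    user_validation=False,
                    saving=True,
                    csv_name=PATH_CSVF)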

Version 1.3.0 of turbo_seti succeeded in producing a CSV file of events at filter threshold 3.

When I create the DAT files with 1.3.0, both the current version (2.0.4.1) and 1.3.0 of find_events() report filter threshold 3 correctly.
I was also surprised to find that the issue is in the DAT file creation itself, not in find_event as with issue #148 (a crash).

Research now moves to find_doppler.py :: search_coarse_channel().

Attached is a Microsoft Excel (XLSX) file with a row-by-row top-hit comparison.
found_comparison.xlsx

Initial observations:

  • The abs(drift_rate) values in 2.0.4.1 are roughly one-third smaller than in 1.3.0. That is probably a factor in the filter-threshold test failing in 2.0.4.1.
  • The full number of hits in 2.0.4.1 is lower than in 1.3.0.
  • The other numbers are the same, including the pointer back to the original data (ChanIndx), so I am confident this is an apples-to-apples comparison (a comparison sketch follows this list).
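As a sketch of how the row-by-row comparison can be done, assuming turbo_seti's whitespace-delimited DAT top-hit layout; the field positions and paths below are my assumptions and should be checked against your files:

import pandas as pd

def load_hits(path):
    # DAT top-hit rows are whitespace-delimited; '#' lines are header comments.
    # Assumed field positions: 1 = drift rate (Hz/s), 2 = SNR, 5 = ChanIndx.
    rows = []
    with open(path) as fh:
        for line in fh:
            if line.startswith('#') or not line.strip():
                continue
            f = line.split()
            rows.append({'ChanIndx': int(float(f[5])),
                         'DriftRate': float(f[1]),
                         'SNR': float(f[2])})
    return pd.DataFrame(rows)

# Hypothetical paths: the same observation searched by each version.
old = load_hits('dat_1.3.0/voyager_0011.dat')
new = load_hits('dat_2.0.4.1/voyager_0011.dat')

# Align rows on ChanIndx so each pair refers to the same spectral data,
# then compare drift rates and SNRs side by side.
both = old.merge(new, on='ChanIndx', suffixes=('_130', '_2041'))
print(both)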

I think I found the problem. The small fix for #141 seems to have broken the drift-rate boundary calculation in search_coarse_channel().
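For context, here is a hedged sketch of how a Taylor-tree search typically bounds its drift-rate range. The names and formula are mine, not turbo_seti's actual code; the point is that a scaling error in this step shrinks every reported drift rate by the same factor:

def drift_rate_resolution(foff_hz, n_time, tsamp_s):
    # Smallest distinguishable drift: one fine channel drifted across the
    # whole observation, i.e. channel width / total integration time.
    return foff_hz / (n_time * tsamp_s)   # Hz/s per drift step

def drift_index_bounds(max_drift, foff_hz, n_time, tsamp_s):
    # Convert the user's max_drift (Hz/s) into the highest drift step
    # searched; a reported drift rate is then step * resolution.
    res = drift_rate_resolution(foff_hz, n_time, tsamp_s)
    hi_index = int(max_drift / res)
    return -hi_index, hi_index, res

# Illustrative numbers only (roughly Voyager-like fine channelization).
lo, hi, res = drift_index_bounds(4.0, 2.79, 16, 18.25)
print(lo, hi, res)   # a stray extra factor applied to res would cut
                     # abs(drift_rate) by that same factor everywhere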

I found a copy of 2.0.0, which does not yet have that fix. After applying the crash guard for #148 in the find_event.py source code, I could run all the way through to CSV generation that closely matches what 1.3.0 produces.
Updated comparison XLSX:

found_comparison.xlsx

So now I need to find a new fix for @wfarah's report in issue #141.