UCBerkeleySETI/turbo_seti

search_coarse_channel() causes find_events() to miss events


To Reproduce

Source Code

from shutil import rmtree
from os import mkdir
import logging
from turbo_seti.find_doppler.find_doppler import FindDoppler
from turbo_seti.find_event.find_event_pipeline import find_event_pipeline

H5DIR = '/seti_data/voyager_2020/'
OUTDIR = H5DIR + 'outdir/'
PATH_DAT_LIST_FILE = OUTDIR + 'new_dat_files.lst'
PATH_CSVF = OUTDIR + 'found_event_table.csv'

voyager_list = ['single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5',
                'single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5',
                'single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5',
                'single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5',
                'single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5',
                'single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5']

def make_dat_files():
    ii = 0
    with open(PATH_DAT_LIST_FILE, 'w') as file_handle:
        for filename in voyager_list:
            path_h5 = H5DIR + filename
            doppler = FindDoppler(path_h5,
                                  max_drift=4,
                                  snr=10,
                                  log_level_int=logging.WARNING,
                                  out_dir=H5DIR)
            doppler.search()
            ii += 1
            path_dat = H5DIR + filename.replace('.h5', '.dat')
            file_handle.write('{}\n'.format(path_dat))
            print("make_dat_files: {} - finished making DAT file for {}".format(ii, path_h5))


def beginning():
    # Initialize output directory
    rmtree(OUTDIR, ignore_errors=True)
    mkdir(OUTDIR)
    
    # Make the DAT files
    make_dat_files()
    
#=================================================================

beginning()

# Generate CSV file from find_event_pipeline()
num_in_cadence = len(voyager_list)
find_event_pipeline(PATH_DAT_LIST_FILE,
                    filter_threshold=3,
                    number_in_cadence=num_in_cadence,
                    user_validation=False,
                    saving=True,
                    csv_name=PATH_CSVF)
print("Produced {}".format(PATH_CSVF))

Console Output

make_dat_files: 1 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5
make_dat_files: 2 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5
make_dat_files: 3 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5
make_dat_files: 4 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5
make_dat_files: 5 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5
make_dat_files: 6 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5

************   BEGINNING FIND_EVENT PIPELINE   **************

Assuming the first observation is an ON
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
There are 6 total files in the filelist /seti_data/voyager_2020/outdir/new_dat_files.lst
therefore, looking for events in 1 on-off set(s)
with a minimum SNR of 10
Present in all A sources with RFI rejection from the off-sources
not including signals with zero drift
saving the output files

***       59046       ***

------   o   -------
Loading data...
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.dat (OFF)
All data loaded!

Finding events in this cadence...
Found a total of 9 hits above the SNR cut in this cadence!
Length of off_table = 0
Found a total of 9 hits in only the on observations in this cadence!
NOTE: Found no events across this cadence :(
Search time: 0.04 sec
------   o   -------
Sorry, no potential candidates with your given parameters :(
*** find_event_output_dataframe is complete ***
Sorry, no events to save :(
Produced /seti_data/voyager_2020/outdir/found_event_table.csv

The issue is the "Sorry, no events to save" message: version 2.0.4.1 finds no events where 1.3.0 did.

To re-run only the find_event_pipeline()/find_events() portion after the DAT files have been produced once, comment out the beginning() call.
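For example, a minimal re-run sketch, assuming the DAT files and new_dat_files.lst from the first run are still on disk:

# beginning()   # commented out: the DAT files and the .lst file already exist

find_event_pipeline(PATH_DAT_LIST_FILE,
                    filter_threshold=3,
                    number_in_cadence=len(voyager_list),
                    user_validation=False,
                    saving=True,
                    csv_name=PATH_CSVF)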

Version 1.3.0 of turbo_seti succeeded in producing a CSV file of events at filter threshold 3.

When I create the DAT files with 1.3.0, both the current version (2.0.4.1) and 1.3.0 of find_events() report filter threshold 3 correctly.
I was also surprised to find that the issue is in the DAT file creation itself, not in find_event as with issue #148 (a crash).

Research now moves to find_doppler.py :: search_coarse_channel().

Attached is a Microsoft Excel (XLSX) file with a row-by-row top-hit comparison.
found_comparison.xlsx

Initial observations:

  • The abs(drift_rate) values in 2.0.4.1 are roughly one-third smaller than in 1.3.0. That is probably a factor in the filter-threshold test failing in 2.0.4.1.
  • The full number of hits in 2.0.4.1 is lower than in 1.3.0.
  • The other numbers are the same, including the pointer back to the original data (ChanIndx), so I am confident this is an apples-to-apples comparison (a comparison sketch follows this list).
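As a sketch of how the row-by-row comparison can be done, assuming turbo_seti's whitespace-delimited DAT top-hit layout; the field positions and paths below are my assumptions and should be checked against your files:

import pandas as pd

def load_hits(path):
    # DAT top-hit rows are whitespace-delimited; '#' lines are header comments.
    # Assumed field positions: 1 = drift rate (Hz/s), 2 = SNR, 5 = ChanIndx.
    rows = []
    with open(path) as fh:
        for line in fh:
            if line.startswith('#') or not line.strip():
                continue
            f = line.split()
            rows.append({'ChanIndx': int(float(f[5])),
                         'DriftRate': float(f[1]),
                         'SNR': float(f[2])})
    return pd.DataFrame(rows)

# Hypothetical paths: the same observation searched by each version.
old = load_hits('dat_1.3.0/voyager_0011.dat')
new = load_hits('dat_2.0.4.1/voyager_0011.dat')

# Align rows on ChanIndx so each pair refers to the same spectral data,
# then compare drift rates and SNRs side by side.
both = old.merge(new, on='ChanIndx', suffixes=('_130', '_2041'))
print(both)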

I think I found the problem. The small fix for #141 seems to have broken the drift-rate boundary calculation in search_coarse_channel().
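For context, here is a hedged sketch of how a Taylor-tree search typically bounds its drift-rate range. The names and formula are mine, not turbo_seti's actual code; the point is that a scaling error in this step shrinks every reported drift rate by the same factor:

def drift_rate_resolution(foff_hz, n_time, tsamp_s):
    # Smallest distinguishable drift: one fine channel drifted across the
    # whole observation, i.e. channel width / total integration time.
    return foff_hz / (n_time * tsamp_s)   # Hz/s per drift step

def drift_index_bounds(max_drift, foff_hz, n_time, tsamp_s):
    # Convert the user's max_drift (Hz/s) into the highest drift step
    # searched; a reported drift rate is then step * resolution.
    res = drift_rate_resolution(foff_hz, n_time, tsamp_s)
    hi_index = int(max_drift / res)
    return -hi_index, hi_index, res

# Illustrative numbers only (roughly Voyager-like fine channelization).
lo, hi, res = drift_index_bounds(4.0, 2.79, 16, 18.25)
print(lo, hi, res)   # a stray extra factor applied to res would cut
                     # abs(drift_rate) by that same factor everywhere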

I found a copy of 2.0.0, which does not yet have that fix. After applying the crash guard for #148 in the find_event.py source code, I could run all the way through to CSV generation that closely matches what 1.3.0 produces.
Updated comparison XLSX:

found_comparison.xlsx

So now I need to find a new fix for @wfarah's report in issue #141.