search_coarse_channel() causes find_events() to miss detecting events
To Reproduce
- From http://blpd14.ssl.berkeley.edu/voyager_2020/single_coarse_channel/, download only the *.0000.h5 files.
- Run the Python program below.
Source Code
from shutil import rmtree
from os import mkdir
import logging
from turbo_seti.find_doppler.find_doppler import FindDoppler
from turbo_seti.find_event.find_event_pipeline import find_event_pipeline

H5DIR = '/seti_data/voyager_2020/'
OUTDIR = H5DIR + 'outdir/'
PATH_DAT_LIST_FILE = OUTDIR + 'new_dat_files.lst'
PATH_CSVF = OUTDIR + 'found_event_table.csv'
voyager_list = ['single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5',
                'single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5',
                'single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5',
                'single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5',
                'single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5',
                'single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5']


def make_dat_files():
    ii = 0
    with open(PATH_DAT_LIST_FILE, 'w') as file_handle:
        for filename in voyager_list:
            path_h5 = H5DIR + filename
            doppler = FindDoppler(path_h5,
                                  max_drift = 4,
                                  snr = 10,
                                  log_level_int=logging.WARNING,
                                  out_dir = H5DIR
                                  )
            doppler.search()
            ii += 1
            path_dat = H5DIR + filename.replace('.h5', '.dat')
            file_handle.write('{}\n'.format(path_dat))
            print("make_dat_files: {} - finished making DAT file for {}".format(ii, path_h5))


def beginning():
    # Initialize output directory
    rmtree(OUTDIR, ignore_errors=True)
    mkdir(OUTDIR)
    # Make the DAT files
    make_dat_files()

#=================================================================

beginning()

# Generate CSV file from find_event_pipeline()
num_in_cadence = len(voyager_list)
find_event_pipeline(PATH_DAT_LIST_FILE,
                    filter_threshold = 3,
                    number_in_cadence = num_in_cadence,
                    user_validation=False,
                    saving=True,
                    csv_name=PATH_CSVF)
print("Produced {}".format(PATH_CSVF))
Console Output
make_dat_files: 1 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5
make_dat_files: 2 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5
make_dat_files: 3 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5
make_dat_files: 4 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5
make_dat_files: 5 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5
make_dat_files: 6 - finished making DAT file for /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5
************ BEGINNING FIND_EVENT PIPELINE **************
Assuming the first observation is an ON
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
find_event_pipeline: source_name = VOYAGER-1
There are 6 total files in the filelist /seti_data/voyager_2020/outdir/new_dat_files.lst
therefore, looking for events in 1 on-off set(s)
with a minimum SNR of 10
Present in all A sources with RFI rejection from the off-sources
not including signals with zero drift
saving the output files
*** 59046 ***
------ o -------
Loading data...
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.dat (OFF)
Loaded 3 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.dat (ON)
Loaded 0 hits from /seti_data/voyager_2020/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.dat (OFF)
All data loaded!
Finding events in this cadence...
Found a total of 9 hits above the SNR cut in this cadence!
Length of off_table = 0
Found a total of 9 hits in only the on observations in this cadence!
NOTE: Found no events across this cadence :(
Search time: 0.04 sec
------ o -------
Sorry, no potential candidates with your given parameters :(
*** find_event_output_dataframe is complete ***
Sorry, no events to save :(
Produced /seti_data/voyager_2020/outdir/found_event_table.csv
The issue is the "Sorry, no events to save" message.
To re-run only the find_event_pipeline()/find_events() portion after the DAT files have been produced once, comment out the beginning() call.
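As an alternative to commenting out the call, a re-run could check whether the DAT files are already on disk. This is only a minimal sketch: `missing_dat_files` is a hypothetical helper, not part of turbo_seti, and it assumes (as in the script above) that each DAT file sits in the same directory as its HDF5 file, with `.h5` replaced by `.dat`.

```python
import os


def missing_dat_files(h5_dir, h5_names):
    """Return the expected DAT paths that do not exist yet (hypothetical helper)."""
    missing = []
    for name in h5_names:
        # Same naming rule as the reproduction script: swap the extension.
        path_dat = os.path.join(h5_dir, name.replace('.h5', '.dat'))
        if not os.path.isfile(path_dat):
            missing.append(path_dat)
    return missing
```

The script would then call beginning() only when missing_dat_files(H5DIR, voyager_list) returns a non-empty list. Note that beginning() also recreates OUTDIR and the DAT list file, so skipping it only works when that list file survives from an earlier run.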
Version 1.3.0 of turbo_seti succeeded in producing a CSV file of events at filter threshold 3.
When I create the DAT files with 1.3.0, find_events() in both the current version (2.0.4.1) and 1.3.0 reports events correctly at filter threshold 3.
This surprised me: the issue is in the DAT file creation itself, not in find_event as with issue #148 (crash).
Research now moves to find_doppler.py :: search_coarse_channel().
Attached is an XLSX file with a row-by-row top-hit comparison.
found_comparison.xlsx
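A comparison like this could also be scripted. The sketch below pairs hits from two DAT files by their ChanIndx value and lists the drift rate from each. It is not the method used to build the spreadsheet. The column order in `COLUMNS` is an assumption about the DAT layout and should be checked against the header of your own files. The names `parse_dat_text` and `compare_hits` are hypothetical.

```python
# Assumed column order of a DAT hit row (an assumption, verify against the file header).
COLUMNS = ['TopHitNum', 'DriftRate', 'SNR', 'Freq', 'CorrectedFreq', 'ChanIndx',
           'FreqStart', 'FreqEnd', 'SEFD', 'SEFD_freq', 'CoarseChanNum',
           'FullNumHitsInRange']


def parse_dat_text(text):
    """Parse the non-comment rows of DAT file text into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and header comments
        rows.append(dict(zip(COLUMNS, line.split())))
    return rows


def compare_hits(old_rows, new_rows):
    """Pair hits by ChanIndx and report the drift rate seen by each version."""
    new_by_index = {row['ChanIndx']: row for row in new_rows}
    report = []
    for old in old_rows:
        new = new_by_index.get(old['ChanIndx'])
        report.append({
            'ChanIndx': old['ChanIndx'],
            'old_drift': float(old['DriftRate']),
            # None marks a hit that the newer version did not report at all.
            'new_drift': float(new['DriftRate']) if new else None,
        })
    return report
```

To use it, read each DAT file with open(path).read(), pass the text to parse_dat_text(), and hand the two row lists to compare_hits().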
Initial observations:
- The abs(drift_rate) in 2.0.4.1 is ~1/3 smaller than in 1.3.0. That was probably a factor in the filter threshold test failing in 2.0.4.1.
- The total number of hits in 2.0.4.1 is lower than in 1.3.0.
- The other numbers are the same, including the pointer to where to find the original data (ChanIndx) so I am confident that this is an apples-to-apples comparison.
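To illustrate why an underestimated drift rate might make the filter threshold test fail, here is a simplified model. It is not turbo_seti's actual find_events() logic, and the function names are hypothetical. It assumes that a hit in a later observation is accepted only if its frequency falls inside a window extrapolated from the reported drift rate of the first hit.

```python
def predicted_window(freq_mhz, drift_hz_per_s, elapsed_s):
    """Frequency window (MHz) expected after elapsed_s seconds, bounded by the
    reported drift rate in either direction (simplified model)."""
    half_width_mhz = abs(drift_hz_per_s) * elapsed_s / 1e6
    return freq_mhz - half_width_mhz, freq_mhz + half_width_mhz


def lands_in_window(freq_mhz, true_drift, reported_drift, elapsed_s):
    """True if a signal drifting at true_drift (Hz/s) is still inside the window
    predicted from reported_drift (Hz/s) after elapsed_s seconds."""
    low, high = predicted_window(freq_mhz, reported_drift, elapsed_s)
    later_freq = freq_mhz + true_drift * elapsed_s / 1e6
    return low <= later_freq <= high
```

In this model, a reported drift rate whose magnitude is below the true one gives a window that is too narrow, so the later hit falls outside it and no event is formed.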
I think I found the problem. The (small) fix for #141 seems to have broken the calculation of drift rate boundaries in search_coarse_channel().
I found a copy of 2.0.0, which does not yet have that fix. I had to apply the crash guard for #148 in the find_event.py source code. Then I could run all the way through to CSV generation, and the result closely matches what 1.3.0 produces.
Updated comparison XLSX:
So now I need to find a new fix for @wfarah's report in issue #141.