UCBerkeleySETI/turbo_seti

Add frequency channel masking capability

Opened this issue · 11 comments

It would be great to add a flag to turboSETI that makes it read a "frequency mask": a file of space-separated frequency channel numbers that the user wishes to ignore in the Doppler search. These channels are frequencies where the user knows that radio frequency interference is present. The masked channels would be replaced with 0 before the Doppler search is performed.
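A minimal sketch of the zeroing step, assuming the spectrum is a plain Python sequence of power values and the mask is already a set of channel indexes. The function name `apply_channel_mask` is hypothetical and is not part of turbo_seti.

```python
def apply_channel_mask(spectrum, masked_channels):
    """Return a copy of `spectrum` with the masked channel indexes set to 0."""
    masked = list(spectrum)  # copy so the caller's data is left untouched
    for chan in masked_channels:
        if not 0 <= chan < len(masked):
            raise IndexError(f"channel {chan} is outside 0..{len(masked) - 1}")
        masked[chan] = 0.0
    return masked


# Hypothetical 5-channel spectrum with interference in channels 1 and 3.
spectrum = [1.5, 9.0, 2.0, 8.5, 1.0]
print(apply_channel_mask(spectrum, {1, 3}))
```

With a NumPy array the same idea would be an indexed assignment on a copy of the data.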

cc: @telegraphic @siemion @luigifcruz

After a discussion with @wfarah , this is my understanding of the requirements for this enhancement issue:

  • It is desirable to be able to ignore (mask out) specified fine-channel frequencies (or ranges thereof) during a turboSETI executable run -or- during execution of FindDoppler.search() as called from another Python program.
  • Feasibility and possible approaches are TBD.
  • The fine-channels would be specified in a separate text editable file as a list of fine-channel mask specifications.
  • Each mask specification is the index of a single channel (nonnegative integer) or a range (low to high) of channel indexes.

Sample mask file:
# Ignore fine-channels 2, 4, 6, and 14 through 22.
2
4
6
14-22
# End of specifications

It is imagined that turboSETI and class FindDoppler would have new parameters:

  • A boolean indicating that masking is enabled or disabled (default: disabled)
  • The full path of the mask file when masking is enabled (default: none)

Parameter errors include:

  • Masking enabled but no mask file was specified.
  • Mask file not found, or the file is unreadable for any reason.
  • Syntax error in a specification inside the mask file (e.g. not a nonnegative integer).
  • A mask specification that is out of range of the channels as defined in the HDF5 header. This applies to a single fine-channel index or a range specification.
  • Two separate specifications overlap.
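The mask file format and the error conditions listed above could be checked roughly as follows. This is only a sketch under assumptions: `parse_channel_mask` and `MaskFileError` are hypothetical names, and `n_channels` stands in for the channel count that would come from the HDF5 header.

```python
import re

# A specification is a single index ("6") or an inclusive range ("14-22").
_SPEC = re.compile(r"^(\d+)(?:-(\d+))?$")


class MaskFileError(ValueError):
    """Raised for any problem found in a mask specification."""


def parse_channel_mask(lines, n_channels):
    """Parse mask specs into a sorted list of inclusive (low, high) index pairs."""
    ranges = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        match = _SPEC.match(text)
        if match is None:
            raise MaskFileError(f"line {lineno}: syntax error in {text!r}")
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) is not None else low
        if low > high:
            raise MaskFileError(f"line {lineno}: range {text!r} is reversed")
        if high >= n_channels:
            raise MaskFileError(
                f"line {lineno}: {text!r} exceeds last channel {n_channels - 1}"
            )
        ranges.append((low, high, lineno))
    # After sorting by low index, any overlap shows up between neighbours.
    ranges.sort()
    for (_, prev_high, prev_line), (low, _, lineno) in zip(ranges, ranges[1:]):
        if low <= prev_high:
            raise MaskFileError(
                f"line {lineno}: overlaps the specification on line {prev_line}"
            )
    return [(low, high) for low, high, _ in ranges]


sample = ["# Ignore fine-channels 2, 4, 6, and 14 through 22.",
          "2", "4", "6", "14-22", "# End of specifications"]
print(parse_channel_mask(sample, n_channels=1024))
```

Passing an open file object instead of a list would work the same way; a missing or unreadable file would then surface as the `OSError` raised by `open()`.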

Any discussion points? Please comment.

Afterthought: Is it more desirable to express mask specifications in terms of frequency value? If so, maybe only range specifications make sense, similar to how blimpy-dice uses f_start and f_stop parameters to govern its behavior.

E.g.
# Ignore frequencies close to 6300, range 8100 to 8200, and range 9700 to 9900.
freq 8100 8200
freq 9700 9900
freq 6295 6305
# End of specifications
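A rough sketch of how such frequency specifications might be parsed and mapped onto fine-channel indexes. The function names are hypothetical. The mapping assumes the usual filterbank header convention that channel i is centred on fch1 + i * foff (foff may be negative), with mask frequencies in the same unit as the header.

```python
import math


def parse_freq_mask(lines):
    """Parse 'freq <low> <high>' lines into sorted (low, high) tuples."""
    ranges = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not text:
            continue
        fields = text.split()
        if len(fields) != 3 or fields[0] != "freq":
            raise ValueError(f"line {lineno}: expected 'freq <low> <high>'")
        try:
            low, high = float(fields[1]), float(fields[2])
        except ValueError:
            raise ValueError(f"line {lineno}: frequencies must be numbers") from None
        if low > high:
            raise ValueError(f"line {lineno}: low frequency exceeds high frequency")
        ranges.append((low, high))
    return sorted(ranges)


def freq_range_to_channels(low, high, fch1, foff, nchans):
    """Return inclusive (first, last) indexes of channels centred in [low, high].

    Returns None when no channel centre falls inside the range.
    """
    a = (low - fch1) / foff
    b = (high - fch1) / foff
    first = max(0, math.ceil(min(a, b)))
    last = min(nchans - 1, math.floor(max(a, b)))
    return (first, last) if first <= last else None


sample = ["# Ignore frequencies close to 6300 and two wider ranges.",
          "freq 8100 8200", "freq 9700 9900", "freq 6295 6305"]
masks = parse_freq_mask(sample)
# Hypothetical header values: 1000 channels, 1 unit wide, starting at 8000.
print(masks)
print(freq_range_to_channels(8100, 8200, fch1=8000.0, foff=1.0, nchans=1000))
```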

In the future, if need be, we can enhance specifications to use different criteria.

Assuming that the "afterthought" is the way to go (use frequency values instead of channel numbers), I think that I found a good place to throw out frequency ranges that are observed to be noisy in a previous turboSETI run.

Inside the turbo_seti data_handler.py class DATAHandle, there is a __split_h5() function which builds the coarse channel table for subsequent search() processing. Each entry has a starting frequency and an ending frequency. It looks like a good place to mask. The mask parameters would just need to be passed all the way down from FindDoppler.search().

Overlap cases: mask-low:mask-high vs f_start:f_stop
A coarse channel could be partially masked out.
Could just keep the wee bit that is not masked out.
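The partial-overlap case can be sketched as plain interval subtraction. The helper below is hypothetical and works on bare (low, high) tuples rather than on the actual coarse channel table built by __split_h5(), whose internals are not shown here. It assumes f_start < f_stop.

```python
def unmasked_parts(f_start, f_stop, masks):
    """Return the pieces of [f_start, f_stop] not covered by any mask range."""
    kept = []
    cursor = f_start  # lowest frequency not yet classified
    for m_low, m_high in sorted(masks):
        if m_high <= cursor:   # mask lies entirely below what is left
            continue
        if m_low >= f_stop:    # this mask and all later ones are above the channel
            break
        if m_low > cursor:     # gap before the mask survives
            kept.append((cursor, m_low))
        cursor = max(cursor, m_high)
        if cursor >= f_stop:
            break
    if cursor < f_stop:        # remainder above the last mask survives
        kept.append((cursor, f_stop))
    return kept


# Coarse channel 8150..8250 against the mask range 8100..8200:
print(unmasked_parts(8150, 8250, [(8100, 8200)]))
```

An empty result would mean the coarse channel is fully masked and could be skipped altogether.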

Comments? @wfarah @telegraphic @siemion @luigifcruz

If this looks like it adds value to turbo_seti, then we need a more formal feasibility / system concept document to be reviewed.

I think this is a good idea @texadactyl, and my assumption is that the searching should not be affected by the zeroed channels. As a matter of fact, after some discussion with Dave, some GBT data products have frequency channels that are all 0s (because some processing nodes failed). turboSETI does not complain about them.

I think the "afterthought" is indeed the best way to go with this. In addition to that, we could also add a whitelist. It can be useful if the user is interested only in a certain frequency range. This can be achieved with a YAML configuration file. I'm thinking something like this:

whitelist:
    range:
        - start: 403e6
          end: 405e6
    # and/or
    point:
        - center_freq: 1.705e9
          bandwidth: 2e6
# and/or
blacklist:
    range:
        - start: 88e6
          end: 107e6
        - start: 2.3e9
          end: 2.9e9
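One possible way to interpret such a configuration once it has been loaded into a dictionary. This is a sketch under assumptions: each range or point entry is taken to be a single mapping, frequencies are in Hz, and the rule "inside the whitelist if one exists, and outside the blacklist" is a guess at the intended semantics. The function names are hypothetical. float() is applied because some YAML loaders return values written like 403e6 as strings.

```python
def to_ranges(section):
    """Flatten one whitelist/blacklist section into (low, high) tuples in Hz."""
    ranges = [(float(r["start"]), float(r["end"])) for r in section.get("range", [])]
    for p in section.get("point", []):
        centre = float(p["center_freq"])
        half = float(p["bandwidth"]) / 2
        ranges.append((centre - half, centre + half))
    return ranges


def is_searched(freq, config):
    """True if `freq` passes the whitelist (when present) and misses the blacklist."""
    allowed = to_ranges(config.get("whitelist", {}))
    blocked = to_ranges(config.get("blacklist", {}))
    in_allowed = not allowed or any(lo <= freq <= hi for lo, hi in allowed)
    in_blocked = any(lo <= freq <= hi for lo, hi in blocked)
    return in_allowed and not in_blocked


# Dictionary form of the proposed configuration (values kept as strings).
config = {
    "whitelist": {
        "range": [{"start": "403e6", "end": "405e6"}],
        "point": [{"center_freq": "1.705e9", "bandwidth": "2e6"}],
    },
    "blacklist": {
        "range": [
            {"start": "88e6", "end": "107e6"},
            {"start": "2.3e9", "end": "2.9e9"},
        ],
    },
}
print(is_searched(404e6, config), is_searched(100e6, config))
```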

Sorry for the delay!

Folks, any updates on this?

@wfarah Yes, we have an implementation plan and should start development later today.

Perfect, thanks @luigifcruz!

Going through the open issues today so chiming in: this is a great idea and would love to see it 👍

@wfarah

@telegraphic and I have not forgotten this feature request. I just added it to hyperseti's list too.

Note that there are 2 pinned issues in turbo_seti:

  1. #231 - significant design flaw <--- Done!
  2. #125 - highly useful enhancement

That is the priority list order for turbo_seti. If hyperseti can address the 2nd issue before someone in turbo_seti can get to it, then it probably will not be done in turbo_seti (guessing).

For whenever this feature is revived:

  • Sample files.
  • What diagnostics are reported when there are errors in the YAML file?