nasa/opera-sds-pcm

[New Feature]: DISP-S1 forward processing frame-date_range black list filter

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

We want the ability to "black out" an arbitrary list of sensing datetime ranges on a per-frame basis. We envision an ADT-maintained JSON file that specifies a list of date ranges for each frame. We could generalize this file a bit and allow other per-frame processing directives to be added. The absence of "blackout_dates" for a frame indicates that processing should proceed normally as data is available.

It would look something like the following:

{
"831": {},
"832": {"blackout_dates": [ {"start": "2024-12-30T23:05:24", "end": "2025-03-15T23:05:24"}, ...]},
...
"46543": {"blackout_dates": [ {"start": "2024-11-15T23:05:24", "end": "2025-04-30T23:05:24"}, ...]}
}

The blackout file to use would be specified by a settings.yaml field such as the following:

DISP_S1_BLACKOUT_DATES_S3PATH: "s3://opera-ancillaries/disp_frames/disp_s1_blackout_dates/sample_disp_s1_blackout.json"
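As a rough sketch of how that setting might be consumed, the S3 path can be split into a bucket and key with the standard library before the file is fetched. The helper name `split_s3_path` and the `settings` dict are hypothetical stand-ins, not existing PCM code.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path: str) -> tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key). Hypothetical helper."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 path: {s3_path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Stand-in for the parsed contents of settings.yaml (assumption).
settings = {
    "DISP_S1_BLACKOUT_DATES_S3PATH": (
        "s3://opera-ancillaries/disp_frames/disp_s1_blackout_dates/"
        "sample_disp_s1_blackout.json"
    )
}

bucket, key = split_s3_path(settings["DISP_S1_BLACKOUT_DATES_S3PATH"])
```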

This looks good to me, just two clarifications:

  1. Would you want this to be the same file as the "consistent frame database" JSON, or a separate one?
  2. Would the datetimes have to line up exactly with an acquisition sensing time, or just be used as a range?

@scottstanie

  1. While it's not a strong preference, I'd like it to be a separate file. That way we don't have to swap out the "consistent frame database" file every time we want to change the blackout dates.
  2. Any arbitrary date range would work, and we can have multiple date ranges per frame as well. When we determine that at least one burst falls within a blackout date range, that entire frame-sensing_datetime will be blacked out, i.e. ignored from processing.
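The rule in point 2 could be sketched roughly as follows. This is an illustration only, not the PCM implementation: the function names are hypothetical, the burst sensing times are made up, and treating the range endpoints as inclusive is an assumption.

```python
from datetime import datetime


def in_any_range(sensing_time, ranges):
    """True if sensing_time lies inside any (start, end) range.

    Endpoints are treated as inclusive (assumption).
    """
    return any(start <= sensing_time <= end for start, end in ranges)


def frame_is_blacked_out(burst_sensing_times, ranges):
    """Black out the whole frame-sensing_datetime if at least one burst is hit."""
    return any(in_any_range(t, ranges) for t in burst_sensing_times)


# Range taken from the "832" example in the feature request.
ranges_832 = [(datetime(2024, 12, 30, 23, 5, 24), datetime(2025, 3, 15, 23, 5, 24))]

# Hypothetical burst sensing times for two acquisitions of the same frame.
straddling = [datetime(2024, 12, 30, 23, 5, 22), datetime(2024, 12, 30, 23, 5, 25)]
outside = [datetime(2024, 12, 30, 23, 5, 20), datetime(2024, 12, 30, 23, 5, 22)]

straddling_skipped = frame_is_blacked_out(straddling, ranges_832)
outside_skipped = frame_is_blacked_out(outside, ranges_832)
```

With this rule, an acquisition whose bursts straddle the start of a range is skipped as a whole, because one burst inside the range is enough.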

This is how I have currently coded the blackout dates file:

{
  "831": {"blackout_dates": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}]},
  "832":  {"blackout_dates": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}, {"start": "2022-01-24T23:00:00", "end": "2022-08-24T23:00:00"}]},
  "11115":  {"blackout_dates": [ {"start": "2019-11-29T14:06:51", "end": "2020-02-27T14:07:30"}, {"start": "2020-11-29T14:07:39", "end": "2021-03-05T14:07:36"}]},
  "46543": {}
}

We could make it slightly less verbose:

{
"blackout_dates": {
     "831": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}],
     "832":  [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}, {"start": "2022-01-24T23:00:00", "end": "2022-08-24T23:00:00"}],
     "11115":  [ {"start": "2019-11-29T14:06:51", "end": "2020-02-27T14:07:30"}, {"start": "2020-11-29T14:07:39",  "end": "2021-03-05T14:07:36"}],
     "46543": []
}
}

Even less verbose:

{
"blackout_dates": {
     "831": [ [ "2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
     "832":  [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"], ["2022-01-24T23:00:00", "2022-08-24T23:00:00"]],
     "11115":  [ ["2019-11-29T14:06:51", "2020-02-27T14:07:30"], ["2020-11-29T14:07:39", "2021-03-05T14:07:36"]],
     "46543": []
}
}

@scottstanie @sjlewis-jpl @LucaCinquini Do you guys have a preference? In the end this file is generated and consumed by software but there might be some debugging implications.
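Since the file is machine-generated either way, the layouts carry the same information and are easy to convert between. A minimal sketch, assuming the per-frame verbose layout as input and the list-of-pairs layout as output (the function name `to_compact` is hypothetical):

```python
def to_compact(verbose: dict) -> dict:
    """Convert {frame: {"blackout_dates": [{"start", "end"}, ...]}} to list-of-pairs form."""
    return {
        "blackout_dates": {
            frame_id: [[r["start"], r["end"]] for r in entry.get("blackout_dates", [])]
            for frame_id, entry in verbose.items()
        }
    }


verbose = {
    "831": {"blackout_dates": [{"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}]},
    "46543": {},
}
compact = to_compact(verbose)
```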

The log output from the CSLC query job when it skips a granule because it falls within a blackout date range currently looks like this. @sjlewis-jpl @LucaCinquini I'm open to feedback here:

[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW3_20200221T140735Z_20240430T094255Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW2_20200221T140734Z_20240430T094255Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW1_20200221T140733Z_20240430T094318Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
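To check a log line like these by hand, the sensing time can be pulled out of the granule ID and compared against the reported range. The sketch below assumes the first `YYYYMMDDTHHMMSSZ` field in the CSLC granule ID is the sensing time and that range endpoints are inclusive; the helper name is hypothetical and this is not the query job's actual code.

```python
import re
from datetime import datetime

# First timestamp field in the granule ID (assumed to be the sensing time).
_SENSING_RE = re.compile(r"_(\d{8}T\d{6})Z_")


def sensing_time_from_granule_id(granule_id: str) -> datetime:
    """Extract the first timestamp in the granule ID as a naive datetime."""
    match = _SENSING_RE.search(granule_id)
    if match is None:
        raise ValueError(f"No timestamp found in {granule_id!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


granule_id = "OPERA_L2_CSLC-S1_T042-088921-IW3_20200221T140735Z_20240430T094255Z_S1B_VV_v1.1"
sensing = sensing_time_from_granule_id(granule_id)

# Range reported in the log line for frame_id=11115.
blackout_start = datetime.fromisoformat("2019-11-29T14:06:51")
blackout_end = datetime.fromisoformat("2020-02-27T14:07:30")

is_skipped = blackout_start <= sensing <= blackout_end
```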

It's really up to you and Scott to decide - whatever works best for you guys.

I don't think I have a preference, though if this is really the only thing in the file, and it has something like "disp-s1-blackout-dates" in the filename, the least redundant would probably be

{
     "831": [ [ "2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
     "832": ...

OK, this is what I propose then. Only the blackout_dates field would be consumed by PCM. Everything else is for human context.

{
"creation_date": "2024-09-30",
"comments": "This file is used to black out winters in Alaska and northern North Dakota, as discussed in meeting on xxxx-xx-xx",
"blackout_dates": {
     "831": [ [ "2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
     "832":  [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"], ["2022-01-24T23:00:00", "2022-08-24T23:00:00"]],
     "11115":  [ ["2019-11-29T14:06:51", "2020-02-27T14:07:30"], ["2020-11-29T14:07:39", "2021-03-05T14:07:36"]],
     "46543": []
}
}
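A reader for this layout might look roughly like the sketch below. It only touches `blackout_dates` and ignores the human-context fields, as proposed. The function name, the integer frame keys in the result, and the start-before-end check are assumptions for illustration, not what PCM actually does.

```python
import json
from datetime import datetime


def load_blackout_dates(text: str) -> dict[int, list[tuple[datetime, datetime]]]:
    """Parse the proposed file; only the "blackout_dates" field is consumed."""
    doc = json.loads(text)
    result = {}
    for frame_id, ranges in doc["blackout_dates"].items():
        parsed = []
        for start, end in ranges:
            start_dt = datetime.fromisoformat(start)
            end_dt = datetime.fromisoformat(end)
            if start_dt > end_dt:
                # Sanity check added for illustration (assumption).
                raise ValueError(f"Frame {frame_id}: start {start} is after end {end}")
            parsed.append((start_dt, end_dt))
        result[int(frame_id)] = parsed
    return result


sample = """
{
  "creation_date": "2024-09-30",
  "comments": "human context only, ignored by the loader",
  "blackout_dates": {
    "11115": [["2019-11-29T14:06:51", "2020-02-27T14:07:30"],
              ["2020-11-29T14:07:39", "2021-03-05T14:07:36"]],
    "46543": []
  }
}
"""
blackouts = load_blackout_dates(sample)
```

A frame with an empty list, such as 46543 here, simply has no blackout ranges and is processed normally.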