[New Feature]: DISP-S1 forward processing frame-date_range black list filter
Opened this issue · 7 comments
Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
No response
Describe the feature request
We want the ability to "black out" an arbitrary list of sensing datetime ranges on a per-frame basis. We envision an ADT-maintained JSON file that specifies a list of date ranges for each frame. We could generalize this file a bit and allow other per-frame processing directives to be added. The absence of "blackout_dates" for a frame indicates that processing should proceed normally as data becomes available.
It would look something like the following:
{
"831": {},
"832": {"blackout_dates": [ {"start": "2024-12-30T23:05:24", "end": "2025-03-15T23:05:24"}, ...]},
...
"46543": {"blackout_dates": [ {"start": "2024-11-15T23:05:24", "end": "2025-04-30T23:05:24"}, ...]}
}
The location of this blackout file is determined by a settings.yaml field value like the following:
DISP_S1_BLACKOUT_DATES_S3PATH: "s3://opera-ancillaries/disp_frames/disp_s1_blackout_dates/sample_disp_s1_blackout.json"
This looks good to me, just two clarifications:
- would you want this to be the same file as the "consistent frame database" JSON? or separate?
- would the datetimes have to exactly line up with an acquisition sensing time? or just be used as a range?
- While it's not a strong preference, I'd like it to be a separate file. That way we don't have to swap out "consistent frame database" file every time we want to change the black out dates.
- Any arbitrary date range would work. And we can have multiple date ranges per frame as well. When we determine that at least one burst falls within a blackout date range, that entire frame-sensing_datetime will be blacked out / ignored from processing
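The rule described above (one burst inside any blackout range blacks out the whole frame-sensing_datetime) could be sketched roughly as follows. This is only an illustration, not the PCM implementation: the function name `frame_is_blacked_out`, the in-memory `BLACKOUT` dict, and the choice of inclusive range boundaries are all assumptions, and the file layout is the per-frame one from the original request.

```python
from datetime import datetime

# Hypothetical in-memory copy of the blackout file (per-frame layout from the request).
BLACKOUT = {
    "831": {},
    "832": {"blackout_dates": [
        {"start": "2024-12-30T23:05:24", "end": "2025-03-15T23:05:24"},
    ]},
}


def frame_is_blacked_out(frame_id, burst_sensing_times, blackout):
    """Return True if at least one burst sensing time falls in any blackout range.

    A frame that is missing from the file, or has no "blackout_dates" key,
    is processed normally (returns False).
    """
    ranges = blackout.get(str(frame_id), {}).get("blackout_dates", [])
    for date_range in ranges:
        start = datetime.fromisoformat(date_range["start"])
        end = datetime.fromisoformat(date_range["end"])
        # Assumption: both ends of the range are inclusive.
        if any(start <= t <= end for t in burst_sensing_times):
            return True
    return False
```

With this sketch, a single burst sensed on 2025-01-01 for frame 832 would be enough to skip that frame-sensing_datetime, while frame 831 would always proceed.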
This is how I have currently coded the blackout dates file
{
"831": {"blackout_dates": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}]},
"832": {"blackout_dates": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}, {"start": "2022-01-24T23:00:00", "end": "2022-08-24T23:00:00"}]},
"11115": {"blackout_dates": [ {"start": "2019-11-29T14:06:51", "end": "2020-02-27T14:07:30"}, {"start": "2020-11-29T14:07:39", "end": "2021-03-05T14:07:36"}]},
"46543": {}
}
We could make it slightly less verbose by:
{
"blackout_dates": {
"831": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}],
"832": [ {"start": "2017-01-24T23:00:00", "end": "2017-08-24T23:00:00"}, {"start": "2022-01-24T23:00:00", "end": "2022-08-24T23:00:00"}],
"11115": [ {"start": "2019-11-29T14:06:51", "end": "2020-02-27T14:07:30"}, {"start": "2020-11-29T14:07:39", "end": "2021-03-05T14:07:36"}],
"46543": []
}
}
Even less:
{
"blackout_dates": {
"831": [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
"832": [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"], ["2022-01-24T23:00:00", "2022-08-24T23:00:00"]],
"11115": [ ["2019-11-29T14:06:51", "2020-02-27T14:07:30"], ["2020-11-29T14:07:39", "2021-03-05T14:07:36"]],
"46543": []
}
}
@scottstanie @sjlewis-jpl @LucaCinquini Do you guys have a preference? In the end this file is generated and consumed by software but there might be some debugging implications.
The log output from the CSLC query job when it skips a granule because it falls within a blackout date range currently looks like this. @sjlewis-jpl @LucaCinquini open to feedback here:
[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW3_20200221T140735Z_20240430T094255Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW2_20200221T140734Z_20240430T094255Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
[2024-10-01 01:15:54,619: INFO/query_cmr] Skipping granule OPERA_L2_CSLC-S1_T042-088921-IW1_20200221T140733Z_20240430T094318Z_S1B_VV_v1.1 because frame_id=11115 falls on a blackout date blackout_start='2019-11-29T14:06:51Z' blackout_end='2020-02-27T14:07:30Z'
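For context, the check behind each of these log lines might look something like the sketch below. It assumes that the first timestamp in the CSLC granule ID is the sensing time, which is consistent with the three lines above but is not confirmed in this thread. The helper names `sensing_time` and `matching_blackout` are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the first "_YYYYMMDDTHHMMSSZ_" field in the granule ID is the sensing time.
_TIMESTAMP = re.compile(r"_(\d{8}T\d{6})Z_")


def sensing_time(granule_id):
    """Extract the (assumed) sensing datetime from a CSLC granule ID."""
    match = _TIMESTAMP.search(granule_id)
    if match is None:
        raise ValueError(f"No timestamp found in granule ID: {granule_id}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


def matching_blackout(granule_id, blackout_ranges):
    """Return the first (start, end) range containing the granule, else None."""
    t = sensing_time(granule_id)
    for start, end in blackout_ranges:
        if start <= t <= end:  # inclusive boundaries are an assumption
            return (start, end)
    return None


# Ranges for frame 11115, taken from the blackout file earlier in this thread.
FRAME_11115 = [
    (datetime(2019, 11, 29, 14, 6, 51), datetime(2020, 2, 27, 14, 7, 30)),
    (datetime(2020, 11, 29, 14, 7, 39), datetime(2021, 3, 5, 14, 7, 36)),
]
GRANULE = ("OPERA_L2_CSLC-S1_T042-088921-IW3_20200221T140735Z_"
           "20240430T094255Z_S1B_VV_v1.1")
```

Here `matching_blackout(GRANULE, FRAME_11115)` returns the first range, which matches the `blackout_start` and `blackout_end` values reported in the log.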
It's really up to you and Scott to decide - whatever works best for you guys.
I don't think I have a preference, though if this is really the only thing in the file, and it has something like "disp-s1-blackout-dates" in the filename, the least redundant would probably be
{
"831": [ [ "2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
"832": ...
Ok, this is what I propose then. Only the blackout_dates field would be consumed by PCM. Everything else is for human context.
{
"creation_date": "2024-09-30",
"comments": "This file is used to black out winters in Alaska and northern North Dakota, as discussed in meeting on xxxx-xx-xx",
"blackout_dates": {
"831": [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
"832": [ ["2017-01-24T23:00:00", "2017-08-24T23:00:00"], ["2022-01-24T23:00:00", "2022-08-24T23:00:00"]],
"11115": [ ["2019-11-29T14:06:51", "2020-02-27T14:07:30"], ["2020-11-29T14:07:39", "2021-03-05T14:07:36"]],
"46543": []
}
}
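As a rough sketch of how a consumer might read this proposed layout, the snippet below parses only the `blackout_dates` field and ignores the human-context fields. The function name `parse_blackout_dates`, the `SAMPLE` string, and the start-before-end validation are assumptions for illustration, not the actual PCM code.

```python
import json
from datetime import datetime

# Hypothetical sample in the proposed layout (valid JSON, shortened).
SAMPLE = """
{
  "creation_date": "2024-09-30",
  "comments": "Human-readable context only.",
  "blackout_dates": {
    "831": [["2017-01-24T23:00:00", "2017-08-24T23:00:00"]],
    "832": [["2017-01-24T23:00:00", "2017-08-24T23:00:00"],
            ["2022-01-24T23:00:00", "2022-08-24T23:00:00"]],
    "46543": []
  }
}
"""


def parse_blackout_dates(text):
    """Map each frame ID (int) to a list of (start, end) datetime tuples."""
    doc = json.loads(text)
    parsed = {}
    # Only "blackout_dates" is consumed; every other key is ignored.
    for frame_id, ranges in doc.get("blackout_dates", {}).items():
        frame_ranges = []
        for start_str, end_str in ranges:
            start = datetime.fromisoformat(start_str)
            end = datetime.fromisoformat(end_str)
            if start > end:
                raise ValueError(f"Frame {frame_id}: start {start_str} is after end {end_str}")
            frame_ranges.append((start, end))
        parsed[int(frame_id)] = frame_ranges
    return parsed


blackouts = parse_blackout_dates(SAMPLE)
```

Failing fast on a reversed range is a design choice for this sketch: since the file is maintained by hand or by a separate tool, a loud error at load time seems easier to debug than a range that silently never matches.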