[New Feature]: DSWx-S1 PGE Job Triggering Validator

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Describe the feature request

A script that validates DSWx-S1 triggering logic using a combination of NASA Earthdata CMR and the ADT-provided MGRS Tile Database (which maps RTC bursts to the tile sets that need to be processed under the triggering logic).

Current Triggering Logic for DSWx-S1 PGE:

“X%” and “N hours” are PCM parameters that can be set.

1. PCM queries for new RTC-S1 granules
2. PCM checks for DSWx-S1 Tile Sets that are not yet processed
  - Any unprocessed Tile Sets with 100% of input RTC-S1 available are immediately submitted for processing
  - If an unprocessed Tile Set has >=X%, but <100%, PCM will start a timer lasting N hours.
    - If no new inputs are available after N hours, the Tile Set is processed with the data available
    - If new inputs become available in <N hours (but still <100% coverage), the timer resets.
  - If an unprocessed Tile Set has less than X% of inputs, no action is taken
3. If new RTC-S1 products arrive for Tile Sets already processed (presumably they were processed with <100% of inputs), then PCM will _______.
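To make the rules above concrete, here is a minimal sketch of the per-Tile-Set decision for an unprocessed Tile Set. The names `TileSetState` and `triggering_action` are hypothetical and not taken from PCM, and treating the timer as expired at exactly N hours is an assumption. The still-open question of what happens when inputs arrive for an already processed Tile Set is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TileSetState:
    """Hypothetical bookkeeping for one unprocessed Tile Set."""
    coverage_pct: float                 # share of expected RTC-S1 inputs available
    last_new_input: Optional[datetime]  # arrival of the newest input (resets the timer)


def triggering_action(state: TileSetState, now: datetime,
                      x_pct: float, n_hours: float) -> str:
    """Return 'submit', 'wait', or 'ignore' for an unprocessed Tile Set."""
    if state.coverage_pct >= 100:
        return "submit"                 # full coverage: process immediately
    if state.coverage_pct < x_pct:
        return "ignore"                 # below X%: no action is taken
    # X% <= coverage < 100%: the timer runs from the most recent new input
    if state.last_new_input is None:
        return "wait"
    if now - state.last_new_input >= timedelta(hours=n_hours):
        return "submit"                 # assumption: timer expires at exactly N hours
    return "wait"
```

With X = 60 and N = 4, a Tile Set at 80% coverage whose last input arrived one hour ago yields `wait`, and the same Tile Set four hours after its last input yields `submit`.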

Expected Validator Logic for DSWx-S1 PGE:

The intention of the validation tool is to return which tile sets should have been processed for a given set of RTC bursts and associated triggering logic / parameters. The validation tool is highly dependent on when the tool is run, as the list of available RTC bursts could change.

Input:
- T time-range (start / end)
- X% of RTC completion threshold for a given MGRS tile set
- N hours for timer wait duration
Steps:
1. Query CMR for available RTC bursts matching T, denoted as RTC Bursts
2. Take list of RTC Bursts and query MGRS Tile Database
  1. Query for Tile Set IDs that have 100% coverage for provided RTC Bursts
  2. Query for Tile Set IDs that have >= X% coverage for provided RTC Bursts
Output:
- List of Tile Set IDs that should have been processed
- If the creation timestamps of the RTC bursts vary by more than N hours, return a warning that the timer triggering logic should have been activated, and that the number of Tile Set IDs actually processed should be checked against the validator's output
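The steps and outputs above could be sketched roughly as follows. This is only an illustration: the MGRS Tile Database is stood in for by a plain dict mapping each Tile Set ID to its expected burst IDs (the real database schema is not shown in this ticket), all function names are hypothetical, and "variance" of the creation timestamps is read here as the spread between the earliest and latest timestamp.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set


def tile_set_coverage(available_bursts: Iterable[str],
                      tile_db: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percent of each Tile Set's expected bursts that are available.

    tile_db is a hypothetical in-memory stand-in for the MGRS Tile Database.
    Tile Sets with no matching burst are omitted from the result.
    """
    available = set(available_bursts)
    coverage = {}
    for set_id, expected in tile_db.items():
        matched = expected & available
        if matched:
            coverage[set_id] = round(100 * len(matched) / len(expected), 2)
    return coverage


def tile_sets_to_process(coverage: Dict[str, float], x_pct: float) -> List[str]:
    """Tile Set IDs whose coverage meets the X% threshold."""
    return sorted(s for s, pct in coverage.items() if pct >= x_pct)


def timer_warning_needed(created: Iterable[datetime], n_hours: float) -> bool:
    """True when burst creation timestamps are spread over more than N hours."""
    stamps = list(created)
    if len(stamps) < 2:
        return False
    return max(stamps) - min(stamps) > timedelta(hours=n_hours)
```

Calling `tile_sets_to_process` once with 100 and once with X gives the two lists from step 2, and both calls reuse the same coverage table.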

I would like the script to be runnable/usable "offline", i.e. without CMR in the loop. Having the ability to query CMR to generate the list of RTC products is a good feature, and something we definitely want. I think keeping Steps 1 and 2 as separate functions, using the output from Step 1 as part of the input to Step 2, and allowing Step 2 to be invoked with some list (or text file, or whatever) would do the trick. That would make this much more unit-testable as well.
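As a rough idea of what the offline entry point could accept, the sketch below normalizes lines from a list or text file into burst IDs. It assumes RTC-S1 granule IDs embed the burst as `T020-041121-IW1` while burst IDs are written as `t020_041121_iw1` (the form printed in the tables in this ticket). The function name `to_burst_ids` and the `#` comment convention are made up for the example.

```python
import re
from typing import Iterable, List

# Assumed naming: granule IDs carry the burst as "T020-041121-IW1";
# burst IDs are written "t020_041121_iw1".
_GRANULE_BURST = re.compile(r"T(\d{3})-(\d{6})-(IW\d)")
_BURST_ID = re.compile(r"^t\d{3}_\d{6}_iw\d$")


def to_burst_ids(lines: Iterable[str]) -> List[str]:
    """Turn lines holding granule IDs or burst IDs into burst IDs.

    Blank lines and lines starting with '#' are skipped.
    """
    burst_ids = []
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith("#"):
            continue
        if _BURST_ID.match(text.lower()):
            burst_ids.append(text.lower())      # already a burst ID
            continue
        found = _GRANULE_BURST.search(text)
        if found is None:
            raise ValueError(f"unrecognized entry: {text!r}")
        track, number, swath = found.groups()
        burst_ids.append(f"t{track}_{number}_{swath.lower()}")
    return burst_ids
```

The output of the CMR query step and the contents of a text file could then both be passed through the same function before the Tile Database lookup.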

Hi Rishi, very impressive, I tried it and it worked for me. A couple of comments on the installation:

o You also need to install python_cmr:
pip install python_cmr

o On my Mac (Apple M2 Pro chip) with Python 3.9.6 I could not install sqlite3 as described in the instructions; I had to build the wheel and install it in the virtual environment following the instructions here:
https://til.simonwillison.net/sqlite/build-specific-sqlite-pysqlite-macos

@LucaCinquini thank you for trying this out and your feedback! I've addressed your comments in the latest commit.

Hi Rishi - the functionality is all there. I have some suggestions for refactors, and a few new (but not critical) features, below.

Make this importable & usable as a module
- ~~script portion inside an if __name__ == '__main__' block~~
- ~~change dashes to underscores in the filename~~
- logic in the script portion limited to: parse inputs, call the function.
Make some unit tests, at least for the core logic (maybe we don't want our unit tests hitting the CMR servers?).
Input:
- Passing in just a list of Burst IDs (as an alternative to the list of RTC granule IDs) would be useful.
Output:
- Print a count of MGRS Tile IDs covered (useful when the list is long)
- In each row of the table output, add additional columns with the number of input bursts seen, and the total number of input bursts expected ('Matching Burst Count' and 'Total Burst Count')
Various other refactor suggestions - probably easier to handle through the PR interface. I can also make some commits showing these suggestions.
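One way to get unit tests that never touch the CMR servers is to pass the CMR query in as a function, so a test can substitute a canned result. The sketch below is illustrative only: `covered_tile_sets`, the dict-based tile database, and the fake fetcher are hypothetical names and structures, not the script's current API. It also shows the suggested 'Matching Burst Count' and 'Total Burst Count' columns.

```python
import unittest
from typing import Callable, Dict, List, Set


def covered_tile_sets(start: str, end: str,
                      fetch_bursts: Callable[[str, str], List[str]],
                      tile_db: Dict[str, Set[str]],
                      threshold: float) -> Dict[str, dict]:
    """Core logic with the CMR query injected as fetch_bursts.

    tile_db maps Tile Set ID -> expected burst IDs (assumed non-empty sets).
    """
    bursts = set(fetch_bursts(start, end))
    rows = {}
    for set_id, expected in tile_db.items():
        matching = expected & bursts
        pct = 100 * len(matching) / len(expected)
        if matching and pct >= threshold:
            rows[set_id] = {
                "Coverage Percentage": round(pct, 2),
                "Matching Burst Count": len(matching),
                "Total Burst Count": len(expected),
            }
    return rows


class CoveredTileSetsTest(unittest.TestCase):
    TILE_DB = {"MS_1_1": {"a", "b", "c", "d"}, "MS_1_2": {"e", "f"}}

    @staticmethod
    def fake_cmr(start, end):
        # Canned response standing in for a real CMR query
        return ["a", "b", "c", "e", "f"]

    def test_threshold_filters_partial_sets(self):
        rows = covered_tile_sets("t0", "t1", self.fake_cmr, self.TILE_DB, 100)
        self.assertEqual(list(rows), ["MS_1_2"])

    def test_counts_are_reported(self):
        rows = covered_tile_sets("t0", "t1", self.fake_cmr, self.TILE_DB, 50)
        self.assertEqual(rows["MS_1_1"]["Matching Burst Count"], 3)
        self.assertEqual(rows["MS_1_1"]["Total Burst Count"], 4)
```

The command-line portion would then only parse arguments, build the real fetcher, and call the same function.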

Edit: just made some commits covering a few of the suggestions up above.

Summary of conversation with @chrisjrd regarding DSWx-S1 validation using validator script and existing code:

- The general strategy for matching RTC batches from CMR to rows in the MGRS Tile DB is consistent with the coding strategy
- The validator script uses the CMR Python API, whereas the existing code uses custom REST invocations to CMR
- mgrs_tile_id would uniquely identify DSWx-S1 jobs within a 12-day orbit cycle, but you need to add the acquisition orbit cycle number in order to uniquely identify MGRS tile IDs / DSWx-S1 jobs within a one-year time frame. Example: MS_166_30$5

Next steps:

@riverma to add handling the 12-day orbit cycle issue, by adding acquisition cycle ID.
- Value of 0 means Jan 1st - Jan 12th, value of 1 means Jan 12th - Jan 24th
- See code to generate this number here.
Be ready to perform a test using the below metadata mappings in OPERA SDS
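For illustration, one reading of the cycle numbering above is a day-of-year index in 12-day blocks, appended to the set ID with `$` as in the `MS_166_30$5` example. This is an assumption based only on the two example values in the bullets. The linked code is the authoritative definition and may compute the number differently. Both function names are hypothetical.

```python
from datetime import date


def acquisition_cycle_index(acquired: date, cycle_days: int = 12) -> int:
    """Assumed definition: 0 for the first 12 days of the year, 1 for the next 12, ..."""
    day_of_year = acquired.timetuple().tm_yday  # 1 on Jan 1st
    return (day_of_year - 1) // cycle_days


def dswx_job_key(mgrs_set_id: str, acquired: date) -> str:
    """Set ID plus cycle index, in the 'MS_166_30$5' style shown above."""
    return f"{mgrs_set_id}${acquisition_cycle_index(acquired)}"
```

Under this reading, an acquisition on 2 March 2023 (day 61 of the year) falls in cycle 5.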

Metadata matching in OPERA SDS Elasticsearch:

DSWx-S1 Product Index (GRQ): rtc_catalog-YYYY-MM
- Metadata: need to confirm with @collinss-jpl
DSWx-S1 Job Index (Mozart): job_statuses-YYYY-MM

Metadata:

opera-sds-pcm/data_subscriber/rtc/rtc_job_submitter.py

Lines 38 to 54 in 5a35be7

    "metadata": {
        "batch_id": batch_id,
        "product_paths": {"L2_RTC_S1": s3paths},
        "mgrs_set_id": mgrs_set_id,
        "FileName": batch_id,
        "id": batch_id,
        "bounding_box": bounding_box,
        "Files": [
            {
                "FileName": PurePath(s3path).name,
                "FileSize": 1,
                "FileLocation": os.path.dirname(s3path),
                "id": PurePath(s3path).name,
                "product_paths": "$.product_paths"
            }
            for s3path in s3paths
        ]

Another feature that would be useful is the ability to output the MGRS Tile IDs that were produced, not just the Tile Set IDs. Maybe as an optional output? Or as another column in the output table? That information would make it easier to compare the script's output against the results of testing (or operations), where we only see the Tile IDs in the output granule name.

Test Cases

100% coverage case

Command:
python dswx-s1-validator.py --start "2023-12-05T00:00:00Z" --end "2023-12-05T00:59:59Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 100

Expected Tile Sets:

MGRS Set ID      Coverage Percentage
MS_165_17                        100
MS_165_18                        100
MS_165_19                        100
MS_165_20                        100
MS_165_21                        100
MS_165_22                        100
MS_165_23                        100
MS_165_24                        100
MS_165_25                        100
MS_165_26                        100
MS_165_27                        100
MS_165_73                        100
MS_165_74                        100
MS_165_75                        100
MS_165_76                        100
MS_165_77                        100
MS_165_78                        100
MS_165_79                        100
MS_165_80                        100
MS_165_81                        100
MS_165_82                        100
MS_165_83                        100
MS_165_84                        100
MS_165_85                        100
MS_165_86                        100
MS_165_87                        100
MS_165_88                        100
MS_165_89                        100

50% coverage

Command:
python dswx-s1-validator.py --start "2023-12-05T00:00:00Z" --end "2023-12-05T00:59:59Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 50

Expected Tile Sets:

MGRS Set ID      Coverage Percentage
MS_165_12                      85
MS_165_17                     100
MS_165_18                     100
MS_165_19                     100
MS_165_20                     100
MS_165_21                     100
MS_165_22                     100
MS_165_23                     100
MS_165_24                     100
MS_165_25                     100
MS_165_26                     100
MS_165_27                     100
MS_165_72                      60
MS_165_73                     100
MS_165_74                     100
MS_165_75                     100
MS_165_76                     100
MS_165_77                     100
MS_165_78                     100
MS_165_79                     100
MS_165_80                     100
MS_165_81                     100
MS_165_82                     100
MS_165_83                     100
MS_165_84                     100
MS_165_85                     100
MS_165_86                     100
MS_165_87                     100
MS_165_88                     100
MS_165_89                     100
MS_165_90                      60.98

Midnight Intersecting Time Range w/ 50% Coverage

Command:
python dswx-s1-validator.py --start "2023-12-04T23:30:00Z" --end "2023-12-05T00:30:00Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 50

Expected Tile Sets:

MGRS Set ID      Coverage Percentage
MS_164_153                     72.09
MS_164_154                    100
MS_164_155                    100
MS_164_156                    100
MS_164_157                    100
MS_164_158                    100
MS_164_159                    100
MS_164_160                     54.76
MS_165_6                      100
MS_165_7                      100
MS_165_8                      100
MS_165_9                      100
MS_165_10                     100
MS_165_11                     100
MS_165_12                     100
MS_165_17                     100
MS_165_18                     100
MS_165_19                     100
MS_165_20                     100
MS_165_21                     100
MS_165_22                     100
MS_165_23                     100
MS_165_24                     100
MS_165_25                     100
MS_165_26                     100
MS_165_27                     100

Single (1) Burst Coverage Requirement

Command:
python dswx_s1_validator.py --start "2023-12-11T14:05:00Z" --end "2023-12-11T14:40:00Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 5
MGRS Set ID      Coverage Percentage
MS_86_23                       52.5
MS_86_24                      100
MS_86_25                      100
MS_86_26                       78.05
MS_86_27                        2.38

--

I believe the --threshold parameter just filters the output by the Coverage Percentage, yes? I think it would be more useful to have the tests use --threshold 1 to show ALL of the TileSets touched by the time range.

Well, we should have one test with a higher threshold, just to prove that the filtering works correctly.

> I believe the --threshold parameter just filters the output by the Coverage Percentage, yes?

@sjlewis-jpl - you're correct, the --threshold parameter is really just for display purposes. A Pandas DataFrame is filtered before showing the results based on the threshold value. See snippet here.

> I think it would be more useful to have the tests use --threshold 1 to show ALL of the TileSets touched by the time range.

Hmm. Certainly that's a good test case to add (will do). However, we will also want to test the threshold parameter with "in-between" values between 1 and 100. I just added a test case above for a threshold value of 1.

Edge Cases

Thanks to @jungkyoJung for these!

Only one burst is available

Command:
python dswx_s1_validator.py --start "2023-11-01T01:28:15Z" --end "2023-11-01T01:34:15Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 8
MGRS Set ID      Coverage Percentage  Matching Bursts      Matching Burst Count
MS_20_29                         2.5  t020_041121_iw1                         1
...

Four polarizations are available: VV/VH/HH/HV

Command:
python dswx_s1_validator.py --start "2023-11-01T22:49:51Z" --end "2023-11-01T22:55:51Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 11
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Matching Burst Count
MS_33_26                       80.49  t033_068997_iw3, t033_068998_iw2, t033_068998_iw3, t033_068999_iw1, t033_068999_iw2, t033_068999_iw3, t033_069000_iw1, t033_069000_iw2, t033_069000_iw3, t033_069001_iw1, t033_069001_iw2, t033_069001_iw3, t033_069002_iw1, t033_069002_iw2, t033_069002_iw3, t033_069003_iw1, t033_069003_iw2, t033_069003_iw3, t033_069004_iw1, t033_069004_iw2, t033_069004_iw3, t033_069005_iw1, t033_069005_iw2, t033_069005_iw3, t033_069006_iw1, t033_069006_iw2, t033_069006_iw3, t033_069007_iw1, t033_069007_iw2, t033_069007_iw3, t033_069011_iw1, t033_069011_iw2, t033_069012_iw1                      33
...

HH/HV polarizations are available

Command:
python dswx_s1_validator.py --start "2023-11-08T22:41:33Z" --end "2023-11-08T22:47:33Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 6
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Matching Burst Count
MS_135_25                       77.5  t135_288075_iw3, t135_288076_iw2, t135_288076_iw3, t135_288077_iw1, t135_288077_iw2, t135_288077_iw3, t135_288078_iw1, t135_288078_iw2, t135_288078_iw3, t135_288082_iw1, t135_288082_iw2, t135_288082_iw3, t135_288083_iw1, t135_288083_iw2, t135_288083_iw3, t135_288084_iw1, t135_288084_iw2, t135_288084_iw3, t135_288085_iw1, t135_288085_iw2, t135_288085_iw3, t135_288086_iw1, t135_288086_iw2, t135_288086_iw3, t135_288087_iw1, t135_288087_iw2, t135_288087_iw3, t135_288088_iw1, t135_288088_iw2, t135_288089_iw1, t135_288090_iw1                      31
...

Small portion of land compared to water region

Command:
python dswx_s1_validator.py --start "2023-11-11T22:57:27Z" --end "2023-11-11T23:03:27Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 9
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Matching Burst Count
MS_4_8                           100  t004_006512_iw3, t004_006513_iw2, t004_006513_iw3, t004_006514_iw1, t004_006514_iw2, t004_006514_iw3, t004_006515_iw1, t004_006515_iw2, t004_006515_iw3, t004_006516_iw1, t004_006516_iw2, t004_006516_iw3, t004_006517_iw1, t004_006517_iw2, t004_006517_iw3, t004_006518_iw1, t004_006518_iw2, t004_006518_iw3, t004_006519_iw1, t004_006519_iw2, t004_006519_iw3, t004_006520_iw1, t004_006520_iw2, t004_006520_iw3, t004_006521_iw1, t004_006521_iw2, t004_006521_iw3, t004_006522_iw1, t004_006522_iw2, t004_006522_iw3, t004_006523_iw1, t004_006523_iw2, t004_006523_iw3, t004_006524_iw1, t004_006524_iw2, t004_006524_iw3, t004_006525_iw1, t004_006525_iw2, t004_006526_iw1                      39
...

Only water covered.

Command:
python dswx_s1_validator.py --start "2023-11-11T23:00:48Z" --end "2023-11-11T23:06:48Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 10
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Matching Burst Count
MS_4_15                          100  t004_006588_iw3, t004_006589_iw2, t004_006589_iw3, t004_006590_iw1, t004_006590_iw2, t004_006590_iw3, t004_006591_iw1, t004_006591_iw2, t004_006591_iw3, t004_006592_iw1, t004_006592_iw2, t004_006592_iw3, t004_006593_iw1, t004_006593_iw2, t004_006593_iw3, t004_006594_iw1, t004_006594_iw2, t004_006594_iw3, t004_006595_iw1, t004_006595_iw2, t004_006595_iw3, t004_006596_iw1, t004_006596_iw2, t004_006596_iw3, t004_006597_iw1, t004_006597_iw2, t004_006597_iw3, t004_006598_iw1, t004_006598_iw2, t004_006598_iw3, t004_006599_iw1, t004_006599_iw2, t004_006599_iw3, t004_006600_iw1, t004_006600_iw2, t004_006600_iw3, t004_006601_iw1, t004_006601_iw2, t004_006602_iw1                      39
...

Only water covered. Ancillary data may have only invalid values

Command:
python dswx_s1_validator.py --start "2023-11-01T22:43:18Z" --end "2023-11-01T22:49:18Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 6
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Matching Burst Count
MS_33_13                         100  t033_068856_iw3, t033_068857_iw2, t033_068857_iw3, t033_068858_iw1, t033_068858_iw2, t033_068858_iw3, t033_068859_iw1, t033_068859_iw2, t033_068859_iw3, t033_068860_iw1, t033_068860_iw2, t033_068860_iw3, t033_068861_iw1, t033_068861_iw2, t033_068861_iw3, t033_068862_iw1, t033_068862_iw2, t033_068862_iw3, t033_068863_iw1, t033_068863_iw2, t033_068863_iw3, t033_068864_iw1, t033_068864_iw2, t033_068864_iw3, t033_068865_iw1, t033_068865_iw2, t033_068865_iw3, t033_068866_iw1, t033_068866_iw2, t033_068866_iw3, t033_068867_iw1, t033_068867_iw2, t033_068867_iw3, t033_068868_iw1, t033_068868_iw2, t033_068868_iw3, t033_068869_iw1, t033_068869_iw2, t033_068870_iw1, t033_068871_iw1                      40
...

56 bursts are required

Command:
python dswx_s1_validator.py --start "2023-10-23T18:27:51Z" --end "2023-10-23T18:33:51Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 8
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Matching Burst Count
MS_74_46                         100  t074_157281_iw3, t074_157282_iw3, t074_157283_iw2, t074_157283_iw3, t074_157284_iw2, t074_157284_iw3, t074_157285_iw2, t074_157285_iw3, t074_157286_iw1, t074_157286_iw2, t074_157286_iw3, t074_157287_iw1, t074_157287_iw2, t074_157287_iw3, t074_157288_iw1, t074_157288_iw2, t074_157288_iw3, t074_157289_iw1, t074_157289_iw2, t074_157289_iw3, t074_157290_iw1, t074_157290_iw2, t074_157290_iw3, t074_157291_iw1, t074_157291_iw2, t074_157291_iw3, t074_157292_iw1, t074_157292_iw2, t074_157292_iw3, t074_157293_iw1, t074_157293_iw2, t074_157293_iw3, t074_157294_iw1, t074_157294_iw2, t074_157294_iw3, t074_157295_iw1, t074_157295_iw2, t074_157295_iw3, t074_157296_iw1, t074_157296_iw2, t074_157296_iw3, t074_157297_iw1, t074_157297_iw2, t074_157298_iw1, t074_157298_iw2, t074_157299_iw1, t074_157299_iw2, t074_157300_iw1, t074_157301_iw1, t074_157302_iw1, t074_157303_iw1                      51

Anti-meridian

Command:
python dswx_s1_validator.py --start "2023-11-11T18:29:17Z" --end "2023-11-11T18:35:17Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 1
MGRS Set ID    Coverage Percentage    Matching Bursts    Matching Burst Count

Memory issue and possible missing ancillary data when 100% coverage required but only 61 available

Command:
python dswx_s1_validator.py --start "2023-11-01T11:32:48Z" --end "2023-11-01T11:38:48Z" --db MGRS_tile_collection_v0.2.sqlite --threshold 1

Expected Tile Sets:

MGRS Set IDs covered: 7
MGRS Set ID      Coverage Percentage  Matching Bursts                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Matching Burst Count
MS_26_48                       60.66  t026_054262_iw1, t026_054262_iw2, t026_054262_iw3, t026_054263_iw1, t026_054263_iw2, t026_054263_iw3, t026_054264_iw1, t026_054264_iw2, t026_054264_iw3, t026_054265_iw1, t026_054265_iw2, t026_054265_iw3, t026_054266_iw1, t026_054266_iw2, t026_054266_iw3, t026_054267_iw1, t026_054267_iw2, t026_054267_iw3, t026_054268_iw1, t026_054268_iw2, t026_054268_iw3, t026_054269_iw1, t026_054269_iw2, t026_054269_iw3, t026_054270_iw1, t026_054270_iw2, t026_054270_iw3, t026_054271_iw1, t026_054271_iw2, t026_054271_iw3, t026_054272_iw2, t026_054272_iw3, t026_054273_iw2, t026_054273_iw3, t026_054274_iw2, t026_054274_iw3, t026_054275_iw3                      37

In order to test the following requirement:
"If an unprocessed Tile Set has >=X%, but <100%, PCM will start a timer lasting N hours. If no new inputs are available after N hours, the Tile Set is processed with the data available"

Multiple series of hourly queries/triggering need to be run contiguously. e.g.:

Run from 00:00:00 to 01:00:00. Some will meet 100% and trigger; some will stay around
Run from 01:00:00 to 02:00:00. Some of the left over from the last hour may add up with this new set of data, meet the % threshold, and then trigger. (If the grace period is 60 mins that is)

That's the only way to have any unprocessed Tile Set to be processed. I think this test case is missing here.

@philipjyoon - thanks for your insight.

If enough time has passed, we would expect the results of the validator script at >= X% threshold coverage to match PCM product generation. It's trickier close to FWD processing, but running the simple coverage check after a few days should suffice for your use case. Let me know if you think otherwise.

@riverma The following two use-cases can (and will over time) yield different trigger logic output:

1. Query for one hour at a time, iteratively over 5 hrs, and process the triggering rule at every hour using the prior state of the system, taking into account the time at which each granule entered the system (when they were queried).
2. Query once to cover all 5 hrs and process the triggering rules over the entire set at once, without factoring in the time at which the granules entered the system.

They would produce different results because the time at which the granules entered the system is one of the factors in the triggering rule. Slide 5 here illustrates this: https://docs.google.com/presentation/d/1maA2nqimOXFzMfGfIF0JcTd2XvB0fh3gjtwuya93CnI/edit?pli=1#slide=id.g28b86c493b7_0_0
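A toy simulation of the two use-cases, under stated assumptions: a single Tile Set with 10 expected bursts, X = 60, N = 2 hours, six bursts entering at hour 0 and four more at hour 4. The function names are hypothetical, and only the first submission is tracked, since what PCM does with inputs that arrive after a Tile Set has been processed is not settled in this ticket.

```python
from typing import Dict, Optional, Set, Tuple


def iterative_first_submission(arrivals: Dict[int, Set[str]], expected: Set[str],
                               x_pct: float, n_hours: int,
                               last_hour: int) -> Optional[Tuple[int, Set[str]]]:
    """Evaluate the rule once per hour; return (hour, bursts) of the first submission."""
    seen: Set[str] = set()
    timer_start: Optional[int] = None
    for hour in range(last_hour + 1):
        new = (arrivals.get(hour, set()) & expected) - seen
        seen |= new
        pct = 100 * len(seen) / len(expected)
        if pct >= 100:
            return hour, set(seen)          # full coverage: submit immediately
        if pct >= x_pct:
            if new or timer_start is None:
                timer_start = hour          # new inputs (re)start the timer
            elif hour - timer_start >= n_hours:
                return hour, set(seen)      # timer expired with no new inputs
    return None


def all_at_once(arrivals: Dict[int, Set[str]], expected: Set[str],
                x_pct: float) -> Optional[Set[str]]:
    """Ignore arrival times: apply the coverage threshold to everything queried."""
    seen = set().union(*arrivals.values()) & expected
    pct = 100 * len(seen) / len(expected)
    return seen if pct >= x_pct else None


expected = {f"b{i}" for i in range(10)}
arrivals = {0: {f"b{i}" for i in range(6)}, 4: {f"b{i}" for i in range(6, 10)}}
first = iterative_first_submission(arrivals, expected, x_pct=60, n_hours=2, last_hour=4)
bulk = all_at_once(arrivals, expected, x_pct=60)
```

Here the hourly evaluation submits at hour 2 with six bursts, while the single query sees all ten bursts, so the composition of the first product differs between the two approaches.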

It seems the test cases listed here, and the way the validator currently works, perform #2 only. I think it's crucial to test for the use-case in that design diagram, the sooner the better, and preferably during development as opposed to I&T.

Hi @philipjyoon - thanks for your comments. I'm still not getting the difference though, from the perspective of the validator script. Let's try and work through the below visual example to see if and how we differ in interpretation.

Assume:

60% burst coverage is needed for any given MGRS Tile Set to be kicked off for DSWx-S1 processing
FWD processing rules indicate triggering rules are evaluated every hour, and reprocessing occurs if more bursts are available past the 60% mark but still less than 100% up to a maximum amount of wait time (say 10 hours)

Summary of the below processing scenario:

DSWx-S1 job is triggered 3 times for MGRS Tile Set ID MS_166_30
100% of the bursts for MS_166_30 are expected to be covered within the final DSWx-S1 product (i.e. v3 at time window 10)

My understanding / conclusion:

The validator script doesn't and shouldn't care how many times processing kicked off (that's an internal PCM detail)
The validator script should care about the existence and composition of DSWx-S1 products by verifying:
- MS_166_30 is processed as part of a DSWx-S1 job
- MS_166_30 DSWx-S1 product contains not just 60% coverage as spelled out in the threshold minimum, but 100% as listed by the available RTC bursts and no less.
Your two scenarios described (iterative vs all-at-once) lead to the same effective outcome when evaluated at time step 10 in the below example
The examples I've provided earlier in this ticket don't spell out the specific burst IDs that should have been evaluated. Doing so will ensure we verify the "composition" of the DSWx-S1 products not just the existence - i.e. we'll have covered the (re-)triggering scenarios you described.

Time Window 1

MS_166_30 (20% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ❌	...
t020_041124_iw4 ❌	...
t020_041125_iw5 ❌	...
t020_041126_iw6 ❌	...
t020_041127_iw7 ❌	...
t020_041128_iw8 ❌	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

No processing triggered

Time Window 4

MS_166_30 (40% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ❌	...
t020_041126_iw6 ❌	...
t020_041127_iw7 ❌	...
t020_041128_iw8 ❌	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

No processing triggered

Time Window 5

MS_166_30 (60% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ✅	...
t020_041126_iw6 ✅	...
t020_041127_iw7 ❌	...
t020_041128_iw8 ❌	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

60% threshold for coverage reached
No processing triggered because we always wait two more time window intervals just in case more data is coming

Time Window 7

MS_166_30 (60% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ✅	...
t020_041126_iw6 ✅	...
t020_041127_iw7 ❌	...
t020_041128_iw8 ❌	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

No new bursts available
DSWx-S1 PGE triggered

Time Window 8

MS_166_30 (60% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ✅	...
t020_041126_iw6 ✅	...
t020_041127_iw7 ❌	...
t020_041128_iw8 ❌	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

No new bursts available
No processing triggered

Time Window 9

MS_166_30 (80% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ✅	...
t020_041126_iw6 ✅	...
t020_041127_iw7 ✅	...
t020_041128_iw8 ✅	...
t020_041129_iw9 ❌	...
t020_041130_iw10 ❌	...

Notes:

2 new bursts available
DSWx-S1 PGE (re-)triggered

Time Window 10

MS_166_30 (100% coverage)	...
t020_041121_iw1 ✅	...
t020_041122_iw2 ✅	...
t020_041123_iw3 ✅	...
t020_041124_iw4 ✅	...
t020_041125_iw5 ✅	...
t020_041126_iw6 ✅	...
t020_041127_iw7 ✅	...
t020_041128_iw8 ✅	...
t020_041129_iw9 ✅	...
t020_041130_iw10 ✅	...

Notes:

2 new bursts available
DSWx-S1 PGE (re-)triggered
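The walkthrough above can be replayed with a small sketch. It encodes the assumptions stated in the example (60% threshold, waiting two time windows without new data before the first trigger, re-triggering whenever new bursts arrive afterwards) and further assumes that the windows not listed (2, 3 and 6) bring no new bursts. Names and constants are illustrative only.

```python
from typing import Dict, List, Tuple

TOTAL_BURSTS = 10
THRESHOLD_PCT = 60
QUIET_WINDOWS = 2  # windows to wait without new data before the first trigger

# New bursts arriving per time window, read off the tables above
NEW_BURSTS = {1: 2, 4: 2, 5: 2, 9: 2, 10: 2}


def replay(new_bursts: Dict[int, int], last_window: int = 10) -> Tuple[List[int], float]:
    """Return the windows in which the PGE is triggered and the final coverage."""
    available = 0
    quiet = 0
    triggered_once = False
    triggers: List[int] = []
    for window in range(1, last_window + 1):
        arrived = new_bursts.get(window, 0)
        available += arrived
        pct = 100 * available / TOTAL_BURSTS
        if pct < THRESHOLD_PCT:
            continue                      # below threshold: nothing happens
        if triggered_once:
            if arrived:
                triggers.append(window)   # new bursts after a product exists: re-trigger
            continue
        quiet = 0 if arrived else quiet + 1
        if pct >= 100 or quiet >= QUIET_WINDOWS:
            triggers.append(window)       # first trigger
            triggered_once = True
    return triggers, 100 * available / TOTAL_BURSTS


triggers, final_coverage = replay(NEW_BURSTS)
```

The replay triggers in windows 7, 9 and 10 and ends at 100% coverage, matching the summary of three triggers with a final product that covers all bursts.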

Update: @philipjyoon and I discussed the above and came to the following conclusion: if the validator script is run after enough time post FWD processing, it will sufficiently cover the testing scenarios @philipjyoon brought up.
