nasa/opera-sds-pcm

[New Feature]: DISP-S1 Historical Processing operator-friendly features

Opened this issue · 6 comments

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

DISP-S1 historical processing is orders of magnitude more complex than the previous CSLC historical processing. Processing the entire series is like running 1400 independent historical processing series. Currently, the internal state of each frame is stored as the number of sensing datetimes processed, but that is hard to understand and does not show the overall progress of the entire historical processing batch_proc.

Here are the recommended features:

  1. Store a single percentage as the overall progress of the historical processing batch_proc. This is the ratio of the number of historical processing SCIFLO jobs that have succeeded so far to the total number of possible jobs.
  2. We may also want to track percentage progress at the individual frame level. We would create another dictionary for this.
  3. We may also want to track the last date processed on a per-frame basis. We currently track only the number of sensing datetimes, because that is what the historical processing app needs internally, but that number is hard for the user to interpret.

For UI purposes, percentages are useful. For internal logic (and maybe you're already there), I think tracking the ratio of number processed to total number (as two integers) for each Frame would be more useful. That allows easy computation of a percentage, and makes it easy to add or average Frames together, with proper weighting.
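The two-integer idea could be sketched roughly as follows. This is only an illustration: `FrameProgress` and `overall_progress` are hypothetical names, not part of the existing code, and the numbers are made up. Summing numerators and denominators gives a properly weighted total, whereas averaging the per-frame percentages would not.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FrameProgress:
    """Progress of one frame as two integers (hypothetical structure)."""

    processed: int  # sensing datetimes processed so far
    total: int      # sensing datetimes available for this frame

    def percentage(self) -> int:
        # Guard against a frame with no data in the date range.
        if self.total == 0:
            return 0
        # Integer arithmetic; rounds down (an assumption, not the tool's rule).
        return 100 * self.processed // self.total


def overall_progress(frames: dict[str, FrameProgress]) -> FrameProgress:
    """Sum numerators and denominators so larger frames weigh more."""
    return FrameProgress(
        processed=sum(p.processed for p in frames.values()),
        total=sum(p.total for p in frames.values()),
    )


# Illustrative numbers only, not taken from a real batch_proc.
frames = {
    "831": FrameProgress(processed=50, total=100),
    "832": FrameProgress(processed=30, total=300),
}
total = overall_progress(frames)
print(total.percentage())  # 80 of 400 processed
```

With these made-up numbers, the weighted total is 80 / 400, while a plain average of the two per-frame percentages (50% and 10%) would overstate the progress.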

In any case, making an easy way to track the progress per frame is a great idea.

@sjlewis-jpl This is what it currently looks like. Let me know what you think.

This is what you will see when you run python tools/pcm_batch.py view

frame_states is not new; it's what the historical processing uses to track progress internally. It shows the number of sensing datetimes that have been submitted so far on a per-frame basis.

frame_states                    {'8882': 105, '831': 105, '832': 105, '833': 105}
frame_completion_percentages    ['8882: 48%', '831: 51%', '832: 47%', '833: 47%']
last_processed_datetimes        {'8882': '2020-07-08T00:27:08', '831': '2020-09-27T23:06:17', '832': '2020-04-12T23:06:30', '833': '2020-04-12T23:06:52'}
progress_percentage             48%

To give some context, these were the historical processing batch_proc parameters:

data_start_date                 2016-07-01T00:00:00
data_end_date                   2024-07-01T00:00:00
k                               15
frames                          [[831, 833], 8882]

And this is what the progress looks like for these particular frames and data date range at the end, before implementing the end-of-frame-series behavior that will process the k-remainders:

frame_states                    {'8882': 210, '831': 195, '832': 210, '833': 210}
frame_completion_percentages    ['8882: 95%', '831: 95%', '832: 95%', '833: 94%']
last_processed_datetimes        {'8882': '2024-01-25T00:27:25', '831': '2024-02-03T23:06:30', '832': '2024-01-22T23:06:52', '833': '2024-01-10T23:07:15'}
progress_percentage             95%

Phil - the information looks pretty complete. Is there a line that gives the total number of frames to be submitted? Or is that calculable from the frame_states and frame_completion_percentages lines?

It might be useful to add another word to the frame_states line, just to help make clear what's represented. Say, frame_states_submitted, or something similar. Similarly, total_progress_percentage, or total_completion_percentage to make it consistent with the second line.

@sjlewis-jpl Showing the total number of frames to be submitted is a great idea. I can change the field total_completion_percentage to look like: 10 / 40 = 25%. The numerator is the total number of jobs submitted thus far and the denominator is the total number of jobs possible to submit. The percentage would be the same as it is now. We could make these three separate fields, but that seems overly verbose.
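A minimal sketch of how such a combined field could be rendered. `format_progress` is a hypothetical helper, not an existing function in pcm_batch.py, and rounding the percentage down is an assumption.

```python
def format_progress(submitted: int, total: int) -> str:
    """Render 'submitted / total = percent%' as a single display string."""
    if total <= 0:
        # Nothing to submit; avoid dividing by zero.
        return f"{submitted} / {total} = n/a"
    percent = 100 * submitted // total  # rounds down (assumption)
    return f"{submitted} / {total} = {percent}%"


# Illustrative numbers only.
print(format_progress(12, 48))
```

Keeping both integers in the string lets the reader see the total number of jobs without adding a separate field.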

I'd like to keep the frame_states field name the same. It's an internal field, so a lot of code currently refers to it. If you feel strongly about changing it, I could change how it's displayed in the pcm_batch.py output, but that may lead to confusion later.

Your idea for displaying the fraction would be awesome, and maybe save needing another line for another field.

If it's hard to change the field names, then don't worry about it. Though changing it in the output report would also be fine. I was thinking of minimizing the documentation needed to teach someone else to read the reports.

How about I put the documentation right in the output at the bottom? Basically a legend for the table. I can take your and others' input on the verbiage.
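One possible shape for such a legend, as a rough sketch. The `LEGEND` dictionary and `render_legend` function are hypothetical, and the descriptions are paraphrased from this thread rather than agreed wording.

```python
# Field descriptions paraphrased from the discussion; wording is a placeholder.
LEGEND = {
    "frame_states": "sensing datetimes submitted so far, per frame",
    "frame_completion_percentages": "percent complete, per frame",
    "last_processed_datetimes": "last sensing datetime processed, per frame",
    "progress_percentage": "overall progress of the batch_proc",
}


def render_legend(legend: dict) -> str:
    """Format the legend as aligned 'name   description' lines."""
    width = max(len(name) for name in legend) + 4
    lines = ["Legend:"]
    lines += [f"{name:<{width}}{text}" for name, text in legend.items()]
    return "\n".join(lines)


print(render_legend(LEGEND))
```

Printing this after the table would keep the field names unchanged while still explaining them in the report itself.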