[New Feature]: DISP-S1 Historical Processing operator-friendly features
Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
No response
Describe the feature request
DISP-S1 historical processing is orders of magnitude more complex than the previous CSLC historical processing. Processing the entire series is like running 1400 independent historical processing series. Currently, the internal state of each frame is stored as the number of sensing datetimes processed, but that is hard to understand and does not show the overall progress of the entire historical processing batch_proc.
Here are the recommended features:
- Store a single percentage as the overall progress of that historical processing batch_proc. This is the ratio of the number of historical processing SCIFLO jobs that have succeeded so far to the total number of possible jobs.
- We may also want to track percentage progress at the individual frame level. We would create another dictionary for this.
- We may also want to track the last date processed on a per-frame basis. We currently track only the number of sensing datetimes, because that is what the historical processing app needs internally, but that number is hard for the user to make sense of.
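A minimal sketch of how these three pieces of state could fit together, assuming a hypothetical `FrameProgress` record that holds succeeded and possible job counts plus the last processed sensing datetime for each frame. None of these names come from the actual batch_proc code, and the frame IDs and counts below are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class FrameProgress:
    """Hypothetical per-frame progress record (not the real batch_proc schema)."""
    jobs_succeeded: int = 0
    jobs_possible: int = 0
    last_processed: Optional[datetime] = None

    def record_success(self, sensing_datetime: datetime) -> None:
        # Count the job and keep the latest sensing datetime seen for this frame.
        self.jobs_succeeded += 1
        if self.last_processed is None or sensing_datetime > self.last_processed:
            self.last_processed = sensing_datetime

    def percentage(self) -> float:
        # Frames with no possible jobs in the date range report 0%.
        if self.jobs_possible == 0:
            return 0.0
        return 100.0 * self.jobs_succeeded / self.jobs_possible


def overall_percentage(frames: Dict[str, FrameProgress]) -> float:
    """Overall progress = jobs succeeded / jobs possible, summed over all frames."""
    succeeded = sum(f.jobs_succeeded for f in frames.values())
    possible = sum(f.jobs_possible for f in frames.values())
    return 100.0 * succeeded / possible if possible else 0.0


# Made-up counts for two hypothetical frames.
frames = {
    "frame_a": FrameProgress(jobs_succeeded=6, jobs_possible=14),
    "frame_b": FrameProgress(jobs_succeeded=7, jobs_possible=15),
}
frames["frame_a"].record_success(datetime(2020, 4, 12, 23, 6, 30))

print({fid: round(f.percentage()) for fid, f in frames.items()})  # {'frame_a': 50, 'frame_b': 47}
print(f"{overall_percentage(frames):.0f}%")  # 48%
print(frames["frame_a"].last_processed.isoformat())  # 2020-04-12T23:06:30
```

Keeping the overall number derived from the per-frame counts (rather than stored separately) means the two can never disagree.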
For UI purposes, percentages are useful. For internal logic (and maybe you're already there), I think tracking the ratio of the number processed to the total number (as two integers) for each Frame would be more useful. That allows easy computation of a percentage, and makes it easy to add or average Frames together with proper weighting.
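To illustrate the weighting point, here is a small sketch with made-up (processed, total) pairs for two hypothetical frames of very different size. Averaging the per-frame percentages gives 70%, while summing the integer pairs first gives about 89.2%, which reflects that most of the work lives in the larger frame.

```python
from typing import Dict, Tuple

# Made-up (processed, total) counts for two hypothetical frames of very different size.
progress: Dict[str, Tuple[int, int]] = {
    "frame_a": (1, 2),     # 50% of a small frame
    "frame_b": (90, 100),  # 90% of a large frame
}


def naive_average_pct(p: Dict[str, Tuple[int, int]]) -> float:
    # Averaging per-frame percentages gives every frame equal weight.
    return sum(100.0 * done / total for done, total in p.values()) / len(p)


def weighted_pct(p: Dict[str, Tuple[int, int]]) -> float:
    # Summing numerators and denominators first weights each frame by its size.
    done = sum(d for d, _ in p.values())
    total = sum(t for _, t in p.values())
    return 100.0 * done / total


print(naive_average_pct(progress))       # 70.0
print(round(weighted_pct(progress), 1))  # 89.2
```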
In any case, making an easy way to track the progress per frame is a great idea.
@sjlewis-jpl This is what it currently looks like. Let me know what you think.
This is what you will see when you run python tools/pcm_batch.py view. frame_states is not new; it's what historical processing uses to track progress internally. It shows the number of sensing datetimes that have been submitted so far, per frame.
frame_states {'8882': 105, '831': 105, '832': 105, '833': 105}
frame_completion_percentages ['8882: 48%', '831: 51%', '832: 47%', '833: 47%']
last_processed_datetimes {'8882': '2020-07-08T00:27:08', '831': '2020-09-27T23:06:17', '832': '2020-04-12T23:06:30', '833': '2020-04-12T23:06:52'}
progress_percentage 48%
To give some context, these were the historical processing batch_proc parameters:
data_start_date 2016-07-01T00:00:00
data_end_date 2024-07-01T00:00:00
k 15
frames [[831, 833], 8882]
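The view output lists frames 831, 832, 833 and 8882, which suggests the inner [831, 833] is an inclusive range. A minimal sketch of that expansion, assuming exactly that interpretation; `expand_frames` is a hypothetical helper, not the batch_proc parser.

```python
from typing import List, Union

# A frames entry is either a single frame ID or an inclusive [first, last] range (assumed).
FrameSpec = Union[int, List[int]]


def expand_frames(spec: List[FrameSpec]) -> List[int]:
    """Expand a frames parameter like [[831, 833], 8882] into individual frame IDs."""
    frame_ids: List[int] = []
    for item in spec:
        if isinstance(item, list):
            first, last = item
            frame_ids.extend(range(first, last + 1))  # inclusive upper bound
        else:
            frame_ids.append(item)
    return frame_ids


print(expand_frames([[831, 833], 8882]))  # [831, 832, 833, 8882]
```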
And this is what the progress looks like for these frames and this data date range at the end of the run, before implementing the end-of-frame-series behavior that will process the k-remainders:
frame_states {'8882': 210, '831': 195, '832': 210, '833': 210}
frame_completion_percentages ['8882: 95%', '831: 95%', '832: 95%', '833: 94%']
last_processed_datetimes {'8882': '2024-01-25T00:27:25', '831': '2024-02-03T23:06:30', '832': '2024-01-22T23:06:52', '833': '2024-01-10T23:07:15'}
progress_percentage 95%
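The frame_states values above (105, 195, 210) are all multiples of k = 15, which suggests each SCIFLO job consumes k sensing datetimes. Assuming that, the roughly 95% ceiling would come from the k-remainder: the final fewer-than-k datetimes of each frame that no full job covers. A small sketch of that arithmetic, using a hypothetical frame with 221 sensing datetimes (not a real count from this run):

```python
from typing import Tuple


def full_batch_coverage(num_sensing_datetimes: int, k: int) -> Tuple[int, int]:
    """Split a frame's sensing datetimes into (covered by full k-sized jobs, k-remainder)."""
    full_jobs, remainder = divmod(num_sensing_datetimes, k)
    return full_jobs * k, remainder


# Hypothetical frame with 221 sensing datetimes in the date range, k = 15.
total = 221
covered, remainder = full_batch_coverage(total, k=15)
print(covered, remainder)               # 210 11
print(f"{100 * covered / total:.0f}%")  # 95%
```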
Phil - the information looks pretty complete. Is there a line that gives the total number of frames to be submitted? Or is that just calculable from the frame_states and frame_completion_percentages lines?
It might be useful to add another word to the frame_states line, just to make clear what it represents; say, frame_states_submitted, or something similar. Similarly, progress_percentage could become total_progress_percentage or total_completion_percentage, to make it consistent with the second line.
@sjlewis-jpl Showing the total number of frames to be submitted is a great idea. I can change the field total_completion_percentage to look like: 10 / 40 = 25%
The numerator is the total number of jobs submitted so far and the denominator is the total number of jobs possible to submit. The percentage itself would be unchanged. We could make these three values separate fields, but that seems overly verbose.
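A minimal sketch of the combined "submitted / possible = percentage" display, assuming a hypothetical `format_completion` helper (not existing pcm_batch.py code). With these numbers, 10 / 40 works out to 25%. It uses integer floor division so the display can never read 100% before every job has been submitted; rounding would be an equally valid choice.

```python
def format_completion(submitted: int, possible: int) -> str:
    """Render progress as 'submitted / possible = pct%' (hypothetical helper)."""
    # Floor division keeps e.g. 199/200 at 99% rather than rounding up to 100%.
    pct = 100 * submitted // possible if possible else 0
    return f"{submitted} / {possible} = {pct}%"


print(format_completion(10, 40))    # 10 / 40 = 25%
print(format_completion(199, 200))  # 199 / 200 = 99%
```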
I'd like to keep the frame_states field name the same. It's an internal field, so a lot of code currently refers to it. If you feel strongly about changing it, I could change how it's displayed in the pcm_batch.py output, but that may lead to confusion later.
Your idea for displaying the fraction would be awesome, and might save adding another line for another field.
If it's hard to change the field names, then don't worry about it, though changing them in the output report would also be fine. I was thinking of minimizing the documentation needed to teach someone else to read the reports.
How about I put the documentation right in the output at the bottom? Basically a legend for the table. I can take your and others' input on the verbiage.
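A minimal sketch of the legend idea, assuming a hypothetical `render_view` helper and placeholder legend wording drawn from the descriptions in this thread. Field names follow the thread's proposal (total_completion_percentage); the real pcm_batch.py output code and the final verbiage may differ.

```python
from typing import Dict

# Placeholder legend wording, to be refined with input from this thread.
LEGEND: Dict[str, str] = {
    "frame_states": "Number of sensing datetimes submitted so far, per frame",
    "frame_completion_percentages": "Completion percentage, per frame",
    "last_processed_datetimes": "Last sensing datetime processed, per frame",
    "total_completion_percentage": "Jobs submitted so far / total jobs possible to submit",
}


def render_view(fields: Dict[str, object]) -> str:
    """Print each field as an aligned row, then a legend for the fields shown."""
    lines = [f"{name:<30} {value}" for name, value in fields.items()]
    lines.append("")
    lines.append("Legend:")
    lines.extend(f"  {name}: {LEGEND[name]}" for name in fields if name in LEGEND)
    return "\n".join(lines)


print(render_view({
    "frame_states": {"831": 105},
    "total_completion_percentage": "10 / 40 = 25%",
}))
```

Keeping the legend text in one dictionary next to the field names makes it easy to update the wording without touching the table layout.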