improve and optimize checkpointing/state system

Question

bch0w opened this issue 2 years ago · 5 comments

The checkpointing system is pretty rudimentary at the moment: it simply checks whether a function in workflow.task_list has been run (by name) and skips the function if it has. This is not optimal, as it means a lot of simulations get repeated unnecessarily. E.g., if 60 forward simulations are launched and all but one finish, restarting the workflow will cause all 60 jobs to be re-run.
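To make the problem concrete, here is a minimal sketch of function-level checkpointing as described above. It is not the SeisFlows source: `run_workflow`, `forward_simulations`, and the in-memory `completed` set are hypothetical stand-ins for the real workflow and state file.

```python
def run_workflow(task_list, completed):
    """Run each function in order, skipping any whose name is already recorded."""
    for func in task_list:
        if func.__name__ in completed:
            continue  # the whole function is skipped, never individual jobs
        func()
        # Only reached if every job inside the function succeeded
        completed.add(func.__name__)


def count_simulations_after_restart(ntask=60, failing_task=59):
    """Count how many simulations are launched across one failed run and one restart."""
    launched = []            # task ids of every simulation that was started
    broken = {failing_task}  # tasks that fail on the first attempt only

    def forward_simulations():
        failed = []
        for task_id in range(ntask):
            launched.append(task_id)
            if task_id in broken:
                failed.append(task_id)
        broken.clear()  # pretend the underlying problem is fixed before the restart
        if failed:
            raise RuntimeError(f"tasks failed: {failed}")

    completed = set()
    try:
        run_workflow([forward_simulations], completed)
    except RuntimeError:
        pass  # the workflow crashed; nothing was recorded as completed
    run_workflow([forward_simulations], completed)  # restart
    return len(launched)


print(count_simulations_after_restart())  # 120: all 60 jobs ran twice
```

Because completion is only recorded per function, the 59 successful jobs from the first attempt leave no trace and are launched again.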

Additionally, there is no clear way to interact with the state file. At the moment it involves directly editing the text file, which is a bit clunky and prone to error. It would be great to have a command line tool to check, edit and clear a state file. The state file should likely also be a hidden file to prevent casual users from accidentally deleting or editing it manually.
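One possible shape for such a tool, sketched with the standard library's argparse. Everything here is an assumption for illustration: the `state-tool` program name, the `check`/`edit`/`clear` subcommands, the JSON layout, and the `.state.json` default are hypothetical and do not describe an existing SeisFlows command.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Build a parser with hypothetical check/edit/clear subcommands."""
    parser = argparse.ArgumentParser(prog="state-tool")
    # Hidden file by default (leading dot); the name is an assumption
    parser.add_argument("--file", default=".state.json")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("check", help="print the status of every entry")
    edit = sub.add_parser("edit", help="set the status of one entry")
    edit.add_argument("name")
    edit.add_argument("status", choices=["pending", "completed", "failed"])
    sub.add_parser("clear", help="remove all entries")
    return parser


def main(argv=None):
    """Apply one command to the state file and return the resulting state."""
    args = build_parser().parse_args(argv)
    path = Path(args.file)
    state = json.loads(path.read_text()) if path.exists() else {}
    if args.command == "check":
        for name, status in state.items():
            print(f"{name}: {status}")
    elif args.command == "edit":
        state[args.name] = args.status
        path.write_text(json.dumps(state, indent=2))
    elif args.command == "clear":
        state = {}
        path.write_text(json.dumps(state))
    return state
```

For example, `main(["edit", "forward_simulations", "completed"])` would mark one entry as done without anyone opening the file in a text editor.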

Tasks to be completed are then:

Improve task-level checkpointing (this can be done by checking solver logs, or generated waveforms in the 'traces' dir.)
Create a seisflows state command line option to manipulate the state file
Generate the entire state file at the setup stage, as opposed to building it throughout the first iteration
Make the state file a hidden file
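A rough sketch of how the first, third and fourth items could fit together, under stated assumptions: the `TaskState` class, the `.workflow_state.json` file name, and the status strings are all hypothetical. Note that it records completion in the state file itself, whereas the issue suggests inferring it from solver logs or the 'traces' directory; that check would replace the explicit `mark` call.

```python
import json
import tempfile
from pathlib import Path


class TaskState:
    """Per-task completion record kept in a hidden JSON file (hypothetical layout)."""

    def __init__(self, path, functions, ntask):
        self.path = Path(path)
        if self.path.exists():
            # Restart: pick up whatever was recorded by the previous run
            self.state = json.loads(self.path.read_text())
        else:
            # Setup stage: generate the entire state up front
            self.state = {
                name: {str(i): "pending" for i in range(ntask)}
                for name in functions
            }
            self._write()

    def _write(self):
        self.path.write_text(json.dumps(self.state, indent=2))

    def pending(self, function):
        """Return the task ids of `function` that still need to run."""
        return [
            int(task_id)
            for task_id, status in self.state[function].items()
            if status != "completed"
        ]

    def mark(self, function, task_id, status="completed"):
        """Record the status of one task and persist it immediately."""
        self.state[function][str(task_id)] = status
        self._write()


def demo_restart():
    """Finish 59 of 60 tasks, then reopen the state as a restart would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".workflow_state.json"  # leading dot hides the file
        state = TaskState(path, ["forward_simulations"], ntask=60)
        for task_id in range(60):
            if task_id != 59:
                state.mark("forward_simulations", task_id)
        restarted = TaskState(path, ["forward_simulations"], ntask=60)
        return restarted.pending("forward_simulations")


print(demo_restart())  # [59]: only the unfinished task is left to re-run
```

Writing the file after every `mark` is simple but could become a bottleneck with many concurrent tasks; batching writes or using one marker file per task would be alternatives.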

Answer 1 · 2023-06-14T07:27:29.000Z

Hi @bch0w, I implemented a task-level checkpoint system. I have been testing it over the past few months and it seems to work pretty well. I will submit a pull request this week!

Answer 3 · 2023-08-28T22:43:50.000Z

Hi @evcano, just wondering if you have any updates on your task-level checkpoint system? No pressure, just something I have been thinking about recently and I would be excited to see your work on it!

Answer 4 · 2023-09-04T13:32:56.000Z

Hi @bch0w,

I am cleaning up the changes I made. I will submit the pull request tomorrow!