adjtomo/seisflows

New system sub-class that prioritizes long queue times and large jobs

bch0w opened this issue · 2 comments

bch0w commented

Following discussions with the Princeton group, it would be great to create a system class for clusters with long queue times, one that favors a single large job over arrayed jobs. SeisFlows currently submits N array jobs (where N is the number of events used), and each of these must be scheduled separately. On systems with long queue times, the total wait before all N jobs have run can therefore be high.

One approach would be to submit one large job in which the N tasks are doled out on the compute nodes themselves, rather than distributed as array jobs from the master job. This could live in a separate 'qcluster' (q for queue) system module with internal logic to dole out these tasks after job submission, perhaps using asyncio or a ThreadPoolExecutor from concurrent.futures.
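A minimal sketch of the doling-out idea with a ThreadPoolExecutor, assuming the per-event tasks are independent commands launched from inside a single job allocation. `run_task` and `dole_out` are hypothetical names, and the placeholder command only prints the event ID; a real 'qcluster' module would launch the per-event work there instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess
import sys


def run_task(event_id: str) -> tuple[str, int]:
    """Run one per-event task as a subprocess and return (event_id, exit code).

    Placeholder command (hypothetical): a real module would launch the
    solver or processing step for `event_id` here.
    """
    cmd = [sys.executable, "-c", f"print('processing {event_id}')"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return event_id, result.returncode


def dole_out(event_ids: list[str], max_workers: int) -> dict[str, int]:
    """Run all tasks inside the current (single) job allocation,
    at most `max_workers` at a time, and collect exit codes by event."""
    exit_codes = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_task, eid) for eid in event_ids]
        # as_completed yields futures as they finish, not in submission order
        for future in as_completed(futures):
            event_id, code = future.result()
            exit_codes[event_id] = code
    return exit_codes


if __name__ == "__main__":
    events = [f"event_{i:03d}" for i in range(1, 6)]
    print(dole_out(events, max_workers=2))
```

Threads are adequate here because each worker spends its time waiting on a subprocess rather than executing Python code; `asyncio.create_subprocess_exec` would be an equivalent asyncio-based alternative.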

bch0w commented

The NUMBER_OF_SIMULTANEOUS_RUNS parameter is available in all versions of SPECFEM and would be a useful target for this issue. It allows a User to submit one large job for N events, with each event running on P processors. Rather than submitting N array jobs, each running on P cores, the User submits one job on N x P cores, and SPECFEM distributes the N simulations internally.
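The core-count arithmetic can be sketched as follows. This assumes SPECFEM splits the MPI world into contiguous blocks of P ranks, one block per run directory (run0001 gets ranks 0 to P-1, run0002 gets ranks P to 2P-1, and so on); that mapping is our reading of the parameter and should be checked against the SPECFEM source. The function names are hypothetical.

```python
def total_mpi_tasks(n_events: int, nproc_per_event: int) -> int:
    """One job on N x P cores instead of N array jobs on P cores each."""
    return n_events * nproc_per_event


def run_dir_for_rank(global_rank: int, nproc_per_event: int) -> str:
    """Map a global MPI rank to its run directory, assuming ranks are
    grouped into contiguous blocks of P (an assumption, see above)."""
    group = global_rank // nproc_per_event  # 0-based run index
    return f"run{group + 1:04d}"            # run directories are 1-based


N, P = 4, 24
print(total_mpi_tasks(N, P))   # 96
print(run_dir_for_rank(0, P))  # run0001
print(run_dir_for_rank(23, P)) # run0001
print(run_dir_for_rank(24, P)) # run0002
print(run_dir_for_rank(95, P)) # run0004
```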

I need to test this feature to work out the finer details, but I think SeisFlows can use it to submit large, high core-count jobs suited to systems with long queue times.

bch0w commented

Notes on the NUMBER_OF_SIMULTANEOUS_RUNS parameter (developing with SPECFEM3D_GLOBE)

https://specfem3d-globe.readthedocs.io/en/latest/04_running_the_solver/#note-on-the-simultaneous-simulation-of-several-earthquakes

  • Run directories other than run0001 (run0002, run0003, ...) do not require a Par_file
  • Failed runs create text files with names like 'run0001_failed' or 'run_with_local_rank_00000000and_global_rank_00000000_failed'
  • The LOCAL_PATH directories under ROOTDIR (OUTPUT_FILES and DATABASES_MPI) can be empty when using the BROADCAST_SAME_MESH_AND_MODEL parameter; only run0001 requires the actual mesh and model files
  • ROOTDIR/DATA only requires a Par_file, not CMTSOLUTION or STATIONS files
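The notes above can be turned into a directory skeleton, sketched here with pathlib. This is an illustration under assumptions: each run directory is assumed to hold its own DATA, OUTPUT_FILES and DATABASES_MPI subdirectories, files are created empty rather than copied from real inputs, and `*_failed` markers are searched for anywhere under ROOTDIR because their exact location is not recorded in these notes. Both function names are hypothetical.

```python
from pathlib import Path


def make_simultaneous_layout(rootdir: Path, n_runs: int) -> None:
    """Create a minimal skeleton for N simultaneous runs (hypothetical helper).

    Files are created empty; a real solver setup would copy the actual
    Par_file, CMTSOLUTION and STATIONS files instead.
    """
    # ROOTDIR/DATA only needs a Par_file (no CMTSOLUTION or STATIONS)
    (rootdir / "DATA").mkdir(parents=True, exist_ok=True)
    (rootdir / "DATA" / "Par_file").touch()
    # ROOTDIR LOCAL_PATH directories may stay empty when broadcasting the mesh
    for name in ("OUTPUT_FILES", "DATABASES_MPI"):
        (rootdir / name).mkdir(exist_ok=True)

    for i in range(1, n_runs + 1):
        run = rootdir / f"run{i:04d}"
        data = run / "DATA"
        data.mkdir(parents=True, exist_ok=True)
        # Per-event source and receiver files live in each run directory
        (data / "CMTSOLUTION").touch()
        (data / "STATIONS").touch()
        (run / "OUTPUT_FILES").mkdir(exist_ok=True)
        (run / "DATABASES_MPI").mkdir(exist_ok=True)
        if i == 1:
            # Only run0001 needs its own Par_file (and the real mesh/model)
            (data / "Par_file").touch()


def find_failed_runs(rootdir: Path) -> list[str]:
    """Return the sorted names of '*_failed' marker files under ROOTDIR."""
    return sorted(p.name for p in rootdir.rglob("*_failed"))
```

Checking for `*_failed` markers after the large job finishes would let SeisFlows flag individual failed events without relying on per-array-job exit codes.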

Outline of what will need to change:

  • Solver must initialize working directories to match the required SPECFEM structure (cdf4841)
  • System needs to submit one large job rather than an array job
  • Workflow needs to bundle jobs differently, since steps like preprocessing cannot be included in this large many-core job
  • Preprocessing and bookkeeping need to be restructured, as they are currently handled within each solver array job. This will likely require some refactoring.
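For the System change, the difference in submission can be sketched for a SLURM cluster (an assumption; other schedulers would need their own equivalents). `--array`, `--ntasks` and `--time` are standard `sbatch` options, while the function names and the `run_solver.sh` script are hypothetical; the commands are only built here, not executed.

```python
def array_job_cmd(script: str, n_events: int, nproc: int, walltime: str) -> list[str]:
    """Current approach (schematic): N array tasks, each on P MPI tasks."""
    if n_events < 1:
        raise ValueError("need at least one event")
    return ["sbatch", f"--array=0-{n_events - 1}", f"--ntasks={nproc}",
            f"--time={walltime}", script]


def single_job_cmd(script: str, n_events: int, nproc: int, walltime: str) -> list[str]:
    """Proposed approach: one job on N x P MPI tasks; SPECFEM splits the runs."""
    if n_events < 1:
        raise ValueError("need at least one event")
    return ["sbatch", f"--ntasks={n_events * nproc}",
            f"--time={walltime}", script]


print(array_job_cmd("run_solver.sh", 4, 24, "01:00:00"))
print(single_job_cmd("run_solver.sh", 4, 24, "01:00:00"))
```

Because the N runs execute simultaneously, the walltime of the single job should be roughly that of one event rather than N events, while the core request grows by a factor of N. Preprocessing would then run as a separate step once this job finishes.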