radical-collaboration/hpc-workflows

Logging feature request

lsawade opened this issue · 1 comments

nnodes link

One of the great features of nnodes is direct task logging while the job is running. We didn't know we needed this until we had it. Put simply, it's a command-line tool that lets you keep track of all jobs that are running, done, failed, or waiting to be submitted. The example output for two moment tensor inversions is as follows:

- C090497A
  0) iteration
    0) mpi-create-dir-C090497A (04:50)
    1) forward_frechet (15:47)
    2) processing-all
      - C090497A_process_data (running - 01:29)
      - process_synthetics
        - C090497A_process_synt (running - 01:29)
        - C090497A_process_dsdm00000 (running - 01:29)
        - C090497A_process_dsdm00001 (running - 01:29)
        - C090497A_process_dsdm00002 (running - 01:29)
        - C090497A_process_dsdm00003 (running - 01:29)
        - C090497A_process_dsdm00004 (running - 01:29)
        - C090497A_process_dsdm00005 (running - 01:29)
        - C090497A_process_dsdm00006 (running - 01:29)
        - C090497A_process_dsdm00007 (running - 01:29)
        - C090497A_process_dsdm00008 (running - 01:29)
        - C090497A_process_dsdm00009 (running - 01:29)
    3) mpiexec_window
    4) compute_weights
    5) compute_cgh
    6) compute_descent
    7) compute_optvals
    8) linesearch
    9) iteration_check
- B092894B
  0) iteration
    0) mpi-create-dir-B092894B (04:51)
    1) forward_frechet
      - forward (14:47)
      - frechet
        - mpiexec_xspecfem3D (running - 15:44)
        - mpiexec_xspecfem3D (14:49)
        - mpiexec_xspecfem3D (14:48)
        - mpiexec_xspecfem3D (14:49)
        - mpiexec_xspecfem3D (running - 14:52)
        - mpiexec_xspecfem3D (running - 14:51)
        - mpiexec_xspecfem3D (14:49)
        - mpiexec_xspecfem3D (14:47)
        - mpiexec_xspecfem3D (running - 15:44)
    2) processing-all
    3) mpiexec_window
    4) compute_weights
    5) compute_cgh
    6) compute_descent
    7) compute_optvals
    8) linesearch
    9) iteration_check

where

  • - indicates tasks that can be run concurrently
  • 0), 1), 2), ... indicate sequentially run tasks
  • tasks that are
    • done have a (hh:mm:ss) stamp
    • running have a (running - hh:mm:ss) stamp
    • waiting to be executed have no stamp
    • parents of running tasks have no stamp

In the backend, nnodes uses a dict to keep track of submission times, execution times, etc., with start-time and end-time attributes for each task. The log is produced by simply reading these attributes from the dictionary and printing them.
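To make the idea concrete, here is a minimal sketch of how such a dict could be turned into the log above. It is not the nnodes implementation: the keys `name`, `starttime`, `endtime`, `concurrent`, and `children` are hypothetical, and the two-field stamps are assumed to be mm:ss.

```python
import time


def format_elapsed(seconds):
    """Format a duration in seconds as mm:ss (assumed stamp format)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def status_stamp(task, now=None):
    """Return the stamp printed after a task name, or '' if it has none."""
    now = time.time() if now is None else now
    start, end = task.get("starttime"), task.get("endtime")
    if start is None:
        return ""  # waiting to be executed
    if end is None:
        if task.get("children"):
            return ""  # parent of running tasks
        return f" (running - {format_elapsed(now - start)})"
    return f" ({format_elapsed(end - start)})"


def render(task, prefix="-", depth=0, now=None):
    """Return the log lines for a task and, recursively, its children."""
    lines = ["  " * depth + f"{prefix} {task['name']}{status_stamp(task, now)}"]
    for i, child in enumerate(task.get("children", [])):
        # Children of a concurrent node get '-', sequential ones get 'i)'.
        child_prefix = "-" if task.get("concurrent") else f"{i})"
        lines.extend(render(child, child_prefix, depth + 1, now))
    return lines


# Hypothetical task tree; times are seconds since the workflow started.
EXAMPLE = {
    "name": "B092894B", "starttime": 0.0, "endtime": None, "concurrent": False,
    "children": [
        {"name": "mpi-create-dir-B092894B", "starttime": 0.0, "endtime": 291.0},
        {"name": "forward_frechet", "starttime": 291.0, "endtime": None,
         "concurrent": True,
         "children": [
             {"name": "forward", "starttime": 291.0, "endtime": 1178.0},
             {"name": "frechet", "starttime": 291.0, "endtime": None},
         ]},
        {"name": "processing-all", "starttime": None, "endtime": None},
    ],
}

print("\n".join(render(EXAMPLE, now=1235.0)))
```

The only state needed is the pair of timestamps per task; whether a task is waiting, running, or done follows from which of the two is set.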

What I imagine is something quite similar, which could be called like:

radical-log-workflow <session.id>

and output a log with

  • - for pipelines
  • 1,2,3,... for stages
  • - again for tasks

and similar timestamps.
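As a rough illustration of the requested layout, the sketch below renders a pipeline/stage/task hierarchy in that style. The `Pipeline`, `Stage`, and `Task` classes here are hypothetical stand-ins written for this example, not the RADICAL classes, and how the data would be read from a session is left open. Stages are numbered from 1, following the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    start: Optional[float] = None  # seconds; None until the task starts
    end: Optional[float] = None    # seconds; None until the task finishes


@dataclass
class Stage:
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Pipeline:
    name: str
    stages: List[Stage] = field(default_factory=list)


def stamp(task, now):
    """Return ' (hh:mm:ss)', ' (running - hh:mm:ss)', or '' for a waiting task."""
    if task.start is None:
        return ""
    elapsed = int((now if task.end is None else task.end) - task.start)
    hours, rem = divmod(elapsed, 3600)
    minutes, seconds = divmod(rem, 60)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f" (running - {clock})" if task.end is None else f" ({clock})"


def render_workflow(pipelines, now):
    """Return log lines: '-' for pipelines, 'n)' for stages, '-' for tasks."""
    lines = []
    for pipeline in pipelines:
        lines.append(f"- {pipeline.name}")
        for number, stage in enumerate(pipeline.stages, start=1):
            lines.append(f"  {number}) {stage.name}")
            for task in stage.tasks:
                lines.append(f"    - {task.name}{stamp(task, now)}")
    return lines


# Hypothetical workflow state at now = 944 seconds.
workflow = [
    Pipeline("B092894B", [
        Stage("forward_frechet", [
            Task("forward", start=0.0, end=887.0),
            Task("frechet", start=0.0),
        ]),
        Stage("processing-all", [Task("process_data")]),
    ]),
]

print("\n".join(render_workflow(workflow, now=944.0)))
```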

Yes, that is indeed neat - accepted as feature request.