Parallel GEFS "beta" version



To run the parallel GEFS:



1.  Copy this directory and all of its contents to

(your file location)/[expid]/nwdev

where [expid] is a four-letter identifier.  I recommend using your initials
for the first two letters and your choice of letter sequence for the third
and fourth.  The last two letters of $expid will be used in the jobname.
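
A minimal sketch of a check that could be run before the copy, assuming
a POSIX shell.  The function name is_valid_expid, the sample identifier
"rwab", and the destination path are hypothetical and are not part of
the GEFS scripts.

```shell
# Hypothetical helper (not part of the GEFS scripts): succeed only if the
# argument is exactly four letters, as required for [expid].
is_valid_expid() {
  case "$1" in
    [A-Za-z][A-Za-z][A-Za-z][A-Za-z]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use; "rwab" and the destination are placeholders:
#   expid=rwab
#   is_valid_expid "$expid" && cp -rp . "/your/file/location/$expid/nwdev"
```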



2.  Edit the file

(your file location)/[expid]/nwdev/control/setbase

to specify your account and job priority,
to refer to the location of your copy of the scripts, and
to specify where you want the model output and working directories to be.

On cirrus or stratus, members of the "global" group can use the default
locations which are based on $LOGNAME, but on vapor or for non-"global" 
users additional changes will have to be made. 

Default file locations:

/global/save/$LOGNAME/s/$expid/nwdev
   subdirectories for jobs, scripts, ush, etc. as in /nwprod

/global/save/$LOGNAME/s/$expid/nwdev/control
   control scripts that do not correspond to production scripts;
   these scripts perform some of the same functions as SMS

/ptmp/$LOGNAME/o/$expid/nwdev/control    
   job cards created by control/gefsrun

/ptmp/$LOGNAME/o/$expid/runlog
   logged output from control/gefsrun, both when run
   by hand and when run in each job to submit the
   next job

/ptmp/$LOGNAME/o/$expid/com/output/dev    
   output as in /com/output/prod

/ptmp/$LOGNAME/o/$expid/com/gefs/dev    
   GEFS output as in /com/gefs/prod

/ptmp/$LOGNAME/o/$expid/com/logs    
   jlogfile location as in /com/logs

/ptmp/$LOGNAME/o/$expid/nwges/dev    
   initial conditions as in /nwges/prod

/ptmp/$LOGNAME/t/$expid/tmpnwprd
   working directories for jobs

/global/noscrub/$LOGNAME/o/com/gfs/prod
   location for GFS analysis file copies
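
The default locations above can be generated from a user name and an
expid.  The following is only a sketch: the function name
gefs_default_dirs is hypothetical, and the paths are copied from the
list above rather than read from control/setbase.

```shell
# Print the default locations listed above for a given user and expid.
# The function name is hypothetical; the paths are copied from the list.
gefs_default_dirs() (
  user=$1
  expid=$2
  save=/global/save/$user/s/$expid/nwdev
  out=/ptmp/$user/o/$expid
  printf '%s\n' \
    "$save" \
    "$save/control" \
    "$out/nwdev/control" \
    "$out/runlog" \
    "$out/com/output/dev" \
    "$out/com/gefs/dev" \
    "$out/com/logs" \
    "$out/nwges/dev" \
    "/ptmp/$user/t/$expid/tmpnwprd" \
    "/global/noscrub/$user/o/com/gfs/prod"
)

# Example use with a placeholder expid:
#   gefs_default_dirs "$LOGNAME" rwab
```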



3.   Modify GEFS scripts and codes as required for your experiment



4.   Modify the parallel system as required for your experiment:

  a.  Set resolution, forecast length, and parameters to be given to forecast
      and other codes, plus some other options:

parms/gefs.parm
parms/gefs_init.parm

If you change resolution or forecast length, you probably
should change the resources requested by "gefsrun".

  b.  Set input and output file locations:

parms/gefs_config
or else modify the job scripts jobs/*

  c.  Set resources (time limits, nodes and tasks, etc.),
      in control/gefsrun


  d.  Set limits on the number of jobs to run at once,
      job order, insert new jobs or disable existing ones, 
      in control/gefsrun

In control/gefsrun search for "setup" to find places to modify.

For each production job, there is a section to specify the 
resources used in production followed by a section to modify 
the resource requests for development runs.

In gefsrun, forecast nodes and tasks are automatically adjusted 
for the number of members but not for resolution or forecast 
length.

There are two special wall_clock_limit variables in gefsrun:
wall_clock_limit_fcst_long for the long forecasts and the post jobs
  that run at the same time
wall_clock_limit_fcst_short for the 6-hour cycling forecasts and
  the post jobs that run at the same time

The variable "during_previous_job" controls whether the next 
job is submitted before the current job starts its work (for 
jobs that run simultaneously in production) or after the 
current job finishes its work (when the next job runs after 
the current job finishes in production).
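
The ordering controlled by "during_previous_job" can be sketched as
follows.  This is an illustration only: the values yes and no, and the
stub functions submit_next_job and do_work, are assumptions made for
the example and are not taken from gefsrun.

```shell
# Stubs standing in for the real submission and the job's own work.
submit_next_job() { echo "submit next job"; }
do_work()         { echo "do work"; }

# run_job <during_previous_job>
#   yes:           the next job is submitted before this job starts its work
#   anything else: the next job is submitted after this job finishes its work
run_job() {
  if [ "$1" = yes ]; then
    submit_next_job
    do_work
  else
    do_work
    submit_next_job
  fi
}

run_job yes   # prints "submit next job" then "do work"
run_job no    # prints "do work" then "submit next job"
```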

If you need to keep sfcsig data that production normally
deletes, disable or modify the "post cleanup" job.

If you need to add archive, cleanup, graphics, or verification
jobs, add them between the 900 and 999 jobs -- search for these
numbers in gefsrun to find where to modify it.

  e.  Add or remove output fields from the model

parms/gefs_master_f00.parm
parms/gefs_master_fhh.parm

  f.  Add or remove output fields from pgrba, pgrbb, and pgrb2c files

parms/gefs_pgrba_f00.parm
parms/gefs_pgrba_fhh.parm
parms/gefs_pgrbb_f00.parm
parms/gefs_pgrbb_fhh.parm
parms/gefs_pgrbc_f00.parm
parms/gefs_pgrbc_fhh.parm

  g.  Add or remove enspost and ensstat files

parms/gefs_ensstat.parm



5.   If necessary, copy initial state from production or another experiment

control/gefsges yyyymmddhh [expin]

where:
yyyymmddhh is the date and cycle to copy
expin      is prod [default], para, test [all from /com],
           or the expid of the experiment you want to copy from
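
A small sketch of how the two arguments might be checked before calling
control/gefsges, assuming a POSIX shell.  The wrapper name gefsges_args
and the sample date are hypothetical; only the yyyymmddhh format and
the "prod" default come from the description above.

```shell
# Hypothetical wrapper: check the cycle argument and apply the documented
# default for expin before calling control/gefsges.
gefsges_args() (
  cycle=$1
  expin=${2:-prod}
  case "$cycle" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "expected yyyymmddhh, got: $cycle" >&2; exit 1 ;;
  esac
  printf '%s %s\n' "$cycle" "$expin"
)

gefsges_args 2010010100        # prints "2010010100 prod"
gefsges_args 2010010100 rwab   # prints "2010010100 rwab"
```

The printed pair could then be passed on, for example as
control/gefsges $(gefsges_args 2010010100).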

     
5.   Start the parallel GEFS by running the script

gefsrun yyyymmddhhjjj [yyyymmddhhjjj]

while in the directory
(your file location)/[expid]/nwdev/control

first argument:  date, cycle, and job number of the first job to run
                 (first job is usually 000 if running initialization
                 or 050 if running from given initial conditions)

second argument: date, cycle, and job number of the last job to run
                 (last job is usually 999)

If only one argument is given, only that one job will be run.

Each job runs this script with an additional argument in order to submit 
the next job.
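
The yyyymmddhhjjj argument packs three fields into 13 digits.  As a
sketch of that layout (the function name split_jobarg and the sample
date are hypothetical, and this is not how gefsrun itself parses its
arguments):

```shell
# Split a gefsrun argument of the form yyyymmddhhjjj into date, cycle,
# and job number.  The function name is hypothetical.
split_jobarg() (
  arg=$1
  case "$arg" in
    ""|*[!0-9]*) echo "not numeric: $arg" >&2; exit 1 ;;
  esac
  if [ "${#arg}" -ne 13 ]; then
    echo "expected 13 digits: $arg" >&2
    exit 1
  fi
  ymd=$(printf '%s' "$arg" | cut -c1-8)
  hh=$(printf '%s' "$arg" | cut -c9-10)
  jjj=$(printf '%s' "$arg" | cut -c11-13)
  printf '%s %s %s\n' "$ymd" "$hh" "$jjj"
)

split_jobarg 2010010100050   # prints "20100101 00 050"
```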

To find out which job numbers correspond to which GEFS jobs,
run

gefsrun joblist

which produces a job number list like the following (an old example,
not necessarily up to date):


    job number summary

job numbers depend on gefs.parm settings
and on settings within this script gefsrun

000 first job of cycle

012 init separate job
013 init et job
014 init combine job

    jobs for this cycle

050 forecast job

052 post job for c00
053 prdgen job for c00
054 post job for p01
055 prdgen job for p01
056 post job for p02
057 prdgen job for p02
058 post job for p03
059 prdgen job for p03
060 post job for p04
061 prdgen job for p04
062 post job for p05
063 prdgen job for p05
064 post job for p06
065 prdgen job for p06
066 post job for p07
067 prdgen job for p07
068 post job for p08
069 prdgen job for p08
070 post job for p09
071 prdgen job for p09
072 post job for p10
073 prdgen job for p10
074 post job for p11
075 prdgen job for p11
076 post job for p12
077 prdgen job for p12
078 post job for p13
079 prdgen job for p13
080 post job for p14
081 prdgen job for p14
082 post job for p15
083 prdgen job for p15
084 post job for p16
085 prdgen job for p16
086 post job for p17
087 prdgen job for p17
088 post job for p18
089 prdgen job for p18
090 post job for p19
091 prdgen job for p19
092 post job for p20
093 prdgen job for p20

198 gefs gfs job
200 ensstat job
202 tracking job
204 track average job
206 post cleanup job

    jobs for the cycle 6 hours later

450 forecast job

452 post job for p01
453 prdgen job for p01
454 post job for p02
455 prdgen job for p02
456 post job for p03
457 prdgen job for p03
458 post job for p04
459 prdgen job for p04
460 post job for p05
461 prdgen job for p05
462 post job for p06
463 prdgen job for p06
464 post job for p07
465 prdgen job for p07
466 post job for p08
467 prdgen job for p08
468 post job for p09
469 prdgen job for p09
470 post job for p10
471 prdgen job for p10
472 post job for p11
473 prdgen job for p11
474 post job for p12
475 prdgen job for p12
476 post job for p13
477 prdgen job for p13
478 post job for p14
479 prdgen job for p14
480 post job for p15
481 prdgen job for p15
482 post job for p16
483 prdgen job for p16
484 post job for p17
485 prdgen job for p17
486 post job for p18
487 prdgen job for p18
488 post job for p19
489 prdgen job for p19
490 post job for p20
491 prdgen job for p20

596 tracking job
598 post cleanup job

    jobs for the cycle 12 hours later

600 forecast job for the cycle 12 hours later
602 to 641 alternating post and prdgen jobs

602 post job for p01
603 prdgen job for p01
604 post job for p02
605 prdgen job for p02
606 post job for p03
607 prdgen job for p03
608 post job for p04
609 prdgen job for p04
610 post job for p05
611 prdgen job for p05
612 post job for p06
613 prdgen job for p06
614 post job for p07
615 prdgen job for p07
616 post job for p08
617 prdgen job for p08
618 post job for p09
619 prdgen job for p09
620 post job for p10
621 prdgen job for p10
622 post job for p11
623 prdgen job for p11
624 post job for p12
625 prdgen job for p12
626 post job for p13
627 prdgen job for p13
628 post job for p14
629 prdgen job for p14
630 post job for p15
631 prdgen job for p15
632 post job for p16
633 prdgen job for p16
634 post job for p17
635 prdgen job for p17
636 post job for p18
637 prdgen job for p18
638 post job for p19
639 prdgen job for p19
640 post job for p20
641 prdgen job for p20

746 tracking job
748 post cleanup job

    jobs for the cycle 18 hours later

750 forecast job for the cycle 18 hours later

752 post job for p01
753 prdgen job for p01
754 post job for p02
755 prdgen job for p02
756 post job for p03
757 prdgen job for p03
758 post job for p04
759 prdgen job for p04
760 post job for p05
761 prdgen job for p05
762 post job for p06
763 prdgen job for p06
764 post job for p07
765 prdgen job for p07
766 post job for p08
767 prdgen job for p08
768 post job for p09
769 prdgen job for p09
770 post job for p10
771 prdgen job for p10
772 post job for p11
773 prdgen job for p11
774 post job for p12
775 prdgen job for p12
776 post job for p13
777 prdgen job for p13
778 post job for p14
779 prdgen job for p14
780 post job for p15
781 prdgen job for p15
782 post job for p16
783 prdgen job for p16
784 post job for p17
785 prdgen job for p17
786 post job for p18
787 prdgen job for p18
788 post job for p19
789 prdgen job for p19
790 post job for p20
791 prdgen job for p20

896 tracking job
898 post cleanup job

900 between production jobs and archive/cleanup jobs
999 last job of cycle
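
The post and prdgen numbers in the example listing follow a simple
pattern: two job numbers per member, counted from a starting number
for each cycle.  The sketch below reproduces that arithmetic for the
listing above only; the function names are hypothetical, and actual
job numbers depend on gefs.parm and gefsrun settings, so check the
output of "gefsrun joblist" for a real run.

```shell
# post_job_number <cycle offset in hours: 0, 6, 12 or 18> <member: c00, p01..p20>
# Reproduces the numbering in the example listing only.
post_job_number() (
  offset=$1
  member=$2
  n=${member#[cp]}   # drop the leading c or p
  n=${n#0}           # drop one leading zero so 08 is not read as octal
  case "$offset" in
    0)  first=52 ;;   # current cycle: c00 is 052, p01 is 054, ...
    6)  first=450 ;;  # later cycles: p01 is the forecast job number + 2
    12) first=600 ;;
    18) first=750 ;;
    *)  echo "unknown cycle offset: $offset" >&2; exit 1 ;;
  esac
  # The listing shows c00 jobs for the current cycle only.
  if [ "$offset" != 0 ] && [ "$n" -lt 1 ]; then
    echo "no c00 jobs listed for offset $offset" >&2
    exit 1
  fi
  printf '%03d\n' $((first + 2 * n))
)

# The prdgen job for a member is numbered one above its post job.
prdgen_job_number() (
  post=$(post_job_number "$1" "$2") || exit 1
  printf '%03d\n' $((${post#0} + 1))
)

post_job_number 0 c00      # prints 052
prdgen_job_number 6 p20    # prints 491
```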


LoadLeveler jobnames are in the form:
idyyyymmddhhjjj.nnnnn
where id is the last two letters of your $expid
and nnnnn is the job number assigned by LoadLeveler.
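
As a sketch of the jobname format, the part before ".nnnnn" can be
assembled from the expid, the cycle, and the job number.  The function
name jobname_prefix and the sample values are hypothetical.

```shell
# Build the part of a LoadLeveler jobname that precedes ".nnnnn".
# Function name and sample values are placeholders.
jobname_prefix() (
  expid=$1
  cycle=$2   # yyyymmddhh
  job=$3     # jjj
  # ${expid#??} removes the first two characters, leaving the last two
  # letters of a four-letter expid.
  printf '%s%s%s\n' "${expid#??}" "$cycle" "$job"
)

jobname_prefix rwab 2010010100 050   # prints "ab2010010100050"
```

The resulting string can be used to pick out one job from a job
listing, for example by piping the output of llq through grep.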



6.   Run the script

gefsmon

while in the directory
(your file location)/[expid]/nwdev/control

to follow the progress of your jobs. 

If you have more jobs running than the limit you have set, or
if there is not enough space on /ptmp to continue the GEFS, or
if the data needed by the init_separate job (012) is not 
available, a "wait" job will be submitted instead, which will
try again in miwait minutes (currently 4).  If /ptmp actually 
fills up, this will probably fail and you will have to restart 
the run by hand when more disk is available.
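
The "try again in miwait minutes" behavior amounts to a retry loop.
The sketch below shows the idea with sleep in a single shell.  This is
an assumption made for illustration: the parallel system submits a
separate "wait" job instead, and the function name retry_every is
hypothetical.

```shell
# retry_every <seconds> <max tries> <command> [args...]
# Run the command until it succeeds, sleeping between attempts; give up
# with status 1 after <max tries> failed attempts.
retry_every() (
  interval=$1
  tries=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$tries" ]; then
      exit 1
    fi
    n=$((n + 1))
    sleep "$interval"
  done
)

# Example with miwait=4 minutes and a placeholder check command:
#   miwait=4
#   retry_every $((miwait * 60)) 10 test -d /ptmp/$LOGNAME
```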

If any job experiences a failure that can be detected in the
output file by the script 
/global/save/wx20rw/h/bin/hpss.put.output.day
the file waitfile.s (for stratus) or waitfile.c (for cirrus)
will be set to yes, and any later job will be replaced by
a "wait" job until the problem is fixed and the output file
is removed from the search (usually by renaming the output
directory from YYYYMMDD to YYYYMMDD.suffix so it will not
be detected).
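
The rename from YYYYMMDD to YYYYMMDD.suffix can be done with mv.  A
cautious sketch, assuming a POSIX shell; the function name
hide_output_day is hypothetical, and the parent directory is passed as
an argument because the text above does not say which output directory
is searched.

```shell
# hide_output_day <parent directory> <YYYYMMDD> <suffix>
# Rename <parent>/<YYYYMMDD> to <parent>/<YYYYMMDD>.<suffix>, refusing to
# overwrite an existing target.
hide_output_day() (
  parent=$1
  day=$2
  suffix=$3
  if [ ! -d "$parent/$day" ]; then
    echo "no such directory: $parent/$day" >&2
    exit 1
  fi
  if [ -e "$parent/$day.$suffix" ]; then
    echo "already exists: $parent/$day.$suffix" >&2
    exit 1
  fi
  mv "$parent/$day" "$parent/$day.$suffix"
)

# Example with placeholder values:
#   hide_output_day /path/to/output 20100101 failed
```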


These scripts called by gefsmon can also be used on their own:
llqww - lists your jobs with more detail than "llq"
llqwa - summarizes all loadleveler jobs by class and status


7.  If desired, after running all the jobs in a cycle or a day,
the job card/script files in
/ptmp/$LOGNAME/o/$expid/nwdev/control/YYYYMMDD/CC
can be modified for use as job files by SMS.


8.  Send questions or feedback to

richard.wobus@noaa.gov