Parallel GEFS "beta" version

To run the parallel GEFS:

1. Copy this directory and all of its contents to

      (your file location)/[expid]/nwdev

   where [expid] is a four-letter identifier.  I recommend using your
   initials for the first two letters and your choice of letters for the
   third and fourth.  The last two letters of $expid will be used in the
   jobname.

2. Edit the file

      (your file location)/[expid]/nwdev/control/setbase

   to specify your account and job priority, to point to the location of
   your copy of the scripts, and to specify where you want the model
   output and working directories to be.  On cirrus or stratus, members
   of the "global" group can use the default locations, which are based
   on $LOGNAME; on vapor, or for non-"global" users, additional changes
   will have to be made.

   Default file locations:

   /global/save/$LOGNAME/s/$expid/nwdev
        subdirectories for jobs, scripts, ush, etc. as in /nwprod
   /global/save/$LOGNAME/s/$expid/nwdev/control
        control scripts that do not correspond to production scripts;
        these scripts perform some of the same functions as SMS
   /ptmp/$LOGNAME/o/$expid/nwdev/control
        job cards created by control/gefsrun
   /ptmp/$LOGNAME/o/$expid/runlog
        logged output from control/gefsrun, both when run by hand and
        when run in each job to submit the next job
   /ptmp/$LOGNAME/o/$expid/com/output/dev
        output as in /com/output/prod
   /ptmp/$LOGNAME/o/$expid/com/gefs/dev
        GEFS output as in /com/gefs/prod
   /ptmp/$LOGNAME/o/$expid/com/logs
        jlogfile location as in /com/logs
   /ptmp/$LOGNAME/o/$expid/nwges/dev
        initial conditions as in /nwges/prod
   /ptmp/$LOGNAME/t/$expid/tmpnwprd
        working directories for jobs
   /global/noscrub/$LOGNAME/o/com/gfs/prod
        location for GFS analysis file copies

3. Modify GEFS scripts and codes as required for your experiment.

4. Modify the parallel system as required for your experiment:

   a. Set resolution, forecast length, and parameters to be given to the
      forecast and other codes, plus some other options, in
         parms/gefs.parm
         parms/gefs_init.parm
      If you change resolution or forecast length, you should probably
      also change the resources requested by gefsrun.

   b. Set input and output file locations in
         parms/gefs_config
      or else modify the job scripts jobs/*.

   c. Set resources (time limits, nodes and tasks, etc.) in
      control/gefsrun.

   d. Set limits on the number of jobs to run at once, change the job
      order, and insert new jobs or disable existing ones in
      control/gefsrun.

      In control/gefsrun, search for "setup" to find the places to modify
      (see the sketch after this list).  For each production job there is
      a section that specifies the resources used in production, followed
      by a section that modifies the resource requests for development
      runs.  In gefsrun, forecast nodes and tasks are automatically
      adjusted for the number of members, but not for resolution or
      forecast length.

      There are two special wall_clock_limit variables in gefsrun:
         wall_clock_limit_fcst_long   for the long forecasts and the post
                                      jobs that run at the same time
         wall_clock_limit_fcst_short  for the 6-hour cycling forecasts and
                                      the post jobs that run at the same time

      The variable "during_previous_job" controls whether the next job is
      submitted before the current job starts its work (for jobs that run
      simultaneously in production) or after the current job finishes its
      work (for jobs whose successor runs only after the current job
      finishes in production).

      If you need to keep sfcsig data that production normally deletes,
      disable or modify the "post cleanup" job.  If you need to add
      archive, cleanup, graphics, or verification jobs, add them between
      the 900 and 999 jobs -- search for these numbers to find where to
      modify gefsrun.

   e. Add or remove output fields from the model in
         parms/gefs_master_f00.parm
         parms/gefs_master_fhh.parm

   f. Add or remove output fields from the pgrba, pgrbb, and pgrb2c files in
         parms/gefs_pgrba_f00.parm
         parms/gefs_pgrba_fhh.parm
         parms/gefs_pgrbb_f00.parm
         parms/gefs_pgrbb_fhh.parm
         parms/gefs_pgrbc_f00.parm
         parms/gefs_pgrbc_fhh.parm

   g. Add or remove enspost and ensstat files in
         parms/gefs_ensstat.parm
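   For example, one way to locate the sections of control/gefsrun
   mentioned in items c and d is to grep for the "setup" marker and for
   the 900/999 job numbers (a minimal sketch; adjust the patterns if
   these markers appear differently in your copy of the script):

      cd /global/save/$LOGNAME/s/$expid/nwdev/control
      grep -n setup gefsrun           # sections where job resources and limits are set
      grep -n -e 900 -e 999 gefsrun   # where new archive/cleanup jobs would be added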
5. If necessary, copy the initial state from production or from another
   experiment:

      control/gefsges yyyymmddhh [expin]

   where
      yyyymmddhh  is the date and cycle to copy
      expin       is prod [default], para, or test [all from /com], or
                  the expid of the experiment you want to copy from.

6. Start the parallel GEFS by running the script

      gefsrun yyyymmddhhjjj [yyyymmddhhjjj]

   while in the directory (your file location)/[expid]/nwdev/control.

      first argument:   date, cycle, and job number of the first job to
                        run (the first job is usually 000 if running the
                        initialization, or 050 if running from given
                        initial conditions)
      second argument:  date, cycle, and job number of the last job to
                        run (the last job is usually 999)

   If only one argument is given, only one job will be run.  Each job
   runs this script with an additional argument in order to submit the
   next job.  A concrete sequence of commands is sketched below, after
   the job number list.

   To find out which job numbers correspond to which GEFS jobs, run

      gefsrun joblist

   which produces a job number list like the following (an old example,
   not necessarily up to date):

      job number summary
      job numbers depend on gefs.parm settings and on settings within
      this script gefsrun

      000  first job of cycle

      012  init separate job
      013  init et job
      014  init combine job

      jobs for this cycle
      050  forecast job
      052  post job for c00     053  prdgen job for c00
      054  post job for p01     055  prdgen job for p01
      056  post job for p02     057  prdgen job for p02
      058  post job for p03     059  prdgen job for p03
      060  post job for p04     061  prdgen job for p04
      062  post job for p05     063  prdgen job for p05
      064  post job for p06     065  prdgen job for p06
      066  post job for p07     067  prdgen job for p07
      068  post job for p08     069  prdgen job for p08
      070  post job for p09     071  prdgen job for p09
      072  post job for p10     073  prdgen job for p10
      074  post job for p11     075  prdgen job for p11
      076  post job for p12     077  prdgen job for p12
      078  post job for p13     079  prdgen job for p13
      080  post job for p14     081  prdgen job for p14
      082  post job for p15     083  prdgen job for p15
      084  post job for p16     085  prdgen job for p16
      086  post job for p17     087  prdgen job for p17
      088  post job for p18     089  prdgen job for p18
      090  post job for p19     091  prdgen job for p19
      092  post job for p20     093  prdgen job for p20
      198  gefs gfs job
      200  ensstat job
      202  tracking job
      204  track average job
      206  post cleanup job

      jobs for the cycle 6 hours later
      450  forecast job
      452  post job for p01     453  prdgen job for p01
      454  post job for p02     455  prdgen job for p02
      456  post job for p03     457  prdgen job for p03
      458  post job for p04     459  prdgen job for p04
      460  post job for p05     461  prdgen job for p05
      462  post job for p06     463  prdgen job for p06
      464  post job for p07     465  prdgen job for p07
      466  post job for p08     467  prdgen job for p08
      468  post job for p09     469  prdgen job for p09
      470  post job for p10     471  prdgen job for p10
      472  post job for p11     473  prdgen job for p11
      474  post job for p12     475  prdgen job for p12
      476  post job for p13     477  prdgen job for p13
      478  post job for p14     479  prdgen job for p14
      480  post job for p15     481  prdgen job for p15
      482  post job for p16     483  prdgen job for p16
      484  post job for p17     485  prdgen job for p17
      486  post job for p18     487  prdgen job for p18
      488  post job for p19     489  prdgen job for p19
      490  post job for p20     491  prdgen job for p20
      596  tracking job
      598  post cleanup job

      jobs for the cycle 12 hours later
      600  forecast job
      602 to 641  alternating post and prdgen jobs
      602  post job for p01     603  prdgen job for p01
      604  post job for p02     605  prdgen job for p02
      606  post job for p03     607  prdgen job for p03
      608  post job for p04     609  prdgen job for p04
      610  post job for p05     611  prdgen job for p05
      612  post job for p06     613  prdgen job for p06
      614  post job for p07     615  prdgen job for p07
      616  post job for p08     617  prdgen job for p08
      618  post job for p09     619  prdgen job for p09
      620  post job for p10     621  prdgen job for p10
      622  post job for p11     623  prdgen job for p11
      624  post job for p12     625  prdgen job for p12
      626  post job for p13     627  prdgen job for p13
      628  post job for p14     629  prdgen job for p14
      630  post job for p15     631  prdgen job for p15
      632  post job for p16     633  prdgen job for p16
      634  post job for p17     635  prdgen job for p17
      636  post job for p18     637  prdgen job for p18
      638  post job for p19     639  prdgen job for p19
      640  post job for p20     641  prdgen job for p20
      746  tracking job
      748  post cleanup job

      jobs for the cycle 18 hours later
      750  forecast job
      752  post job for p01     753  prdgen job for p01
      754  post job for p02     755  prdgen job for p02
      756  post job for p03     757  prdgen job for p03
      758  post job for p04     759  prdgen job for p04
      760  post job for p05     761  prdgen job for p05
      762  post job for p06     763  prdgen job for p06
      764  post job for p07     765  prdgen job for p07
      766  post job for p08     767  prdgen job for p08
      768  post job for p09     769  prdgen job for p09
      770  post job for p10     771  prdgen job for p10
      772  post job for p11     773  prdgen job for p11
      774  post job for p12     775  prdgen job for p12
      776  post job for p13     777  prdgen job for p13
      778  post job for p14     779  prdgen job for p14
      780  post job for p15     781  prdgen job for p15
      782  post job for p16     783  prdgen job for p16
      784  post job for p17     785  prdgen job for p17
      786  post job for p18     787  prdgen job for p18
      788  post job for p19     789  prdgen job for p19
      790  post job for p20     791  prdgen job for p20
      896  tracking job
      898  post cleanup job

      900  between production jobs and archive/cleanup jobs
      999  last job of cycle

   Loadleveler jobnames are in the form

      idyyyymmddhhjjj.nnnnn

   where id is the last two letters of your $expid and nnnnn is the job
   number assigned by loadleveler.
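   As a concrete illustration of steps 5 and 6, here is one possible
   sequence of commands, assuming the default file locations from step 2,
   a hypothetical expid of rwt1, and a made-up 2011060100 cycle (a sketch
   only; adjust the paths and date to your experiment):

      cd /global/save/$LOGNAME/s/rwt1/nwdev/control

      # copy the initial state for this cycle from production, if needed
      ./gefsges 2011060100 prod

      # run the cycle from the given initial conditions (job 050), letting
      # each job submit the next one through the last job (999)
      ./gefsrun 2011060100050 2011060100999

      # or run a single job by giving only one argument
      ./gefsrun 2011060100050

      # show which job numbers correspond to which GEFS jobs
      ./gefsrun joblist

   With expid rwt1, the forecast job of this cycle would then appear in
   loadleveler under a jobname like t12011060100050.nnnnn.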
7. Run the script

      gefsmon

   while in the directory (your file location)/[expid]/nwdev/control to
   follow the progress of your jobs.

   If you have more jobs running than the limit you have set, or if there
   is not enough space on /ptmp to continue the GEFS, or if the data
   needed by the init_separate job (012) is not available, a "wait" job
   will be submitted instead, which will try again in miwait minutes
   (currently 4).  If /ptmp actually fills up, this will probably fail
   and you will have to restart the run by hand when more disk space is
   available.

   If any job experiences a failure that can be detected in its output
   file by the script

      /global/save/wx20rw/h/bin/hpss.put.output.day

   the file waitfile.s (for stratus) or waitfile.c (for cirrus) will be
   set to yes, and any later job will be replaced by a "wait" job until
   the problem is fixed and the output file is removed from the search
   (usually by renaming the output directory from YYYYMMDD to
   YYYYMMDD.suffix so that it will not be detected).

   The scripts called by gefsmon can also be used on their own:

      llqww   lists your jobs with more detail than "llq"
      llqwa   summarizes all loadleveler jobs by class and status
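   For example (a minimal sketch, assuming these helper scripts live in
   the same control directory as gefsmon; adjust the paths if your copy
   keeps llqww and llqwa elsewhere):

      cd /global/save/$LOGNAME/s/$expid/nwdev/control
      ./gefsmon    # follow the progress of your GEFS jobs
      ./llqww      # your jobs, with more detail than "llq"
      ./llqwa      # all loadleveler jobs, summarized by class and status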
8. If desired, after running all the jobs in a cycle or a day, the job
   card/script files in

      /ptmp/$LOGNAME/o/$expid/nwdev/control/YYYYMMDD/CC

   can be modified for use as job files by SMS (see the sketch at the end
   of this note).

9. Send questions or feedback to richard.wobus@noaa.gov
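A minimal illustration of step 8, reusing the hypothetical 2011060100
cycle from the earlier example (YYYYMMDD and CC are simply the date and
cycle of the run):

      # look over the job cards generated for the 00Z cycle of 20110601
      # before adapting them as SMS job files
      ls /ptmp/$LOGNAME/o/$expid/nwdev/control/20110601/00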