Formatted qstat output to show relevant job properties (such as CPUs, elapsed runtime, remaining runtime etc.) for efficient HPC job management.
The current implementation can be used with UCL(-managed) HPC systems (Myriad, Grace, Kathleen, Thomas), via qstat_enhanced, or with Imperial College's HPC systems (CX1/CX2), via qstat_with_cpus_elaptime.
It could likely be adapted to work with other HPC systems with only minor modifications.
$ q
Job ID   Priority   Job Name                   Status   CPUs   State   Elap_Time     Remaining
-------- ---------- -------------------------- -------- ------ ------- ------------- -----------
42656    3.50000    CdonTe222Antisitea0.345    running  240    r       6.35 hours    41.65 hours
42633    0.00000    CdonTe222Antisitea0.345    pending  240    hqw     Fuck all pal  24.00 hours
$ q
Job ID   Class       Job Name             Status   CPUs   RTime     Comment
-------- ----------- -------------------- -------- ------ --------- -----------
1271971  Large       222kCdVacfrmibrion2  Running  576    07:36:55  finishing on Mon Apr 06 at 03:51
1274343  Capability  222kCdTeSinglePoint  Queued   2016   Fuck all  starting by Sat Apr 4 17:05
git clone https://github.com/kavanase/qstat_enhanced.git
# For UCL HPC:
pip install xmltodict # qstat_enhanced dependency
echo "alias q=/path/to/qstat_enhanced" >> ~/.bashrc
# or for ICL HPC:
echo "alias q=/path/to/qstat_with_cpus_elaptime" >> ~/.bashrc
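Running the echo commands above more than once appends a duplicate alias line each time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named add_line_once (not part of this repository) that only appends a line when it is not already in the file:

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
# Usage: add_line_once "line" file
add_line_once() {
    local line="$1"
    local file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (uncomment to use; /path/to/ is the same placeholder as above):
# add_line_once "alias q=/path/to/qstat_enhanced" ~/.bashrc
```

The alias only takes effect in new shells, or after running source ~/.bashrc in the current one.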
Thankfully, qstat job history is maintained on all UCL HPC systems, via the jobhist command. I set the alias jh in my bashrc file:
alias jh="jobhist --info='fstime,fetime,job_number,job_name,slots'"
as the standard output of jobhist gives a lot of irrelevant info (such as HOSTNAME, the internal name of the node the job ran on, and OWNER, your username), and doesn't tell you how many CPUs the jobs used.
As of February 2020, job history is no longer maintained by qstat (previously accessible with qstat -x).
As an alternative, in order to keep track of my jobs, I add these lines of code
to my PBS jobscripts:
(This is possibly a bit 'extra', but not having a job history is pretty shit, so it is what it is)
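start=$(date +%s)  # job start time, in seconds since the epoch
mpiexec my_program
# Number of MPI ranks, read from the header of the most recent vasp_out* file
cpus=$( find . -maxdepth 1 -name "vasp_out*" | xargs ls -t | head -1 | xargs head | awk '/ranks allocated/{print $3}' )
runtime_sec=$(( $(date +%s) - start ))
# Format as HH:MM:SS by hand, since date -d@... wraps around after 24 hours
runtime=$( printf "%02d:%02d:%02d" $((runtime_sec/3600)) $((runtime_sec%3600/60)) $((runtime_sec%60)) )
# Append job ID, finish time, CPUs, runtime and working directory to the log
# (cut -c 43- drops the first 42 characters of the path; adjust for your own directory prefix)
printf "%-13s %-13s %-4s %-9s %-50s \n" "$( echo "${PBS_JOBID}" | cut -c -7 )" \
"$( date "+%a %H:%M" )" "$cpus" "$runtime" "$( echo "${PBS_O_WORKDIR}" | cut -c 43- )" >> ~/job_log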
and this to my .bashrc, to give me an update on any completed jobs:
echo " Job Num Finish Time CPUs Runtime Working Directory"
echo "--------- ----------------- ---- --------- --------------------------------------------------"
tail -5 ~/job_log
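The .bashrc lines above run on every shell start, including non-interactive ones, and tail will complain if ~/job_log does not exist yet. A minimal sketch of a more defensive variant, assuming a hypothetical function name show_recent_jobs and reusing the column widths from the jobscript's printf so the header lines up with the log entries:

```shell
# Hypothetical helper: print the last N entries of the job log under a header.
# Usage: show_recent_jobs [log_file] [num_entries]
show_recent_jobs() {
    local log_file="${1:-$HOME/job_log}"
    local num_entries="${2:-5}"
    # Stay silent if the log is missing or empty (e.g. before the first job finishes)
    [ -s "$log_file" ] || return 0
    # Same format string as the jobscript, so the columns match the log entries
    printf "%-13s %-13s %-4s %-9s %-50s \n" \
        "Job Num" "Finish Time" "CPUs" "Runtime" "Working Directory"
    tail -n "$num_entries" "$log_file"
}

# Only print in interactive shells, so scp/rsync sessions see no extra output
case $- in
    *i*) show_recent_jobs ;;
esac
```

Calling show_recent_jobs ~/job_log 20 by hand shows a longer history when needed.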
Several of the functions in qstat_enhanced are based on the very useful qstat repository by relleums.
This program is not affiliated with UCL or Imperial College London. This program is made available under the MIT License; you are free to modify and use the code, but do so at your own risk.