Midway GPFS disk I/O improvements
Closed this issue · 2 comments
pdeperio commented
Our jobs are quite I/O intensive according to RCC's monitoring log here:
/project/rcc/ivy2/IOlogs2/gpfs.log
where we're reaching upwards of ~2 GB/s total (the whole system approaches its limit at ~10 GB/s).
Igor's parsing script (/project/rcc/ivy2/IOlogs2/parse.py) can be used to try to correlate this with our job count here: /home/tunnell/161212-job_monitoring/running_jobs.log
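The correlation step above can be sketched roughly as follows. The actual formats of gpfs.log and running_jobs.log aren't shown in this issue, so this assumes simple timestamped-value lines and an exact-timestamp join; the function names and log layout here are hypothetical:

```python
from datetime import datetime

def parse_rate_log(lines):
    """Parse lines assumed to look like 'YYYY-mm-dd HH:MM <MB/s>'
    into (datetime, float) pairs, skipping anything malformed.
    (Assumed layout -- the real gpfs.log format may differ.)"""
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        ts = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M")
        out.append((ts, float(parts[2])))
    return out

def correlate(rates, jobs):
    """Join two (timestamp, value) series on exact timestamps,
    yielding (timestamp, io_rate, job_count) triples."""
    job_map = dict(jobs)
    return [(ts, r, job_map[ts]) for ts, r in rates if ts in job_map]
```

In practice one would bin or interpolate the two series rather than require exact timestamp matches, but the idea is the same: line up I/O rate against running-job count and look for proportionality.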
Some ideas to try once we're in a lower-rate data-taking phase (i.e. not right now, during calibration):
- Try running on the `broadwl` partition (InfiniBand)
- Submit jobs from Midway2 (since apparently we're using those nodes already)
  a. Switching compilation and output writing to /scratch2/midway
  b. Writing to /project2 instead
- Removing compilation per job (related to XENON1T/pax#463)
- Relax repetitive checksumming
- Throttle number of simultaneous transfers
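The last item, throttling the number of simultaneous transfers, can be sketched with a bounded worker pool: at most N transfers touch GPFS at once, and the rest queue. The `transfer` function and the cap of 4 are placeholders, not anything from this issue:

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed cap on concurrent GPFS transfers; tune against the ~10 GB/s
# system-wide limit mentioned above.
MAX_CONCURRENT_TRANSFERS = 4

def transfer(path):
    # Placeholder for the real copy/upload step (e.g. rsync or gfal-copy).
    return f"done {path}"

def run_transfers(paths, max_workers=MAX_CONCURRENT_TRANSFERS):
    # ThreadPoolExecutor bounds concurrency: at most max_workers
    # transfers run at any moment; the rest wait in the queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transfer, paths))
```

The same bounded-pool idea also applies to the checksumming item: computing checksums through the same limited pool (or caching them by path/size/mtime) keeps repeated verification from multiplying the I/O load.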
pdeperio commented
Some documentation about /project/rcc/ivy2/IOlogs2/gpfs.log:
gpfs_cap1 - /project
gpfs2_cap - /project2
gpfs_perf - /scratch/midway
gpfs2_perf - /scratch/midway2
gpfs_perf2 - $HOME and /software for both Midway1 and Midway2
gpfs2_perf2 - none
pdeperio commented
Various changes seem to have settled things down.