XENON1T/cax

Midway GPFS disk I/O improvements


Our jobs are quite I/O intensive according to RCC's monitoring log here

/project/rcc/ivy2/IOlogs2/gpfs.log

where we're reaching upwards of ~2 GB/s total (the system-wide limit is roughly ~10 GB/s).

Igor provided a parsing script at /project/rcc/ivy2/IOlogs2/parse.py, which we can use to try to correlate the I/O load with our job count here: /home/tunnell/161212-job_monitoring/running_jobs.log
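A minimal sketch of what that correlation check could look like. The `<epoch> <value>` line format, the `parse_log`/`correlate` helper names, and the join-on-timestamp approach are all assumptions here, not what parse.py actually does — adjust to the real gpfs.log and running_jobs.log layouts.

```python
# Sketch: correlate GPFS throughput samples with our running-job count.
# Assumed input format (hypothetical): one "<epoch_seconds> <value>" pair per line.
from statistics import mean

def parse_log(lines):
    """Parse '<epoch> <value>' lines into a {epoch: value} dict."""
    out = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            out[int(parts[0])] = float(parts[1])
    return out

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def correlate(io_lines, job_lines):
    """Join the two logs on shared timestamps and correlate the values."""
    io, jobs = parse_log(io_lines), parse_log(job_lines)
    common = sorted(io.keys() & jobs.keys())
    return pearson([io[t] for t in common], [jobs[t] for t in common])
```

A strong positive coefficient would support the idea that our job count is what drives the GPFS load.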

And some ideas to try once we're in a lower-rate data-taking phase (i.e. not right now during calibration):

  1. Try running on the broadwl partition (InfiniBand)
  2. Submit jobs from Midway2 (since apparently we're using those nodes already)
    a. Switch compilation and output writing to /scratch2/midway
    b. Write to /project2 instead
  3. Remove the per-job compilation (related to XENON1T/pax#463)
  4. Relax the repetitive checksumming
  5. Throttle the number of simultaneous transfers

Some documentation on the filesystem names appearing in /project/rcc/ivy2/IOlogs2/gpfs.log:

gpfs_cap1 - /project
gpfs2_cap - /project2

gpfs_perf - /scratch/midway
gpfs2_perf - /scratch/midway2

gpfs_perf2 - $HOME and /software for both Midway1 and Midway2
gpfs2_perf2 - none

Various changes seem to have settled things down.