Tuning an application's I/O shares many characteristics with tuning for compute or memory, but I/O tuning introduces some unique challenges. We will examine a typical high-performance computing (HPC) system and its storage. Through benchmarks, I/O kernels, and characterization tools, we will build up the kind of mental model of a system's I/O performance that can help us find solutions to I/O performance problems when we encounter them.
We've given a high-level survey of parallel I/O at SC for many years, and we've gone into a little more detail at the ATPESC training program's "Data and I/O" day. In this half-day workshop, however, we will have a chance to go into more detail about tuning parameters and workloads. We will also cover some of the tools and resources we can use to understand why we see the performance we do.
The workshop takes place at Argonne National Laboratory, Building 241, room D-172, on Monday 20 March from Noon to 3:15 pm (all times US Central).
- Introduction (Video) (Slides)
  - Challenges
  - Terminology and technologies
- File system (Video) (Slides)
  - Walk through the storage and networking features of an exemplar system
- Benchmarking (Video) (Slides)
  - Challenges of I/O benchmarking
  - The IOR benchmark
  - Lustre + striping: Stripe-width Exercise
- Characterizing I/O with Darshan (Video) (Slides)
- Break
- Analysis of I/O with Darshan (Video) (Slides)
- MPI-IO: the core of the simulation I/O stack (Video) (Slides)
  - Experimenting with noncontiguous-I/O Exercise
- I/O libraries
- Machine Learning I/O (Video) (Slides)
  - Describe a non-trivial I/O kernel
  - Execute on an exemplar machine
  - DLIO Demonstration
- IOR: We can generate and observe many access patterns with the IOR benchmark.
- Darshan: The Darshan I/O characterization tool will give us initial reports with options to drill into access patterns and create our own queries.
- ROMIO: We talk a lot about Cray MPICH tuning parameters; Cray has done a lot of work on top of ROMIO. Some of the options discussed won't apply to non-Cray systems, but many of the important ideas, like data sieving and two-phase collective buffering, are found in ROMIO and in turn in MPICH and OpenMPI.
- Parallel-NetCDF and HDF5: High level I/O libraries targeting applications.
- The full slide presentation.
- Video recording of the talk.
The lecture and supporting experiments ran on the Polaris machine at Argonne's ALCF. The goal, however, is for people to try out these experiments on their own facilities.
Each experiment in the examples directory has a README and some scripts for generating plots. The experiments also have a platform-specific directory where we store things like job submission scripts and results. Contribute pull requests for your own platform.
If you have questions or just want to chat about these topics, start a GitHub Discussion. Although the presentation was given in 2023, I hope the topics and approaches will live on.
This work was supported by the Better Scientific Software Fellowship Program, funded by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy (DOE) Office of Science and the National Nuclear Security Administration; and by the National Science Foundation (NSF) under Grant No. 2154495.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DOE or NSF.