/awesome-high-performance-computing

A curated list of awesome high performance computing resources.

Table of Contents

General Info

A Few Upcoming Supercomputers

Most Recent List of the Top500 Supercomputers

History

Trends

Software

Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

  • alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
  • async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
  • CAF - An Open Source Implementation of the Actor Model in C++
  • Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
  • Charm++ - Parallel Programming with Migratable Objects
  • Cilk Plus - C/C++ Extension for Data and Task Parallelism
  • Codon - A high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
  • CUDA - High performance NVIDIA GPU acceleration
  • dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
  • DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
  • DeterminedAI - Distributed deep learning
  • FastFlow - High-performance Parallel Patterns in C++
  • Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
  • Halide - A language for fast, portable computation on images and tensors
  • Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
  • highway - Performance portable SIMD intrinsics
  • HIP - HIP is a C++ Runtime API and Kernel Language for AMD and NVIDIA GPUs
  • HPC-X - NVIDIA implementation of MPI
  • HPX - A C++ Standard Library for Concurrency and Parallelism
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
  • ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
  • Intel ISPC - SPMD compiler
  • Intel TBB - Threading Building Blocks
  • joblib - Data-flow programming for performance (Python)
  • Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
  • Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
  • Kubeflow MPI Operator - MPI Operator for Kubeflow
  • Legate - Nvidia replacement for numpy based on Legion
  • Legion - Distributed heterogeneous programming library
  • MAGMA - Next-generation, GPU-accelerated linear algebra (LA) libraries
  • Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
  • Metal - Apple's GPU API
  • Microsoft MPI - Microsoft's implementation of MPI
  • MOGSLib - User defined schedulers
  • mpi4jax - Zero-copy MPI for JAX arrays
  • mpi4py - Python bindings for MPI
  • MPI - Open MPI implementation of the Message Passing Interface
  • MPI - MPICH implementation of the Message Passing Interface
  • MPI Standardization Forum - Forum for MPI standardization
  • MVAPICH - Implementation of MPI
  • NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
  • cuNumeric - GPU drop-in for NumPy
  • stdpar - GPU-accelerated C++ from NVIDIA
  • numba - A JIT compiler that translates a subset of Python into fast machine code
  • oneAPI - A unified, multiarchitecture, multi-vendor programming model
  • OpenACC - "OpenMP for GPUs"
  • OpenCilk - MIT continuation of Cilk Plus
  • OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
  • PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
  • PMIx - Standard for process management
  • Pollux - Message Passing Cloud orchestrator
  • Pyfi - Distributed flow and computation system
  • RAJA - Architecture and programming model portability for HPC applications
  • RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
  • ray - Scale AI and Python workloads from reinforcement learning to deep learning
  • ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
  • RS MPI - Rust bindings for MPI
  • Scalix - Data parallel computing framework
  • Simgrid - Simulate cluster/HPC environments
  • SkelCL - A Skeleton Library for Heterogeneous Systems
  • STAPL - Standard Template Adaptive Parallel Programming Library in C++
  • STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
  • SYCL - C++ Abstraction layer for heterogeneous devices
  • Taichi - Parallel programming language for high-performance numerical computations in Python
  • Taskflow - A Modern C++ Parallel Task Programming Library
  • The Open Community Runtime - Specification for Asynchronous Many Task systems
  • Transwarp - A Header-only C++ Library for Task Concurrency
  • Tuplex - Blazing fast Python data science
  • UCX - Optimized, production-proven communication framework
  • ZLUDA - Run unmodified CUDA applications with near-native performance on Intel or AMD GPUs.
  • HyperQueue - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters.

Cluster Hardware Discovery Tools

  • cpuid - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
  • cpuid instruction note - A detailed note on the CPUID instruction used for processor identification.
  • cpufetch - A simple yet fancy CPU architecture fetching tool.
  • gpufetch - A tool similar to cpufetch, but for fetching GPU architecture.
  • intel cpuinfo - Intel tool providing information about the characteristics of Intel CPUs.
  • LIKWID - A suite of command-line tools for reporting hardware topology, pinning threads, and reading hardware performance counters on cluster nodes.
  • LIKWID.jl - Julia wrapper for LIKWID.
  • openmpi hwloc - Portable Hardware Locality (hwloc) software project.
  • PRK - Parallel Research Kernels - A collection of kernels for parallel programming research.

Cluster Management/Tools/Schedulers/Stacks

  • BeeGFS - A parallel file system designed for performance-critical environments.
  • Bluebanquise - An open-source cluster management tool.
  • Bright Cluster Manager - Software for deploying and managing HPC and AI server clusters.
  • Ceph - An open-source distributed storage system.
  • DeepOps - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
  • E4S - The Extreme Scale HPC Scientific Stack - A collection of open-source software packages for HPC environments.
  • Easybuild - A package manager for HPC/supercomputers.
  • EESSI - A shared stack of scientific software installations.
  • Flux framework - A resource management and job scheduling framework for high-performance computing clusters.
  • fpsync - A tool for fast parallel data transfer using fpart and rsync.
  • GPFS - A high-performance parallel file system developed by IBM.
  • Guix - A package manager for HPC/supercomputers.
  • Intel DAOS - A software-defined scale-out object store for HPC applications.
  • LSF - A batch system for HPC and distributed computing environments.
  • Lmod - A Lua-based module system for software environment management on HPC systems.
  • Lustre Parallel File System - A high-performance distributed filesystem for large-scale cluster computing.
  • moosefs - A fault-tolerant, highly available, distributed file system.
  • NetApp - Intelligent data infrastructure for various workloads.
  • OpenHPC - A community-led set of HPC components.
  • OpenOnDemand - A web portal for accessing supercomputing resources.
  • OpenPBS - A software for workload management and job scheduling.
  • Open XDMoD - A tool for collecting and reporting usage and performance metrics of high-performance computing resources.
  • RADIUSS - Rapid Application Development via an Institutional Universal Software Stack.
  • rocks - An open-source Linux cluster distribution.
  • Ruse - A tool for managing software environments in HPC clusters.
  • SGE - A resource management software for large clusters of computers.
  • Slurm - A cluster management and job scheduling system for Linux clusters.
  • Spack - A package manager for HPC/supercomputers.
  • sstack - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
  • Starfish - Unstructured data management and metadata solution for files and objects.
  • Warewulf - An operating system provisioning system and cluster management tool.
  • xCAT - A distributed computing management and provisioning tool.
  • XDMoD - An open-source tool for collecting and reporting usage and performance metrics of high-performance computing resources.
  • Globus Connect - A fast data transfer tool between supercomputers.

HPC-specific Operating Systems

  • Kitten - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
  • McKernel - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
  • mOS - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.

Development/Workflow/Monitoring Tools for HPC

  • Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
  • Apptainer (formerly Singularity) - Container platform designed for scientific and high-performance computing (HPC) environments.
  • arbiter2 - Monitors and protects interactive nodes with cgroups.
  • Charliecloud - Lightweight container solution for high-performance computing (HPC).
  • Docker - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
  • genv - GPU Environment Management for managing and scheduling GPU resources.
  • Grafana - Open-source platform for monitoring and observability, visualizing metrics.
  • grpc - A high-performance, open-source universal RPC framework.
  • HPC Rocket - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
  • HTCondor - An open-source high-throughput computing software framework.
  • Jacamar-ci - CI/CD tool designed for HPC and scientific computing workflows.
  • Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications.
  • nextflow - A workflow framework to deploy data-driven computational pipelines.
  • perun - Energy monitor for HPC systems, focusing on performance and energy efficiency.
  • Prefect - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
  • Prometheus - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
  • redun - Workflow engine that emphasizes simplicity, reliability, and scalability.
  • remora - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
  • ruptime - A utility for monitoring the status of computational jobs and systems.
  • Slurmvision slurm dashboard - A dashboard for monitoring and managing Slurm jobs.
  • slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing.
  • snakemake - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
  • Stui slurm dashboard for the terminal - A terminal-based UI for managing and monitoring Slurm clusters.
  • Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.

Debugging Tools for HPC

  • ddt - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
  • marmot MPI checker - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
  • python debugging tools - A collection of tools for debugging Python applications, including pdb and other utilities.
  • seer modern gui for gdb - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
  • Summary of C/C++ debugging tools - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
  • totalview - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.

Performance/Benchmark Tools for HPC

  • demonspawn - A framework for automated execution of benchmarks and simulations, designed for HPC environments.
  • Google benchmark - A microbenchmark support library for C++.
  • HPL benchmark - The High Performance Linpack Benchmark for measuring floating-point computing power of systems.
  • kerncraft - A tool for analytical modeling of loop performance and cache behavior on HPC systems.
  • NASA parallel benchmark suite - A set of benchmarks designed to evaluate the performance of parallel supercomputers.
  • papi - Provides standard APIs for accessing hardware performance counters available on modern microprocessors.
  • scalasca - A software tool that supports performance analysis of large-scale parallel applications.
  • scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
  • Summary of code performance analysis tools - An overview of tools for analyzing HPC application performance.
  • Summary of profiling tools - A comprehensive list of profiling tools for performance analysis in HPC.
  • tau - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs.
  • The Bandwidth Benchmark - A tool for measuring memory bandwidth across various CPUs and systems.
  • vampir - A tool for detailed analysis of MPI program executions by visualizing their event traces.
  • bytehound memory profiler - A detailed memory profiler for tracking down memory issues and leaks.
  • Flamegraphs - Visualization tool for profiling software, allowing quick identification of performance bottlenecks.
  • fio - Flexible I/O tester for benchmarking and stress/hardware verification.
  • IBM Spectrum Scale Key Performance Indicators (KPI) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring.
  • IOR - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems.
  • stress-ng - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance.
  • Hotspot - The Linux perf GUI for in-depth performance analysis and visualization of software behavior.
  • mixbench - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations.
  • pmu-tools (toplev) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance.
  • SPEC CPU Benchmark - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware.
  • STREAM Memory Bandwidth Benchmark - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
  • Intel MPI benchmarks - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures.
  • Ohio State MPI benchmarks - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols.
  • hpctoolkit - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers.
  • core-to-core-latency - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks.
  • speedscope - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently.
  • Differential Flamegraphs - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements.
  • Hyperfine - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs.
  • OpenFOAM HPC benchmark - A benchmarking suite for evaluating the high-performance computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads.
  • OSU microbenchmarks - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes.
  • fio flexible I/O tester - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations.
  • vftrace - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code.
  • tinymembench - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms.
  • Geekbench - Cross-platform benchmarking tool
  • Empirical Roofline Tool (ERT) - Creates empirical roofline plots; an alternative to Intel VTune that works on any machine
  • Roofline Visualizer for ERT - Visualizer for ERT

IO/Visualization Tools for HPC

  • ADIOS2 - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
  • Amira - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding life science and biomedical data coming from all types of sources.
  • hdf5 - The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data.
  • paraview - An open-source, multi-platform data analysis and visualization application.
  • Scientific Visualization Wiki - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
  • the yt project - An open-source, Python-based package for analyzing and visualizing volumetric data.
  • vedo - A lightweight and powerful Python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
  • visit - An open-source, interactive, scalable visualization, animation, and analysis tool.

General Purpose Scientific Computing Libraries for HPC

Misc.

Wikis

Hardware

Interconnects/Topology

CPU

GPU

TPU/Tensor Cores

Many integrated core processor (MIC)

Cloud

Vendors

Articles/Papers

Custom/FPGA/ASIC/APU

Certification

Student Opportunities / Workshops

Other/Wikis

People

Resources

Books/Manuals

Courses

Tutorials/Guides/Articles

Review Papers/Articles

News

Podcasts

Video Presentations/Courses/Channels

Presentation Slides

Building Clusters/Virtual Clusters

Forums

Careers

Membership Clubs

Blogs

Journals

Conferences

Communities/Chat Groups

Twitters

Consulting

Interview Preparation

Organizations

Interesting r/HPC posts

Misc. Wikis

Misc. Papers/Articles

Misc. Repos

Misc. Theses

Misc.

Games/Challenges

Other Curated Lists

Acknowledgements

This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing