
A curated list of awesome high performance computing resources

Table of Contents

General Info

A Few Upcoming Supercomputers

Most Recent List of the Top500 Supercomputers




Popular HPC Programming Libraries/APIs/Tools/Standards/Simulators

  • alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
  • async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
  • CAF - An Open Source Implementation of the Actor Model in C++
  • Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
  • Charm++ - Parallel Programming with Migratable Objects
  • Cilk Plus - C/C++ Extension for Data and Task Parallelism
  • Codon - A high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
  • CUDA - High performance NVIDIA GPU acceleration
  • dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
  • DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
  • DeterminedAI - Distributed deep learning
  • FastFlow - High-performance Parallel Patterns in C++
  • Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
  • Halide - A language for fast, portable computation on images and tensors
  • Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
  • highway - Performance portable SIMD intrinsics
  • HIP - HIP is a C++ Runtime API and Kernel Language for AMD/NVIDIA GPUs
  • HPC-X - NVIDIA implementation of MPI
  • HPX - A C++ Standard Library for Concurrency and Parallelism
  • Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
  • ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
  • Intel ISPC - SPMD compiler
  • Intel TBB - Threading Building Blocks
  • joblib - Data-flow programming for performance (Python)
  • Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
  • Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
  • Kubeflow MPI Operator - MPI Operator for Kubeflow
  • Legate - NVIDIA replacement for NumPy based on Legion
  • Legion - Distributed heterogeneous programming library
  • MAGMA - Next generation linear algebra (LA) GPU accelerated libraries
  • Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
  • Microsoft MPI - Microsoft's implementation of MPI
  • MOGSLib - User defined schedulers
  • mpi4jax - Zero-copy MPI for JAX arrays
  • mpi4py - Python bindings for MPI
  • MPI - Open MPI implementation of the Message Passing Interface
  • MPI - MPICH implementation of the Message Passing Interface
  • MPI Standardization Forum - Forum for MPI standardization
  • MVAPICH - Implementation of MPI
  • NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
  • cuNumeric - GPU drop-in for NumPy
  • stdpar - GPU accelerated C++ from NVIDIA
  • numba - A JIT compiler that translates a subset of Python into fast machine code
  • oneAPI - A unified, multiarchitecture, multi-vendor programming model
  • OpenACC - "OpenMP for GPUs"
  • OpenCilk - MIT continuation of Cilk Plus
  • OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
  • PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
  • PMIx - Standard for process management
  • Pollux - Message Passing Cloud orchestrator
  • Pyfi - Distributed flow and computation system
  • RAJA - Architecture and programming model portability for HPC applications
  • RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
  • ray - Scale AI and Python workloads from reinforcement learning to deep learning
  • ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
  • RS MPI - Rust bindings for MPI
  • Scalix - Data parallel computing framework
  • SimGrid - Simulate cluster/HPC environments
  • SkelCL - A Skeleton Library for Heterogeneous Systems
  • STAPL - Standard Template Adaptive Parallel Programming Library in C++
  • STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
  • SYCL - C++ Abstraction layer for heterogeneous devices
  • Taichi - Parallel programming language for high-performance numerical computations in Python
  • Taskflow - A Modern C++ Parallel Task Programming Library
  • The Open Community Runtime - Specification for Asynchronous Many Task systems
  • Transwarp - A Header-only C++ Library for Task Concurrency
  • Tuplex - Blazing fast Python data science
  • UCX - Optimized, production-proven communication framework

Cluster Hardware Discovery Tools

Cluster Management/Tools/Schedulers/Stacks

HPC-specific Operating Systems

Development/Workflow/Monitoring Tools for HPC

Debugging Tools for HPC

Performance/Benchmark Tools for HPC

IO/Visualization Tools for HPC

General Purpose Scientific Computing Libraries for HPC







TPU/Tensor Cores

Many integrated core processor (MIC)






Student Opportunities







Review Papers/Articles



Youtube Videos/Courses/Channels

Presentation Slides

Building Clusters



Membership Clubs




Communities/Chat Groups



Interview Preparation


Misc. Wikis

Misc. Papers/Articles

Misc. Repos

Misc. Theses



Other Curated Lists


