A curated list of awesome high performance computing resources.
- El Capitan - 2023, AMD-based, ~1.5 exaflops
- Tianhe-3 - 2022, ~700 Petaflop (Linpack500)
- History of Supercomputing (Wikipedia)
- History of Parallel Computing (Wikipedia)
- History of the Top500 (Wikipedia)
- History of LLNL Computing
- The Supermen: The Story of Seymour Cray ... (1997)
- Unmatched - 50 Years of Supercomputing (2023)
- alpaka - The alpaka library is a header-only C++17 abstraction library for accelerator development
- async-rdma - A framework for writing RDMA applications with high-level abstraction and asynchronous APIs
- CAF - An Open Source Implementation of the Actor Model in C++
- Chapel - A Programming Language for Productive Parallel Computing on Large-scale Systems
- Charm++ - Parallel Programming with Migratable Objects
- Cilk Plus - C/C++ Extension for Data and Task Parallelism
- Codon - high-performance Python compiler that compiles Python code to native machine code without any runtime overhead
- CUDA - High performance NVIDIA GPU acceleration
- dask - Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
- DeepSpeed - An easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference
- DeterminedAI - Distributed deep learning
- FastFlow - High-performance Parallel Patterns in C++
- Galois - A C++ Library to Ease Parallel Programming with Irregular Parallelism
- Halide - A language for fast, portable computation on images and tensors
- Heteroflow - Concurrent CPU-GPU Task Programming using Modern C++
- highway - Performance portable SIMD intrinsics
- HIP - HIP is a C++ Runtime API and Kernel Language for AMD/Nvidia GPU
- HPC-X - Nvidia implementation of MPI
- HPX - A C++ Standard Library for Concurrency and Parallelism
- Horovod - Distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
- ISPC - An open-source compiler for high-performance SIMD programming on the CPU and GPU
- Intel ISPC - SPMD compiler
- Intel TBB - Threading Building Blocks
- joblib - Data-flow programming for performance (python)
- Kompute - The general purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)
- Kokkos - A C++ Programming Model for Writing Performance Portable Applications on HPC platforms
- Kubeflow MPI Operator - MPI Operator for Kubeflow
- Legate - Nvidia replacement for numpy based on Legion
- Legion - Distributed heterogeneous programming library
- MAGMA - Next generation linear algebra (LA) GPU accelerated libraries
- Merlin - A distributed task queuing system, designed to allow complex HPC workflows to scale to large numbers of simulations
- Metal - Apple's GPU API
- Microsoft MPI - Microsoft's implementation of MPI
- MOGSLib - User defined schedulers
- mpi4jax - Zero-copy mpi for jax arrays
- mpi4py - Python bindings for MPI
- MPI - Open MPI implementation of the Message Passing Interface
- MPI - MPICH implementation of the Message Passing Interface
- MPI Standardization Forum - Forum for MPI standardization
- MVAPICH - Implementation of MPI
- NCCL - The NVIDIA Collective Communication Library for multi-GPU and multi-node communication
- cuNumeric - GPU drop-in for numpy
- stdpar - GPU accelerated C++ from NVIDIA
- numba - A JIT compiler that translates a subset of Python into fast machine code
- oneAPI - A unified, multiarchitecture, multi-vendor programming model
- OpenACC - "OpenMP for GPUs"
- OpenCilk - MIT continuation of Cilk Plus
- OpenMP - Multi-platform Shared-memory Parallel Programming in C/C++ and Fortran
- PVM - Parallel Virtual Machine: A predecessor to MPI for distributed computing
- PMIx - Standard for process management
- Pollux - Message Passing Cloud orchestrator
- Pyfi - Distributed flow and computation system
- RAJA - Architecture and programming model portability for HPC applications
- RaftLib - A C++ Library for Enabling Stream and Dataflow Parallel Computation
- ray - Scale AI and Python workloads from reinforcement learning to deep learning
- ROCm - First open-source software development platform for HPC/Hyperscale-class GPU computing
- RS MPI - Rust bindings for MPI
- Scalix - Data parallel computing framework
- Simgrid - Simulate cluster/HPC environments
- SkelCL - A Skeleton Library for Heterogeneous Systems
- STAPL - Standard Template Adaptive Parallel Programming Library in C++
- STLab - High-level Constructs for Implementing Multicore Algorithms with Minimized Contention
- SYCL - C++ Abstraction layer for heterogeneous devices
- Taichi - Parallel programming language for high-performance numerical computations in Python
- Taskflow - A Modern C++ Parallel Task Programming Library
- The Open Community Runtime - Specification for Asynchronous Many Task systems
- Transwarp - A Header-only C++ Library for Task Concurrency
- Tuplex - Blazing fast python data science
- UCX - Optimized production proven-communication framework
- Zluda - Run unmodified CUDA applications with near-native performance on Intel and AMD GPUs.
- HyperQueue - HyperQueue is a tool designed to simplify execution of large workflows (task graphs) on HPC clusters.
- cpuid - A software instruction available on Intel, AMD, and other processors that can be used to determine processor type and features.
- cpuid instruction note - A detailed note on the CPUID instruction used for processor identification.
- cpufetch - A simple yet fancy CPU architecture fetching tool.
- gpufetch - A tool similar to cpufetch, but for fetching GPU architecture.
- intel cpuinfo - Intel tool providing information about the characteristics of Intel CPUs.
- Likwid - A performance tool suite that reports hardware topology and reads hardware performance counters on cluster nodes.
- LIKWID.jl - Julia wrapper for LIKWID.
- openmpi hwloc - Portable Hardware Locality (hwloc) software project.
- PRK - Parallel Research Kernels - A collection of kernels for parallel programming research.
- BeeGFS - A parallel file system designed for performance-critical environments.
- Bluebanquise - An open-source cluster management tool.
- Bright Cluster Manager - Software for deploying and managing HPC and AI server clusters.
- Ceph - An open-source distributed storage system.
- DeepOps - Nvidia's GPU infrastructure and automation tools for Kubernetes and Slurm clusters.
- E4S - The Extreme Scale HPC Scientific Stack - A collection of open-source software packages for HPC environments.
- Easybuild - A package manager for HPC/supercomputers.
- EESSI - A shared stack of scientific software installations.
- Flux framework - A framework for high-performance computing clusters.
- fpsync - A tool for fast parallel data transfer using fpart and rsync.
- GPFS - A high-performance parallel file system developed by IBM.
- Guix - A package manager for HPC/supercomputers.
- Intel DAOS - A software-defined scale-out object store for HPC applications.
- LSF - A batch system for HPC and distributed computing environments.
- Lmod - A Lua-based module system for software environment management on HPC systems.
- Lustre Parallel File System - A high-performance distributed filesystem for large-scale cluster computing.
- moosefs - A fault-tolerant, highly available, distributed file system.
- NetApp - Intelligent data infrastructure for various workloads.
- OpenHPC - A community-led set of HPC components.
- OpenOnDemand - A web portal for accessing supercomputing resources.
- OpenPBS - A software for workload management and job scheduling.
- OpenXdMod - A tool for managing high-performance computing resources.
- RADIUSS - Rapid Application Development via an Institutional Universal Software Stack.
- rocks - An open-source Linux cluster distribution.
- Ruse - A command-line tool for measuring the resource usage (memory, CPU, run time) of a process and its subprocesses on HPC clusters.
- SGE - A resource management software for large clusters of computers.
- Slurm - A cluster management and job scheduling system for Linux clusters.
- Spack - A package manager for HPC/supercomputers.
- sstack - A tool to install multiple software stacks such as Spack, EasyBuild, and Conda.
- Starfish - Unstructured data management and metadata solution for files and objects.
- Warewulf - An operating system provisioning system and cluster management tool.
- xCAT - A distributed computing management and provisioning tool.
- XDMoD - An open-source tool for managing high-performance computing resources.
- Globus Connect - A fast data transfer tool between supercomputers.
- Kitten - A lightweight kernel designed for high-performance computing. It focuses on providing low noise and predictable performance for HPC applications.
- McKernel - A hybrid kernel that combines Linux and a lightweight kernel designed to provide high performance for HPC applications.
- mOS - A specialized operating system for high-performance computing, designed to support large-scale, manycore processors.
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows.
- Apptainer (formerly Singularity) - Container platform designed for scientific and high-performance computing (HPC) environments.
- arbiter2 - Monitors and protects interactive nodes with cgroups.
- Charliecloud - Lightweight container solution for high-performance computing (HPC).
- Docker - A set of platform as a service products that use OS-level virtualization to deliver software in packages called containers.
- genv - GPU Environment Management for managing and scheduling GPU resources.
- Grafana - Open-source platform for monitoring and observability, visualizing metrics.
- grpc - A high-performance, open-source universal RPC framework.
- HPC Rocket - Allows submitting Slurm jobs in Continuous Integration (CI) pipelines.
- HTCondor - An open-source high-throughput computing software framework.
- Jacamar-ci - CI/CD tool designed for HPC and scientific computing workflows.
- Kubernetes - An open-source system for automating deployment, scaling, and management of containerized applications.
- nextflow - A workflow framework to deploy data-driven computational pipelines.
- perun - Energy monitor for HPC systems, focusing on performance and energy efficiency.
- Prefect - A workflow management system, designed for modern infrastructure and powered by the open-source Prefect Core workflow engine.
- Prometheus - An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
- redun - Workflow engine that emphasizes simplicity, reliability, and scalability.
- remora - Tool for monitoring and reporting the performance of batch jobs on HPC systems.
- ruptime - A utility for monitoring the status of computational jobs and systems.
- Slurmvision slurm dashboard - A dashboard for monitoring and managing Slurm jobs.
- slurm docker cluster - A Slurm cluster implemented using Docker containers, for development and testing.
- snakemake - A workflow management system that reduces the complexity of creating reproducible and scalable data analyses.
- Stui slurm dashboard for the terminal - A terminal-based UI for managing and monitoring Slurm clusters.
- Vaex - A Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets.
- ddt - A powerful debugger designed for developers to solve complex problems on multi-threaded and multi-process environments in HPC.
- marmot MPI checker - A tool for detecting and reporting issues in MPI (Message Passing Interface) applications.
- python debugging tools - A collection of tools for debugging Python applications, including pdb and other utilities.
- seer modern gui for gdb - A graphical user interface for GDB, aiming to improve the debugging experience with modern features and visuals.
- Summary of C/C++ debugging tools - An overview of various debugging tools available for C/C++ applications, focusing on HPC environments.
- totalview - A comprehensive source code analysis and debugging tool designed for complex software running on HPC systems, supporting a wide range of languages and architectures.
- demonspawn - A framework for automated execution of benchmarks and simulations, designed for HPC environments.
- Google benchmark - A microbenchmark support library for C++ that tracks performance over time.
- HPL benchmark - The High Performance Linpack Benchmark for measuring floating-point computing power of systems.
- kerncraft - A tool for analytical modeling of loop performance and cache behavior on HPC systems.
- NASA parallel benchmark suite - A set of benchmarks designed to evaluate the performance of parallel supercomputers.
- papi - Provides standard APIs for accessing hardware performance counters available on modern microprocessors.
- scalasca - A software tool that supports performance analysis of large-scale parallel applications.
- scalene - A high-performance, high-precision CPU, GPU, and memory profiler for Python.
- Summary of code performance analysis tools - An overview of tools for analyzing HPC application performance.
- Summary of profiling tools - A comprehensive list of profiling tools for performance analysis in HPC.
- tau - TAU (Tuning and Analysis Utilities) is a profiling and tracing toolkit for performance analysis of parallel programs.
- The Bandwidth Benchmark - A tool for measuring memory bandwidth across various CPUs and systems.
- vampir - A tool for detailed analysis of MPI program executions by visualizing their event traces.
- bytehound memory profiler - A detailed memory profiler for tracking down memory issues and leaks.
- Flamegraphs - Visualization tool for profiling software, allowing quick identification of performance bottlenecks.
- fio - Flexible I/O tester for benchmarking and stress/hardware verification.
- IBM Spectrum Scale Key Performance Indicators (KPI) - Provides key performance indicators for IBM Spectrum Scale, aiding in performance tuning and monitoring.
- Ior - A parallel file system I/O benchmarking tool used widely in HPC for testing storage systems.
- stress-ng - A versatile tool for stressing various subsystems of a computer to find hardware faults or to benchmark performance.
- Hotspot - The Linux perf GUI for in-depth performance analysis and visualization of software behavior.
- mixbench - A benchmark suite designed to evaluate CPUs and GPUs across different compute and memory operations.
- pmu-tools (toplev) - Performance monitoring tools for modern Intel CPUs, offering detailed insights into hardware and application performance.
- SPEC CPU Benchmark - A benchmark suite designed to provide a comparative measure of compute-intensive performance across the widest practical range of hardware.
- STREAM Memory Bandwidth Benchmark - Measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
- Intel MPI benchmarks - A set of benchmarks designed to measure the performance and scalability of MPI implementations on Intel architectures.
- Ohio state MPI benchmarks - A comprehensive suite of benchmarks for evaluating MPI performance across a variety of message passing patterns and communication protocols.
- hpctoolkit - An integrated suite of tools for measurement and analysis of program performance on computers ranging from desktops to supercomputers.
- core-to-core-latency - A diagnostic tool designed to measure and report the latency between CPU cores, aiding in the optimization of parallel computing tasks.
- speedscope - An interactive, web-based viewer for performance profiles of software. It supports various formats and provides a flamegraph visualization to identify hot paths efficiently.
- Differential Flamegraphs - A visualization technique developed by Brendan Gregg that highlights differences between performance profiles, making it easier to spot performance regressions or improvements.
- Hyperfine - A command-line benchmarking tool that provides a simple and user-friendly means to compare the performance of commands, featuring statistical analysis across multiple runs.
- Openfoam HPC benchmark - A benchmarking suite for evaluating the High Performance Computing capabilities of OpenFOAM, an open-source CFD software, under various computational loads.
- OSU microbenchmarks - A collection of microbenchmarks designed to evaluate the performance of MPI implementations across various communication protocols and message sizes.
- fio flexible I/O tester - A versatile tool for I/O workload simulation and benchmarking, capable of testing a wide array of storage and filesystem configurations.
- vftrace - A tracing tool specifically designed for the NEC SX-Aurora TSUBASA Vector Engine, enabling detailed performance analysis of vectorized code.
- tinymembench - A simple memory benchmark tool, focusing on benchmarking memory bandwidth and latency with minimal dependencies, suitable for various platforms.
- ADIOS2 - The Adaptable IO System version 2, designed for flexible and efficient I/O for scientific data, supporting a wide range of HPC simulations.
- Amira - A powerful, multifaceted 3D software platform for visualizing, manipulating, and understanding Life Science and bio-medical data coming from all types of sources.
- hdf5 - The Hierarchical Data Format version 5 (HDF5), is an open source file format that supports large, complex, heterogeneous data.
- paraview - An open-source, multi-platform data analysis and visualization application.
- Scientific Visualization Wiki - A comprehensive guide to the field of scientific visualization, detailing techniques, tools, and applications.
- the yt project - An open-source, Python-based package for analyzing and visualizing volumetric data.
- vedo - A lightweight and powerful python module for scientific analysis and visualization of 3D objects and point clouds based on VTK.
- visit - An Open Source, interactive, scalable, visualization, animation and analysis tool.
- petsc
- ginkgo
- GSL
- Scalapack
- rapids.ai - collection of libraries for executing end-to-end data science pipelines completely in the GPU
- trilinos
- tnl project
- mimalloc memory allocator
- jemalloc memory allocator
- tcmalloc memory allocator
- Horde memory allocator
- Software utilization at UK National Supercomputing Service, ARCHER2
- Ethernet
- Infiniband
- Network topologies
- Battle of the infinibands - Omnipath vs Infiniband
- Mellanox infiniband cluster config
- RoCE - RDMA Over Converged Ethernet
- Slingshot interconnect
- CXL - Compute Express Link
- Infiniband Essentials
- Wikichip
- Microarchitecture of Intel/AMD CPUs
- Apple M1
- Apple M2
- Apple M2 Teardown
- Apple M1/M2 AMX
- Apple M3
- List of Intel processors
- List of Intel micro architectures
- Comparison of Intel processors
- Comparison of Apple processors
- List of AMD processors
- List of AMD CPU micro architectures
- Comparison of AMD architectures
- GPU Architecture Analysis
- A trip through the Graphics Pipeline
- A100 Whitepaper
- MIG
- Gentle Intro to GPU Inner Workings
- AMD Instinct GPUs
- AMD GPU ROCm Support and OS Compatibility
- List of AMD GPUs
- Comparison of CUDA architectures
- Tales of the M1 GPU
- List of Intel GPUs
- Performance of DGX Cluster
- AWS HPC
- Azure HPC
- rescale
- vast.ai
- vultr - cheap bare metal CPU, GPU, DGX servers
- hetzner - cheap servers incl. 80-core ARM
- Ampere ARM cloud-native processors
- Scaleway
- Chameleon Cloud
- Lambda Labs
- Runpod
- The use of Microsoft Azure for high performance cloud computing – A case study
- AWS Cluster in the cloud
- AWS Parallel Cluster
- An Empirical Study of Containerized MPI and GUI Application on HPC in the Cloud
- Supercomputing Conference Student Opportunities
- SCC Student cluster competition
- Winter Classic Invitational
- Linux Cluster Institute
- Supercomputer
- Supercomputer architecture
- Computer cluster
- Comparison of Intel processors
- Comparison of Apple processors
- Comparison of AMD architectures
- Comparison of CUDA architectures
- Cache
- Google TPU
- IPMI
- FRU
- Disk Arrays
- RAID
- Cray
- Digital Signal Processors
- Jack Dongarra - 2021 Turing Award - LINPACK, BLAS, LAPACK, MPI
- Bill Gropp - 2010 IEEE TCSC Medal for Excellence in Scalable Computing
- David Bader - built the first Linux supercomputer
- Thomas Sterling - Inventor of Beowulf cluster, ParalleX/HPX
- Seymour Cray - Inventor of the Cray Supercomputer
- Larry Smarr - HPC Application Pioneer
- Free Modern HPC Books by Victor Eijkhout
- High Performance Parallel Runtimes
- The OpenMP Common Core: Making OpenMP Simple Again
- Parallel and High Performance Computing
- Algorithms for Modern Hardware
- High Performance Computing: Modern Systems and Practices - Thomas Sterling, Maciej Brodowicz, Matthew Anderson 2017
- Introduction to High Performance Computing for Scientists and Engineers - Hager 2010
- Computer Organization and Design
- Optimizing HPC Applications with Intel Cluster Tools: Hunting Petaflops
- Introduction to High Performance Scientific Computing - Victor Eijkhout 2021
- Parallel Programming for Science and Engineering - Victor Eijkhout 2021
- Parallel Programming for Science and Engineering - HTML Version
- C++ High Performance
- Data Parallel C++ Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL
- High Performance Python
- C++ Concurrency in Action: Practical Multithreading - Anthony Williams 2012
- The Art of Multiprocessor Programming - Maurice Herlihy 2012
- Parallel Computing: Theory and Practice - Umut A. Acar 2016
- Introduction to Parallel Computing - Zbigniew J. Czech
- Practical guide to bare metal C++
- Optimizing software in C++
- Optimizing subroutines in assembly code
- Microarchitecture of Intel/AMD CPUs
- Parallel Programming with MPI
- HPC, Big Data, AI Convergence Towards Exascale: Challenge and Vision
- Introduction to parallel computing - Ananth Grama
- The Student Supercomputer Challenge Guide
- The Rust Performance Book
- E-Zines on Bash, Linux, Perf, etc - Julia Evans
- The Art of Writing Efficient Programs: An Advanced Programmer's Guide to Efficient Hardware Utilization and Compiler Optimizations Using C++ Examples
- OpenMP Examples - openmp.org
- Latest books on OpenMP - openmp.org
- Programming Massively Parallel Processors 4th Edition 2023
- Software Optimization Cookbook
- Power and Performance: Software Analysis and Optimization
- Gropp books on MPI
- Performance Analysis and Tuning on Modern CPUs
- HPC Carpentry
- Berkeley: Applications of Parallel Computers - Detailed course on HPC
- CS6290 High-performance Computer Architecture - Milos Prvulovic and Catherine Gamboa at Georgia Tech
- Udacity High Performance Computing
- Parallel Numerical Algorithms
- Vanderbilt - Intro to HPC
- Illinois - Intro to HPC - Creator of PyCuda
- Archer1 Courses
- TACC tutorials
- Livermore training materials
- Xsede training materials
- Parallel Computation Math
- Introduction to High-Performance and Parallel Computing - Coursera
- Foundations of HPC 2020/2021
- Principles of Distributed Computing
- High Performance Visualization
- Temple course on building/maintaining a cluster
- Nvidia Deep Learning Course
- Coursera GPU Programming Specialization
- Coursera Fundamentals of Parallelism on Intel Architecture
- Coursera Introduction to High Performance Computing
- Archer2 Shared Memory Programming with OpenMP
- Archer2 Message-Passing Programming with MPI
- HetSys 2022 Course
- Edukamu Introduction to Supercomputing
- Heterogeneous Parallel Programming by S K
- NCSA HPC Training Moodle
- Supercomputing in plain english
- Cornell workshop
- Carpentries Incubator HPC Intro
- UL HPC School
- Introduction to High-Performance Parallel Distributed Computing using Chapel, UPC++ and Coarray Fortran
- Performance Engineering of Software Systems (MIT-OCW)
- Introduction to Parallel Computing (CMSC 498X/818X)
- Infiniband Essentials
- Performance Ninja Optimization Course
- MpiTutorial - A fantastic mpi tutorial
- Beginners Guide to HPC
- Rookie HPC Guide
- RedHat High Performance Computing 101
- Parallel Computing Training Tutorials - Lawrence Livermore National Laboratory
- Foundations of Multithreaded, Parallel, and Distributed Programming
- Building pipelines using slurm dependencies
- Writing Slurm scripts in Python, R, and Bash
- Xsede new user tutorials
- Supercomputing in plain english
- Improving Performance with SIMD intrinsics
- Want speed? Pass by value
- Introduction to low level bit hacks
- How to write fast numerical code: An Introduction
- Lecture notes on Loop optimizations
- A practical approach to code optimization
- Software optimization manuals
- Guide into OpenMP: Easy multithreading programming for C++
- An Introduction to the Partitioned Global Address Space (PGAS) Programming Model
- Jax in 2022
- C++ Benchmarking for beginners
- Mapping MPI ranks to multiple CUDA GPUs
- Oak Ridge National Lab Tutorials
- How to perform large scale data processing in bioinformatics
- Step by step SGEMM in OpenCL
- Frontier User Guide
- Allocating large blocks of memory in bare-metal C programming
- Hashmap benchmarks 2022
- LLNL HPC Tutorials
- High Performance Computing: A Bird's Eye View
- The dirty secret of high performance computing
- Multiple GPUs with pytorch
- Brendan Gregg on Linux Performance
- Automatic Slurm build scripts
- Fastest unordered_map implementation / benchmarks
- Memory bandwidth NapkinMath
- Avoiding Instruction Cache Misses
- Multi-GPU Programming with Standard Parallel C++
- EuroCC National Competence Center Sweden (ENCCS) HPC tutorials
- LLNL hpc tutorials
- python.org Python Performance Tips
- HPC toolset tutorial (cluster management)
- OpenMP tutorials
- CUDA best practices guide
- Understanding CPU Architecture And Performance Using LIKWID
- 32 OpenMP Traps For C++ Developers
- Interactive and Urgent HPC Challenges (2024)
- The Landscape of Exascale Research: A Data-Driven Literature Analysis (2020)
- The Landscape of Parallel Computing Research: A View from Berkeley
- Extreme Heterogeneity 2018: Productive Computational Science in the Era of Extreme Heterogeneity
- Programming for Exascale Computers - Will Gropp, Marc Snir
- On the Memory Underutilization: Exploring Disaggregated Memory on HPC Systems (2020)
- Advances in Parallel & Distributed Processing, and Applications (conference proceedings)
- Designing Heterogeneous Systems: Large Scale Architectural Exploration Via Simulation
- Reinventing High Performance Computing: Challenges and Opportunities (2022)
- Challenges in Heterogeneous HPC White Paper (2022)
- An Evolutionary Technical & Conceptual Review on High Performance Computing Systems (Dec 2021)
- New Horizons for High-Performance Computing (2022)
- Confidential High-Performance Computing in the Public Cloud
- Containerisation for High Performance Computing Systems: Survey and Prospects
- Heterogeneous Computing Systems (2023)
- Myths and Legends in High-Performance Computing
- Energy-Aware Scheduling for High-Performance Computing Systems: A Survey
- Ultimate Physical limits to computation - Seth Lloyd
- Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014, Sandia National Laboratories and Lawrence Berkeley National Laboratory
- Some thoughts on the environmental impact of High Performance Computing
- A Research Retrospective on AMD's Exascale Computing Journey
- InsideHPC
- HPCWire
- NextPlatform
- Datacenter Dynamics
- Admin Magazine HPC
- Tom's Hardware
- Tech Radar
- Phoronix
- Argonne lectures on Extreme Scale Computing 2022
- Argonne supercomputer tour
- Containers in HPC - what they fix and what they break
- HPC Tech Shorts
- CppCon
- Create a clustering server
- Argonne national lab
- Oak Ridge National Lab
- Concurrency in C++20 and Beyond - A. Williams
- Is Parallel Programming still Hard? - P. McKenney, M. Michael, and M. Wong at CppCon 2017
- The Speed of Concurrency: Is Lock-free Faster? - Fedor G Pikus in CppCon 2016
- Expressing Parallelism in C++ with Threading Building Blocks - Mike Voss at Intel Webinar 2018
- A Work-stealing Runtime for Rust - Aaron Todd in Air Mozilla 2017
- C++11/14/17 atomics and memory model: Before the story consumes you - Michael Wong in CppCon 2015
- The C++ Memory Model - Valentin Ziegler at C++ Meeting 2014
- Sharcnet HPC
- Low Latency C++ for fun and profit
- scalene python profiler
- Kokkos lectures
- EasyBuild Tech Talk I - The ABCs of Open MPI, part 1 (by Jeff Squyres & Ralph Castain)
- The Spack 2022 Roadmap
- A Not So Simple Matter of Software | Talk by Turing Award Winner Prof. Jack Dongarra
- Vectorization/SIMD intrinsics
- New Silicon for Supercomputers: A Guide for Software Engineers
- TechTechPotato Channel
- How to write the perfect hash table
- FosDem 2024 HPC Big Data Conference videos
- Bright Computing Cluster Management Technical Overview
- What is HPC? An introduction by Canonical
- Slurm job scheduler basics
- Task based Parallelism and why it's awesome - Pedro Gonnet
- Tuning Slurm Scheduling for Optimal Responsiveness and Utilization
- Parallel Programming Models Overview (2020)
- Comparative Analysis of Kokkos and Sycl (Jeff Hammond)
- Hybrid OpenMP/MPI Programming
- Designs, Lessons and Advice from Building Large Distributed Systems - Jeff Dean (Google)
- Practical Debugging and Performance Engineering
- Resources for learning about HPC networks and storage r/HPC
- Slurm for dummies guide
- Build a cluster under 50k
- Build a Beowulf cluster
- Build a Raspberry Pi Cluster
- Puget Systems
- Lambda Systems
- Titan computers
- Temple course on building/maintaining a cluster
- Detailed reddit discussion on setting up a small cluster
- Tiny titan - build a really cool pi supercomputer
- Building an Intel HPC cluster with OpenHPC
- Reddit r/HPC post on building clusters
- Build a virtual cluster with PelicanHPC
- Building a High-performance Computing Cluster Using FreeBSD
- Supermicro GPU racks
- VirtualOrfeo - Virtual HPC Cluster
- Is there a reason to build a Raspberry Pi cluster
- HPC University Careers search
- HPC wire career site
- HPC certification
- HPC SysAdmin Jobs (reddit)
- The United States Research Software Engineer Association
- NCSA Internship
- AI and Future HPC Job Prospect
- HPC sys admin career (reddit)
- 1024 Cores - Dmitry Vyukov
- The Black Art of Concurrency - Internal Pointers
- Cluster Monkey
- Jonathan Dursi
- Arm Vendor HPC blog
- HPC Notes
- Brendan Gregg Performance Blog
- Performance engineering blog
- Concurrency Freaks
- Servers@Home
- Dr. Bandwidth Blog
- Johnny's Software Lab
- Daniel Lemire Blog
- IEEE Transactions on Parallel and Distributed Systems (TPDS)
- Journal of Parallel and Distributed Computing
- ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)
- ACM Symposium on Parallel Algorithms and Architectures (SPAA)
- SC conference (SC)
- IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- International Conference on Parallel Processing (ICPP)
- IEEE High Performance Extreme Computing Conference (HPEC)
- FosDem
- Prace
- Xsede
- Compute Canada
- RIKEN CCS
- Pawsey
- International Data Corporation
- List of Federally funded research and development centers
- Amdahl's Law
- HPC Wiki
- FLOPS
- Computational complexity of math operations
- Many Task Computing
- High Throughput Computing
- Parallel Virtual Machine
- OSI Model
- Workflow management
- Compute Canada Documentation
- Network Interface Controller (NIC)
- Just in time compilation
- List of distributed computing projects
- Computer cluster
- Quasi-opportunistic supercomputing
- Limits of Computation
- Bremermann's Limit
- Concurrency patterns
- Parallel Computing
- Server Management
- Advanced Parallel Programming in C++
- Tools for scientific computing
- Quantum Computing for High Performance Computing
- Benchmarking data science: Twelve ways to lie with statistics and performance on parallel computers.
- Establishing the IO500 Benchmark
- NVIDIA High Performance Computing articles
- Let's write a superoptimizer
- Why I think C++ is still a desirable coding platform compared to Rust
- The State of Fortran (arxiv paper 2022)
- 50 years later, is two phase locking still the best
- Estimating your memory bandwidth
- Build a Beowulf cluster
- libsc - Supercomputing library
- xbyak jit assembler
- cpufetch - pretty cpu info fetcher
- RRZE-HPC
- Argonne Github
- Argonne Leadership Computing Facility
- Oak Ridge National Lab Github
- Compute Canada
- HPCInfo by Jeff Hammond
- Texas Advanced Computing Center (TACC) Github
- LANL HPC Github
- Rust in HPC
- University at Buffalo - Center for Computational Research
- Center for High Performance Computing - University of Utah
- Exascale Project
- Pocket HPC Survival Guide
- HPC Summer school
- Overview of all linear algebra packages
- Latency numbers
- Nvidia HPC benchmarks
- Intel Intrinsics Guide
- AWS Cloud calculator
- Quickly benchmark C++ functions
- LLNL Software repository
- Boinc - volunteer computing projects
- Prace Training Events
- Nice discussion on FlameGraph profiling
- Nice discussion on parts of a supercomputer on reddit
- Technical Report on C++ performance
- BOINC Compute for science
- Count prime numbers using MPI
- Awesome Cloud HPC
- Parallel Computing Guide
- Awesome Parallel Computing
- Princeton resources on OpenMP
- Awesome HPC
- Sig HPC Education
This repo started from the great curated list https://github.com/taskflow/awesome-parallel-computing