timo-eichhorn's Stars
fmtlib/fmt
A modern formatting library
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
sirupsen/napkin-math
Techniques and numbers for estimating a system's performance from first principles
dpilger26/NumCpp
C++ implementation of the Python NumPy library
kokkos/kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
foonathan/memory
STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write.
vectorclass/version2
Vector class library, latest version
NVIDIA/cccl
CUDA Core Compute Libraries
dmlc/mshadow
Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning
corsix/amx
Apple AMX Instruction Set
romeric/Fastor
A lightweight high performance tensor algebra framework for modern C++
trevor-vincent/awesome-high-performance-computing
A curated list of awesome high performance computing resources
VcDevel/std-simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
LLNL/RAJA
RAJA Performance Portability Layer (C++)
llohse/libnpy
C++ library for reading and writing NumPy's .npy files
harrism/hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Mysticial/FeatureDetector
What features does your CPU and OS support?
wichtounet/etl
Blazing-fast Expression Templates Library (ETL) with GPU support, in C++
dsharlet/array
C++ multidimensional arrays in the spirit of the STL
crosetto/SoAvsAoS
C++ zero-cost abstraction for SoA/AoS memory layouts
coin-or/ADOL-C
A Package for Automatic Differentiation of Algorithms Written in C/C++
celerity/celerity-runtime
High-level C++ for Accelerator Clusters
Compaile/ctrack
A lightweight, high-performance C++ benchmarking and tracking library for effortless function profiling in both development and production environments. Features single-header integration, minimal overhead, multi-threaded support, customizable output, and advanced metrics for quick bottleneck detection in complex codebases.
reyoung/avx_mathfun
AVX-optimized sin(), cos(), exp() and log() functions
JamesYang007/FastAD
FastAD is a C++ implementation of automatic differentiation in both forward and reverse mode.
taocpp/tuple
Compile-time-efficient proof-of-concept implementation for std::tuple
claudiopica/HiRep
HiRep repository
jtravs/cuda_complex
An implementation of C++ std::complex for CUDA devices (i.e. compiles with nvcc)
darksim33/Pyneapple
Pyneapple is an advanced tool for analysing multi-exponential signal data in MR DWI images.
GianlucaFuwa/MetaQCD.jl
Lattice QCD using variations of Metadynamics