avx512
There are 128 repositories under the avx512 topic.
simdjson/simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
google/highway
Performance-portable, length-agnostic SIMD with runtime dispatch
HJLebbink/asm-dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
simd-everywhere/simde
Implementations of SIMD instruction sets for systems which don't natively support them.
xtensor-stack/xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
ermig1979/Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
kfrlib/kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
VcDevel/Vc
SIMD Vector Classes for C++
p12tic/libsimdpp
Portable header-only C++ low level SIMD library
SnellerInc/sneller
World's fastest log analysis: λ + SQL + JSON + S3
ashvardanian/SimSIMD
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
minio/sha256-simd
Accelerate SHA256 computations in pure Go using AVX512 and SHA Extensions for x86, and ARM64 for ARM. On AVX512 it provides up to an 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
kimwalisch/primesieve
🚀 Fast prime number generator
intel/x86-simd-sort
C++ template library for high performance SIMD based sorting algorithms
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
shibatch/sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
VcDevel/std-simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
kimwalisch/libpopcnt
🚀 Fast C/C++ bit population count library
agenium-scale/nsimd
Agenium Scale vectorization library for CPUs and GPUs
WojciechMula/sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
WojciechMula/toys
Storage for my snippets, toy programs, etc.
kimwalisch/primecount
🚀 Fast prime counting function implementations
RRZE-HPC/OSACA
Open Source Architecture Code Analyzer
powturbo/Turbo-Base64
Turbo Base64 - Fastest Base64 SIMD: SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
WojciechMula/sse4-strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
altimesh/hybridizer-basic-samples
Examples of C# code compiled to GPU by hybridizer
agenium-scale/boost.simd
Boost SIMD
WojciechMula/base64-avx512
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
minio/md5-simd
Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
manodeep/Corrfunc
⚡️⚡️⚡️ Blazing fast correlation functions on the CPU.
WojciechMula/base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
animetosho/md5-optimisation
The fastest MD5 implementation using x86 assembly
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading.
cdl-saarland/rv
RV: A Unified Region Vectorizer for LLVM
intel/yask
YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.