avx512

There are 128 repositories under the avx512 topic.

simdjson/simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Language: C++ 19.4k 241 8361k
google/highway
Performance-portable, length-agnostic SIMD with runtime dispatch
Language: C++ 4.2k 47 399321
HJLebbink/asm-dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
Language: C# 4.1k 729 13798
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Language: C++ 3.6k 181 1.3k1k
simd-everywhere/simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Language: C 2.4k 52 407253
xtensor-stack/xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
Language: C++ 2.2k 71 332258
ermig1979/Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
Language: C++ 2.1k 115 235414
kfrlib/kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Language: C++ 1.7k 63 188255
VcDevel/Vc
SIMD Vector Classes for C++
Language: C++ 1.5k 66 262151
p12tic/libsimdpp
Portable header-only C++ low level SIMD library
Language: C++ 1.2k 77 113129
SnellerInc/sneller
World's fastest log analysis: λ + SQL + JSON + S3
Language: Go 1k 23 842
ashvardanian/SimSIMD
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
Language: C 993 18 9559
minio/sha256-simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Language: Go 984 37 37121
kimwalisch/primesieve
🚀 Fast prime number generator
Language: C++ 963 46 89123
intel/x86-simd-sort
C++ template library for high performance SIMD based sorting algorithms
Language: C++ 888 22 3759
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Language: C 850 51 343183
shibatch/sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
Language: C 669 34 193132
VcDevel/std-simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Language: C++ 580 22 3837
kimwalisch/libpopcnt
🚀 Fast C/C++ bit population count library
Language: C 331 23 637
agenium-scale/nsimd
Agenium Scale vectorization library for CPUs and GPUs
Language: C 328 27 6028
WojciechMula/sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Language: C++ 326 32 849
WojciechMula/toys
Storage for my snippets, toy programs, etc.
Language: C++ 320 28 1642
kimwalisch/primecount
🚀 Fast prime counting function implementations
Language: C++ 311 23 4141
RRZE-HPC/OSACA
Open Source Architecture Code Analyzer
Language: Jupyter Notebook 301 25 7220
powturbo/Turbo-Base64
Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
Language: C 278 15 2141
WojciechMula/sse4-strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
Language: C++ 241 24 1429
altimesh/hybridizer-basic-samples
Examples of C# code compiled to GPU by hybridizer
Language: C# 237 23 7232
agenium-scale/boost.simd
Boost SIMD
232 32 15450
WojciechMula/base64-avx512
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
Language: C 199 15 38
minio/md5-simd
Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
Language: Go 176 11 918
manodeep/Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Language: C 167 11 20653
WojciechMula/base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Language: C++ 157 17 714
animetosho/md5-optimisation
The fastest MD5 implementation using x86 assembly
Language: C++ 117 9 012
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
Language: C 112 4 122
cdl-saarland/rv
RV: A Unified Region Vectorizer for LLVM
Language: C++ 105 17 4415
intel/yask
YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.
Language: C++ 104 17 9331
