Pinned Repositories
CLATCH
Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
CRC
Fastest CRC32 for x86 (Intel and AMD), plus a comprehensive derivation and discussion of various approaches
CUDAKfNN
Fastest CUDA SIFT or other 128-float vector matcher for computer vision
FastArrayOps
Extremely fast x86 / AVX2 assembly implementations of common operations for linear arrays: checking whether array contains element, finding index of element, finding min/max element, finding index of min/max element.
KFAST
Implementation of the FAST feature detector for computer vision (Rosten 2006) using AVX2 to outperform the canonical implementation by up to 600%.
KORAL
Novel extreme-performance CPU-GPU cooperative feature detector-descriptor for computer vision.
KPS
Infrastructure for simultaneous orbital and attitude propagation, with attitude-based real-time analytical aerodynamics simulation
LATCH
Fastest CPU implementation of the LATCH 512-bit binary feature descriptor; fully scale- and rotation-invariant
RGB2Y
Fastest CPU (AVX/SSE) RGB to grayscale: 2-4x faster than OpenCV. For image processing/computer vision.
SortingNetworks
Fastest CPU SIMD (SSE4) sorting networks for small integer arrays (2-6 elements), also optimal amd64 assembly and notes on getting compilers to generate optimal sorting networks.
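For readers unfamiliar with the term, a sorting network sorts a fixed-size input with a fixed sequence of compare-exchange steps, with no data-dependent branching on the order of comparisons. The sketch below is a plain scalar C++ illustration for 4 elements using the well-known 5-comparator network; it is not the SSE4 or assembly code from the SortingNetworks repository, and the names `cswap` and `sort4` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Compare-exchange: after the call, a <= b.
// Written with min/max so a compiler has the chance to emit branch-free code.
static inline void cswap(int32_t& a, int32_t& b) {
    const int32_t lo = std::min(a, b);
    const int32_t hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Hypothetical helper: sorts exactly 4 elements in ascending order
// using the 5-comparator network (0,1) (2,3) (0,2) (1,3) (1,2).
static inline void sort4(std::array<int32_t, 4>& v) {
    cswap(v[0], v[1]);
    cswap(v[2], v[3]);
    cswap(v[0], v[2]);
    cswap(v[1], v[3]);
    cswap(v[1], v[2]);
}
```

The sequence of comparisons is the same for every input, which is what makes networks like this a natural fit for SIMD min/max instructions.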
komrad36's Repositories
komrad36/CRC
Fastest CRC32 for x86 (Intel and AMD), plus a comprehensive derivation and discussion of various approaches
komrad36/RGB2Y
Fastest CPU (AVX/SSE) RGB to grayscale: 2-4x faster than OpenCV. For image processing/computer vision.
komrad36/KFAST
Implementation of the FAST feature detector for computer vision (Rosten 2006) using AVX2 to outperform the canonical implementation by up to 600%.
komrad36/SortingNetworks
Fastest CPU SIMD (SSE4) sorting networks for small integer arrays (2-6 elements), also optimal amd64 assembly and notes on getting compilers to generate optimal sorting networks.
komrad36/FastArrayOps
Extremely fast x86 / AVX2 assembly implementations of common operations for linear arrays: checking whether array contains element, finding index of element, finding min/max element, finding index of min/max element.
komrad36/CLATCH
Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
komrad36/FastDivide
Divide 64-bit integers faster than hardware, or precompute for a given denominator and quickly divide repeatedly.
komrad36/KPS
Infrastructure for simultaneous orbital and attitude propagation, with attitude-based real-time analytical aerodynamics simulation
komrad36/KfNN
Fastest CPU (AVX/SSE) SIFT or other 128-float vector matcher for computer vision
komrad36/CUDALERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data
komrad36/CUDAFLERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - float32 data
komrad36/FastIntegerSqrt
Fastest implementations of 32-bit and 64-bit integer square roots for x86-64
komrad36/FastThreadPool
Fast lock-free thread pool
komrad36/UCLATCH
Insanely fast CUDA LATCH 512-bit binary descriptor for computer vision (upright)
komrad36/BitOps
Basic, efficient, header-only bit ops and bit array primitives for modern x86. Tests provided.
komrad36/popcount
Fastest possible x86 implementation of popcount/population count/Hamming weight/counting set bits
komrad36/EllipticCurveFactorization
Fast, single-file, MIT-licensed large integer factorization using ECM combined with other techniques.
komrad36/MemoryOrder
Demos of 3 ways even the strong memory model of x86 can exhibit architectural memory reordering, leading to bugs
komrad36/PrimeSieve
Super fast, dynamically expanding prime sieve for primality queries, forward or backward iteration
komrad36/UnsignedIntegralToFloatingPoint
Notes on fast standards-compliant conversion of U32/U64 to and from float/double, which compilers do not get right.
komrad36/Factorization-Primality
Extremely fast, single-file factorization and primality testing for 32-bit and 64-bit integers on x86.
komrad36/ModularSqrt
Fast modular square roots modulo primes and prime powers, including 2. Interface uses GMP bigints.
komrad36/SMC-Demo
Minimal demo of self-modifying code on Windows. Still doable, still useful.
komrad36/CudaBoids
Numerical simulation of flocking behavior using CUDA and OpenGL
komrad36/Schematic
Basic toy Lisp interpreter in a few hundred lines of C++.
komrad36/SingleLinePythonSudoku
Single-line Python Sudoku solver
komrad36/SolveModularQuadratic
Generate all solutions to a modular quadratic equation. Supports any modulus. Interface uses GMP bigints.
komrad36/Sudoku
Fast Sudoku solver with detection of no solution/single solution/multiple solutions/invalid initial board
komrad36/Leftpack
Fast AVX2 leftpack/compress implementations (keep and contiguously pack a subset of elements)
komrad36/Bheap
Lightweight binary heap that greatly outperforms std::priority_queue and other commonly available heap implementations
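As background for the last entry, a binary heap keeps its elements in a flat array where the children of index i live at 2i+1 and 2i+2, and restores order after each insert or removal by sifting one element up or down. The following is a minimal sketch of that idea as a min-heap; it is not the Bheap repository's interface or implementation, it makes no performance claim, and the class name `MinHeap` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal array-backed binary min-heap (illustration only).
template <typename T>
class MinHeap {
public:
    bool empty() const { return data_.empty(); }
    std::size_t size() const { return data_.size(); }

    // Smallest element. Precondition: !empty().
    const T& top() const { return data_.front(); }

    void push(T value) {
        data_.push_back(std::move(value));
        // Sift up: swap with the parent while smaller than it.
        std::size_t i = data_.size() - 1;
        while (i > 0) {
            const std::size_t parent = (i - 1) / 2;
            if (!(data_[i] < data_[parent])) break;
            std::swap(data_[i], data_[parent]);
            i = parent;
        }
    }

    // Removes the smallest element. Precondition: !empty().
    void pop() {
        if (data_.size() > 1) data_.front() = std::move(data_.back());
        data_.pop_back();
        // Sift down: swap with the smaller child while larger than it.
        const std::size_t n = data_.size();
        std::size_t i = 0;
        for (;;) {
            const std::size_t left = 2 * i + 1;
            const std::size_t right = left + 1;
            std::size_t smallest = i;
            if (left < n && data_[left] < data_[smallest]) smallest = left;
            if (right < n && data_[right] < data_[smallest]) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
    }

private:
    std::vector<T> data_;  // implicit tree: children of i are 2i+1 and 2i+2
};
```

Pushing values in any order and then repeatedly calling `top()` and `pop()` yields them in ascending order.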