komrad36

Speaks assembly, will travel.

Santa Monica, CA

Pinned Repositories

CLATCH
Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
Language: C++ 32 4 26
CRC
Fastest CRC32 for x86, Intel and AMD, + comprehensive derivation and discussion of various approaches
Language: C++ 263 13 323
CUDAKfNN
Fastest CUDA SIFT or other 128-float vector matcher for computer vision
Language: C++ 25 2 06
FastArrayOps
Extremely fast x86 / AVX2 assembly implementations of common operations for linear arrays: checking whether array contains element, finding index of element, finding min/max element, finding index of min/max element.
Language: Assembly 37 3 02
KFAST
Implementation of FAST feature detector for computer vision (Rosten 2006) using AVX2 to outperform canonical implementation by up to 600%.
Language: C 79 6 121
KORAL
Novel extreme-performance CPU-GPU cooperative feature detector-descriptor for computer vision.
Language: C++ 38 6 314
KPS
Infrastructure for simultaneous orbital and attitude propagation, with attitude-based real-time analytical aerodynamics simulation
Language: C++ 23 6 27
LATCH
Fastest CPU implementation of the LATCH 512-bit binary feature descriptor; fully scale- and rotation-invariant
Language: C++ 34 3 912
RGB2Y
Fastest CPU (AVX/SSE) RGB to grayscale: 2-4x faster than OpenCV. For image processing/computer vision.
Language: C++ 89 11 422
SortingNetworks
Fastest CPU SIMD (SSE4) sorting networks for small integer arrays (2-6 elements), also optimal amd64 assembly and notes on getting compilers to generate optimal sorting networks.
Language: Assembly 44 6 15

komrad36's Repositories

komrad36/CRC
Fastest CRC32 for x86, Intel and AMD, + comprehensive derivation and discussion of various approaches
Language: C++ 263 13 323
komrad36/RGB2Y
Fastest CPU (AVX/SSE) RGB to grayscale: 2-4x faster than OpenCV. For image processing/computer vision.
Language: C++ 89 11 422
komrad36/KFAST
Implementation of FAST feature detector for computer vision (Rosten 2006) using AVX2 to outperform canonical implementation by up to 600%.
Language: C 79 6 121
komrad36/SortingNetworks
Fastest CPU SIMD (SSE4) sorting networks for small integer arrays (2-6 elements), also optimal amd64 assembly and notes on getting compilers to generate optimal sorting networks.
Language: Assembly 44 6 15
komrad36/FastArrayOps
Extremely fast x86 / AVX2 assembly implementations of common operations for linear arrays: checking whether array contains element, finding index of element, finding min/max element, finding index of min/max element.
Language: Assembly 37 3 02
komrad36/CLATCH
Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
Language: C++ 32 4 26
komrad36/FastDivide
Divide 64-bit integers faster than hardware. Or precompute for a given denom and quickly divide repeatedly.
Language: C++ 26 2 13
komrad36/KPS
Infrastructure for simultaneous orbital and attitude propagation, with attitude-based real-time analytical aerodynamics simulation
Language: C++ 23 6 27
komrad36/KfNN
Fastest CPU (AVX/SSE) SIFT or other 128-float vector matcher for computer vision
Language: C++ 13 2 05
komrad36/CUDALERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data
Language: C++ 12 3 12
komrad36/CUDAFLERP
Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - float32 data
Language: C++ 9 3 25
komrad36/FastIntegerSqrt
Fastest implementations of 32-bit and 64-bit integer square roots for x86-64
Language: C++ 8 2 01
komrad36/FastThreadPool
Fast lock-free thread pool
Language: C++ 8 2 02
komrad36/UCLATCH
Insanely fast CUDA LATCH 512-bit binary descriptor for computer vision (upright)
Language: C++ 8 2 03
komrad36/BitOps
Basic, efficient, header-only bit ops and bit array primitives for modern x86. Tests provided.
Language: C++ 6 2 02
komrad36/popcount
Fastest possible x86 implementation of popcount/population count/Hamming weight/counting set bits
Language: C++ 6 3 02
komrad36/EllipticCurveFactorization
Fast, single-file, MIT-licensed large integer factorization using ECM combined with other techniques.
Language: C++ 5 2 11
komrad36/MemoryOrder
Demos of 3 ways even the strong memory model of x86 can exhibit architectural memory reordering, leading to bugs
Language: C++ 5 3 03
komrad36/PrimeSieve
Super fast, dynamically expanding prime sieve for primality queries, forward or backward iteration
Language: C++ 5 2 02
komrad36/UnsignedIntegralToFloatingPoint
Notes on fast standards-compliant conversion of U32/U64 to and from float/double, which compilers do not get right.
4 2 01
komrad36/Factorization-Primality
Extremely fast, single-file factorization and primality testing for 32-bit and 64-bit integers on x86.
Language: C++ 3 2 03
komrad36/ModularSqrt
Fast modular square roots modulo primes and prime powers, including 2. Interface uses GMP bigints.
Language: C++ 3 2 03
komrad36/SMC-Demo
Minimal demo of self-modifying code on Windows. Still doable, still useful.
Language: Assembly 3 3 01
komrad36/CudaBoids
Numerical simulation of flocking behavior using CUDA and OpenGL
Language: Cuda 2 2 01
komrad36/Schematic
Basic toy Lisp interpreter in a few hundred lines of C++.
Language: C++ 2 2 01
komrad36/SingleLinePythonSudoku
Single-line Python Sudoku solver
2 2 0
komrad36/SolveModularQuadratic
Generate all solutions to a modular quadratic equation. Supports any modulus. Interface uses GMP bigints.
Language: C++ 2 2 01
komrad36/Sudoku
Fast sudoku solver with detection of no solution/single solution/multiple solutions/invalid initial board
Language: C++ 2 2 0
komrad36/Leftpack
Fast AVX2 leftpack/compress implementations (keep and contiguously pack a subset of elements)
Language: C++ 1 2 01
komrad36/Bheap
Lightweight binary heap that greatly outperforms std::priority_queue and other commonly available heap implementations
Language: C++ 2 01