tugrul512bit
Physics engineer, Web developer, GPGPU, Weird equations, Optimizations, Lens undistortion, Signal filtering
Will code for food
Earth, Sun System, Milkyway Galaxy, Observable Universe
Pinned Repositories
Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
CekirdeklerCPP
Project (OpenCL 1.2 backend) that generates KutuphaneCL.dll for use by the Cekirdekler GPGPU API (Cekirdekler.dll).
CekirdeklerCPP2
OpenCL 2.0 support for Cekirdekler
FastCollisionDetectionLib
C++ adaptive grid for fast collision detection between AABB particles.
libGPGPU
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
LruClockCache
A low-latency LRU approximation cache in C++ using the CLOCK second-chance algorithm. Also supports multi-level caching. Up to 2.5 billion lookups per second.
TurtleSort
Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.
UfSaCL
Ultra fast simulated annealing with OpenCL & multiple accelerators, GPUs, CPUs.
VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
VirtualMultiArray
C++ virtual-array implementation that uses all graphics cards in the system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (Big objects: faster than NVMe)
tugrul512bit's Repositories
tugrul512bit/Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
tugrul512bit/LruClockCache
A low-latency LRU approximation cache in C++ using the CLOCK second-chance algorithm. Also supports multi-level caching. Up to 2.5 billion lookups per second.
tugrul512bit/FastCollisionDetectionLib
C++ adaptive grid for fast collision detection between AABB particles.
tugrul512bit/VirtualMultiArray
C++ virtual-array implementation that uses all graphics cards in the system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (Big objects: faster than NVMe)
tugrul512bit/libGPGPU
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
tugrul512bit/TurtleSort
Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.
tugrul512bit/VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
tugrul512bit/CekirdeklerCPP2
OpenCL 2.0 support for Cekirdekler
tugrul512bit/LruJS
Asynchronous cache that implements the Least Recently Used (LRU) CLOCK second-chance algorithm with O(1) hit and O(1) miss complexity. This async cache hides the latency of cache misses behind each other and behind cache hits.
tugrul512bit/UfSaCL
Ultra fast simulated annealing with OpenCL & multiple accelerators, GPUs, CPUs.
tugrul512bit/unityTestMeshDeformation
Deforming a sphere surface using vertices, normals, and time.
tugrul512bit/CompressedStringLib
Heavy weight string with compression
tugrul512bit/KaloriferBenchmarkGPU
Async Test
tugrul512bit/AATPTPT
GPU-accelerated version of The Powder Toy - just an attempt through cellular automata
tugrul512bit/Cuda_32kB_Dynamic_Register_Indexing
Accessing all private registers of a warp from the main thread of the warp.
tugrul512bit/FastaGeneIndexer
C++ compressed FASTA sequence cache backed by the combined video memory of the system to decrease RAM usage.
tugrul512bit/SimpleFastVideoStreamCache
Simple (2 files), fast (1.8 GB/s on 1 core of an FX-8150), video (mp4, ogg, ...), stream cache (LRU implementation) for NodeJS.
tugrul512bit/SlothTree
CUDA-accelerated tree build and tree traversal to check whether a number is in an array.
tugrul512bit/AdvancedMacroDevices
2D RPG/RTS/Simulation game that lets you design a CPU & manage your corporation against other corporations.
tugrul512bit/cuda_bitonic_sort_test
Testing the bitonic sort algorithm on CUDA.
tugrul512bit/EpicWarCL
C# fully OpenCL (C99)-accelerated game demo and benchmark, pre-alpha stage abandonware.
tugrul512bit/gpgpu-loadbalancerx
Simple load-balancing library for balancing GPGPU workloads between a GPU and a CPU or any number of devices in a computer or multiple computers.
tugrul512bit/InverseFX
Computing a function when only its inverse is known, using the Newton-Raphson method for 1D, 2D, and 3D arrays in parallel.
tugrul512bit/ParallelizedSnakeGame
Classic Snake-Game With Independent Grid-Updates For Efficient Parallelization And Constant Computation Time
tugrul512bit/examplesForLibGPGPU
Various simple algorithms accelerated with OpenCL (GPU). Assumes libGPGPU headers are added to projects.
tugrul512bit/FastSimpleNeuralNetworkTrainer
GPU-accelerated neural network trainer that supports multiple GPUs with OpenCL.
tugrul512bit/KalmanFilter
This is a Kalman filter used to calculate the angle, rate, and bias from the input of an accelerometer/magnetometer and a gyroscope.
tugrul512bit/oofrng
Thomas Wang's random number generation function, implicitly parallelized and pipelined at a speed of 0.6 cycles per 32-bit integer.
tugrul512bit/tugrul512bit
User stats info.
tugrul512bit/VirtualMultiArrayForMSVC
Back your array of data by graphics card memory (multi-gpu) with a paging system (as a virtual memory simulation).