tugrul512bit

Physics engineer, Web developer, GPGPU, Weird equations, Optimizations, Lens undistortion, Signal filtering

Will code for food
Earth, Sun System, Milkyway Galaxy, Observable Universe

Pinned Repositories

Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).
Language:C#94 16 5610
CekirdeklerCPP
Project (OpenCL 1.2 backend) that generates KutuphaneCL.dll, which is used by the Cekirdekler GPGPU API (Cekirdekler.dll)
Language:C++7 4 02
CekirdeklerCPP2
OpenCL 2.0 support for Cekirdekler
Language:C++5 2 00
FastCollisionDetectionLib
C++ adaptive grid for fast collision detection between AABB particles.
Language:C++18 3 72
libGPGPU
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
Language:C++11 2 93
LruClockCache
A low-latency LRU approximation cache in C++ using CLOCK second-chance algorithm. Multi level cache too. Up to 2.5 billion lookups per second.
Language:C++68 4 15
TurtleSort
Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.
Language:Cuda10 2 00
UfSaCL
Ultra fast simulated annealing with OpenCL & multiple accelerators, GPUs, CPUs.
Language:C++4 1 01
VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Language:C++9 2 01
VirtualMultiArray
C++ virtual-array implementation that uses all graphics cards in system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (big objects: faster than NVMe)
Language:C++15 2 33

tugrul512bit's Repositories

tugrul512bit/Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).
Language:C#94 16 5610
tugrul512bit/LruClockCache
A low-latency LRU approximation cache in C++ using CLOCK second-chance algorithm. Multi level cache too. Up to 2.5 billion lookups per second.
Language:C++68 4 15
tugrul512bit/FastCollisionDetectionLib
C++ adaptive grid for fast collision detection between AABB particles.
Language:C++18 3 72
tugrul512bit/VirtualMultiArray
C++ virtual-array implementation that uses all graphics cards in system as storage (with LRU cache eviction on RAM) and uses OpenCL for data transfers. (Random access: faster than HDD) (Sequential access: faster than SSD) (big objects: faster than NVMe)
Language:C++15 2 33
tugrul512bit/libGPGPU
Multi-GPU & CPU OpenCL kernel executor with load-balancing as if there is one big GPU.
Language:C++11 2 93
tugrul512bit/TurtleSort
Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.
Language:Cuda10 2 00
tugrul512bit/VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Language:C++9 2 01
tugrul512bit/CekirdeklerCPP2
OpenCL 2.0 support for Cekirdekler
Language:C++5 2 00
tugrul512bit/LruJS
Asynchronous cache that implements Least Recently Used (LRU) - Clock - Second Chance algorithm with O(1) hit O(1) miss complexity. This Async cache hides latency of cache-misses behind each other and behind cache-hits.
Language:JavaScript5 2 01
tugrul512bit/UfSaCL
Ultra fast simulated annealing with OpenCL & multiple accelerators, GPUs, CPUs.
Language:C++4 1 01
tugrul512bit/unityTestMeshDeformation
deforming sphere surface using vertices, normals and time
Language:C#4 2 01
tugrul512bit/CompressedStringLib
Heavy weight string with compression
Language:C++3 2 10
tugrul512bit/KaloriferBenchmarkGPU
Async Test
3 1 10
tugrul512bit/AATPTPT
Gpu-accelerated The Powder Toy - just an attempt through cellular automata
Language:C++2 1 1
tugrul512bit/Cuda_32kB_Dynamic_Register_Indexing
Accessing all private registers of a warp from main thread of warp.
Language:Cuda2 2 0
tugrul512bit/FastaGeneIndexer
C++ compressed FASTA sequence cache backed by the combined video memory of system to decrease RAM usage.
Language:C++2 2 0
tugrul512bit/SimpleFastVideoStreamCache
Simple (2 files), fast (1.8GB/s by 1 core of fx8150), video (mp4,ogg,..), stream cache (LRU implementation) for NodeJS.
Language:JavaScript2 2 0
tugrul512bit/SlothTree
Cuda accelerated tree-build, tree-traversal to check if a number is in an array.
Language:Cuda2 1 0
tugrul512bit/AdvancedMacroDevices
2D RPG/RTS/Simulation game that lets you design a CPU & manage your corporation against other corporations.
Language:C++1 1 3
tugrul512bit/cuda_bitonic_sort_test
testing bitonic sort algorithm on cuda
Language:Cuda1 2 01
tugrul512bit/EpicWarCL
C# fully OpenCL(C99)-accelerated game demo and benchmark, pre-alpha stage abandonware.
Language:C1 2 0
tugrul512bit/gpgpu-loadbalancerx
Simple load-balancing library for balancing GPGPU workloads between a GPU and a CPU or any number of devices in a computer or multiple computers.
Language:C++1 3 01
tugrul512bit/InverseFX
Computing a function when only its inverse is known, using the Newton-Raphson method for 1D, 2D, 3D arrays in parallel.
Language:C++1 2 21
tugrul512bit/ParallelizedSnakeGame
Classic Snake-Game With Independent Grid-Updates For Efficient Parallelization And Constant Computation Time
Language:C++1 2 0
tugrul512bit/examplesForLibGPGPU
Various simple algorithms accelerated with OpenCL (GPU). Assumes libGPGPU headers are added to projects.
Language:C++
tugrul512bit/FastSimpleNeuralNetworkTrainer
Gpu accelerated neural network trainer that supports multiple GPUs with OpenCL.
Language:C++1 11
tugrul512bit/KalmanFilter
This is a Kalman filter used to calculate the angle, rate and bias from the input of an accelerometer/magnetometer and a gyroscope.
tugrul512bit/oofrng
Thomas Wang's random number generation function implicitly parallelized & pipelined at speed of 0.6 cycles per 32bit integer.
Language:C++2 0
tugrul512bit/tugrul512bit
User stats info.
2 0
tugrul512bit/VirtualMultiArrayForMSVC
Back your array of data by graphics card memory (multi-gpu) with a paging system (as a virtual memory simulation).
Language:C++1 01
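
The CLOCK second-chance policy named in the LruClockCache and LruJS descriptions above can be illustrated with a minimal single-threaded sketch. This is not code from either repository: the class name `ClockCacheSketch`, its `get` method, and the loader callback are hypothetical names chosen for illustration. The sketch assumes a non-zero capacity and default-constructible key and value types. It starts new entries with the reference bit cleared, although some variants of the algorithm set the bit on insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration of CLOCK (second-chance) eviction.
// Not the LruClockCache API; names and layout are assumptions.
template <typename Key, typename Value>
class ClockCacheSketch {
public:
    // capacity must be greater than zero; loader is called on every miss.
    ClockCacheSketch(std::size_t capacity,
                     std::function<Value(const Key&)> loader)
        : slots_(capacity), loader_(std::move(loader)) {}

    Value get(const Key& key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: mark the slot so the next sweep gives it a second chance.
            slots_[it->second].referenced = true;
            return slots_[it->second].value;
        }
        // Miss: sweep the clock hand, clearing reference bits,
        // until a free slot or an unreferenced slot is found.
        while (slots_[hand_].occupied && slots_[hand_].referenced) {
            slots_[hand_].referenced = false;
            hand_ = (hand_ + 1) % slots_.size();
        }
        Slot& victim = slots_[hand_];
        if (victim.occupied) {
            index_.erase(victim.key);  // evict the old entry
        }
        victim.key = key;
        victim.value = loader_(key);
        victim.occupied = true;
        victim.referenced = false;  // assumption: new entries start unmarked
        index_[key] = hand_;
        hand_ = (hand_ + 1) % slots_.size();
        return victim.value;
    }

private:
    struct Slot {
        Key key{};
        Value value{};
        bool occupied = false;
        bool referenced = false;
    };

    std::vector<Slot> slots_;                     // circular buffer of entries
    std::unordered_map<Key, std::size_t> index_;  // key -> slot position
    std::function<Value(const Key&)> loader_;     // backing-store read
    std::size_t hand_ = 0;                        // clock hand position
};
```

With a capacity of 2, reading keys 1, 2, 1 and then 3 evicts key 2 rather than key 1, because the hit on key 1 set its reference bit and the sweep cleared that bit instead of evicting the entry. Both a hit and a miss cost amortized O(1) hash lookups plus a bounded sweep of at most one full turn of the hand.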