tugrul512bit/Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
C#, GPL-3.0
Issues
None of the OpenCL 2 versions work
#54 opened by rajxabc - 4
1D NBODY scores
#51 opened by cmisztur - 0
add callback option to ClTask
#48 opened by tugrul512bit - 0
Lazy compute
#7 opened by tugrul512bit - 0
add duplicated compute option to device pool / task pool / task for initializing the same buffer on all devices
#49 opened by tugrul512bit - 0
add task types to control pool behavior (sync, broadcast task, shutdown devices)
#50 opened by tugrul512bit - 0
add "batch mode compute" (pool of devices for pool of kernels) with multiple devices, where each compute() is run by only one device, with greedy scheduling
#45 opened by tugrul512bit - 0
array.nextParam(array2).task(): creates a ClTask to compute later in the pool, with all fields set at that time but using the latest array data
#46 opened by tugrul512bit - 0
add multiple OpenCL kernel instances for different compute-id values, for tiled computing, in task pool, with device pool
#47 opened by tugrul512bit - 0
single device pipeline: kernel repeat option
#44 opened by tugrul512bit - 0
kernel repeat count and repeat-end function name (kernel) with a global size of 64 (auto) for each repeat
#28 opened by tugrul512bit - 1
ClArray.async to run an array copy operation on another commandQueue (concurrently)
#41 opened by tugrul512bit - 0
clNumberCruncher.enqueueModeAsyncEnable to enqueue different kernels and arrays concurrently
#42 opened by tugrul512bit - 1
Read-only and write-only flags for ClArray
#39 opened by tugrul512bit - 3
nonPartialWrite capability for buffers
#37 opened by tugrul512bit - 0
Device to device pipeline: optimize single-stage, multiple-kernel compute with fewer synchronizations
#35 opened by tugrul512bit - 3
Enqueue mode with a single GPU (and for device to device pipeline): lower latency per command
#38 opened by tugrul512bit - 0
Device to device pipeline: enable mixed ordering of kernel arrays (in kernel function definition)
#36 opened by tugrul512bit - 0
[canceled] Dynamic device to device pipeline
#33 opened by tugrul512bit - 0
Device to device pipeline: balancing load (kernel names) between neighboring stages
#34 opened by tugrul512bit - 0
Image decode+resize+multiple_encode pipeline
#32 opened by tugrul512bit - 0
Explicit Device to Device Pipelining
#8 opened by tugrul512bit - 0
Some helper methods into ClNumberCruncher
#30 opened by tugrul512bit - 0
Explicit Pipelining
#9 opened by tugrul512bit - 0
add struct array support with byte-length descriptors for Unity's Vector3-Vector2 arrays
#29 opened by tugrul512bit - 0
English translation of cluster-computing-related classes (multi-PC, centrally controlled)
#25 opened by tugrul512bit - 0
Add device-limit stress testing to obtain numbers for later use in production, or for raising alarms when approaching limits.
#24 opened by tugrul512bit - 0
Arrays: bounds check before compute.
#20 opened by tugrul512bit - 0
For explicit device selection, ClNumberCruncher still expects the number of cores and GPUs
#19 opened by tugrul512bit - 0
inhibit use of ClDevice constructor
#18 opened by tugrul512bit - 0
Hide Unnecessary Methods and Classes
#10 opened by tugrul512bit - 0
Re-creating (and computing with) a C++ array wrapper in a loop throws an error (CL_INVALID_MEM_OBJECT), but works for a prepared N-array of C++ arrays
#14 opened by tugrul512bit - 0
Force multiple-of-64 array size when using streaming and C++ arrays (CL_MEM_USE_HOST_PTR)
#11 opened by tugrul512bit