tugrul512bit/Cekirdekler

Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).

C#, GPL-3.0

Issues

Mandelbrot benchmark's or other test's source
#56 opened 2 years ago by PascalSoftwares
0
How to share Big Array, like a lookup table among various kernel calls
#55 opened 5 years ago by rajxabc
6
Any of the opencl 2 version does not work
#54 opened 5 years ago by rajxabc
38
Is there an example of generating a Unity Texture?
#53 opened 5 years ago by mfagerlund
4
Can you set pipeline mode for each device separately?
#52 opened 6 years ago by jinxiu0406
5
1D NBODY scores
#51 opened 7 years ago by cmisztur
9
add callback option to ClTask
#48 opened 7 years ago by tugrul512bit
0
Lazy compute
#7 opened 7 years ago by tugrul512bit
0
add duplicated compute option to device pool / task pool / task for initializing same buffer on all devices
#49 opened 7 years ago by tugrul512bit
0
add task types to control pool behavior (sync, broadcast task, shutdown devices)
#50 opened 7 years ago by tugrul512bit
0
add "batch mode compute" (pool of devices for pool of kernels) with multiple devices where each compute() is computed by 1 device only, with greedy scheduling
#45 opened 7 years ago by tugrul512bit
0
array.nextParam(array2).task() ---> creates ClTask to compute later in pool, with all the fields set at that time but with the latest array data
#46 opened 7 years ago by tugrul512bit
0
add multiple opencl-kernel instances for different compute-id values, for tiled computing, in task pool, with device pool
#47 opened 7 years ago by tugrul512bit
0
single device pipeline: kernel repeat option
#44 opened 7 years ago by tugrul512bit
0
single device pipeline: overlapping regions percentage in total latency
#43 opened 7 years ago by tugrul512bit
0
kernel repeat count number and repeat-end function name(kernel) with 64 global size(auto) for each repeat
#28 opened 7 years ago by tugrul512bit
0
ClArray.async to make an array copy operation done on another commandQueue (concurrently)
#41 opened 7 years ago by tugrul512bit
1
clNumberCruncher.enqueueModeAsyncEnable to enqueue different kernels and arrays concurrently
#42 opened 7 years ago by tugrul512bit
0
ClArray.name to bind an array to a kernel parameter with exact spelling
#40 opened 7 years ago by tugrul512bit
1
Read-only and write-only flags for ClArray
#39 opened 7 years ago by tugrul512bit
2
nonPartialWrite capability for buffers
#37 opened 7 years ago by tugrul512bit
3
Device to device pipeline: optimize single stage multiple kernel compute with less synchronizations
#35 opened 7 years ago by tugrul512bit
0
Enqueue mode with single gpu (and for device to device pipeline) ---- lower latency per command
#38 opened 7 years ago by tugrul512bit
3
Device to device pipeline: enable mixed ordering of kernel arrays (in kernel function definition)
#36 opened 7 years ago by tugrul512bit
0
[canceled] Dynamic device to device pipeline
#33 opened 7 years ago by tugrul512bit
0
Device to device pipeline: balancing load (kernel names) between neighboring stages
#34 opened 7 years ago by tugrul512bit
0
add built-in image-resizing method for png, gif and jpeg
#23 opened 7 years ago by tugrul512bit
0
Add built-in jpeg, gif, png decompression-recompression methods
#22 opened 7 years ago by tugrul512bit
0
Image decode+resize+multiple_encode pipeline
#32 opened 7 years ago by tugrul512bit
0
Complete device to device pipeline stage initialization kernel execution
#31 opened 7 years ago by tugrul512bit
0
Explicit Device to Device Pipelining
#8 opened 7 years ago by tugrul512bit
0
Some helper methods into ClNumberCruncher
#30 opened 7 years ago by tugrul512bit
0
Explicit Pipelining
#9 opened 7 years ago by tugrul512bit
0
add struct array support with byte-length descriptors for Unity's Vector3-Vector2 arrays
#29 opened 7 years ago by tugrul512bit
0
add built-in matrix multiplication with sizes between 2x2 and 8192x8192
#27 opened 7 years ago by tugrul512bit
0
Nbody benchmark-based explicit device selection
#16 opened 7 years ago by tugrul512bit
0
nbody(benchmark based) device selection disposes shared platform
#26 opened 7 years ago by tugrul512bit
0
English language translation of cluster-computing related classes (multi-pc centered-control)
#25 opened 7 years ago by tugrul512bit
0
Add device limits stress testing to have numbers used later in production or alarming when approaching limits.
#24 opened 7 years ago by tugrul512bit
0
Add speed-ratio indicator between devices after 10-20 iterations
#21 opened 7 years ago by tugrul512bit
0
Arrays: bounds check before compute.
#20 opened 7 years ago by tugrul512bit
0
Workitems: Grain size - local size - global size: bounds check
#17 opened 7 years ago by tugrul512bit
0
For explicit device selection, ClNumberCruncher still expects number of cores and gpus
#19 opened 7 years ago by tugrul512bit
0
inhibit use of ClDevice constructor
#18 opened 7 years ago by tugrul512bit
0
Explicit device selection disposes handles twice, giving error
#15 opened 7 years ago by tugrul512bit
0
Redefine properties that are with underscores, to have a proper naming
#12 opened 7 years ago by tugrul512bit
0
Hide Unnecessary Methods and Classes
#10 opened 7 years ago by tugrul512bit
1
C++ array wrapper re-creating (and computing) in loop throws error (CL_INVALID_MEM_OBJECT) but works for prepared N-array of C++ arrays
#14 opened 7 years ago by tugrul512bit
0
Disposing unused buffers with warning message
#13 opened 7 years ago by tugrul512bit
0
Force multiple-of-64 for array size when using streaming and C++ arrays (cl_mem_use_host_ptr)
#11 opened 7 years ago by tugrul512bit
0