Compatibility with Intel GPUs
haesleinhuepf opened this issue · 4 comments
Hi @maarzt ,
I'm just testing on Windows 10 / Intel UHD 620 and found some issues in this benchmark:
https://github.com/maarzt/imglib2-trainable-segmentation/blob/clij-benchmark/src/test/java/net/imglib2/trainable_segmention/performance/GpuCpuComparisonBenchmark.java
It crashes on my GPU with this error:
2:570:35: error: access qualifier can only be used for pipe and image type
__kernel void separable_operation(OUTPUT_IMAGE_PARAMETER, INPUT_IMAGE_PARAMETER, __constant float* kernelValues)
                                  ^
2:518:33: note: expanded from macro 'OUTPUT_IMAGE_PARAMETER'
#define OUTPUT_IMAGE_PARAMETER __write_only __global float* output
^
2:570:59: error: access qualifier can only be used for pipe and image type
__kernel void separable_operation(OUTPUT_IMAGE_PARAMETER, INPUT_IMAGE_PARAMETER, __constant float* kernelValues)
                                                          ^
2:555:32: note: expanded from macro 'INPUT_IMAGE_PARAMETER'
#define INPUT_IMAGE_PARAMETER __read_only __global float* input
^
Error when trying to create kernel separable_operation
net.haesleinhuepf.clij.clearcl.exceptions.OpenCLException: OpenCL error: -45 -> CL_INVALID_PROGRAM_EXECUTABLE
at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkOpenCLErrorCode(BackendUtils.java:352)
at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.lambda$getKernelPeerPointer$19(ClearCLBackendJOCL.java:601)
at net.haesleinhuepf.clij.clearcl.backend.BackendUtils.checkExceptions(BackendUtils.java:156)
at net.haesleinhuepf.clij.clearcl.backend.jocl.ClearCLBackendJOCL.getKernelPeerPointer(ClearCLBackendJOCL.java:593)
at net.haesleinhuepf.clij.clearcl.ClearCLCompiledProgram.createKernel(ClearCLCompiledProgram.java:137)
at net.haesleinhuepf.clij.clearcl.ClearCLProgram.createKernel(ClearCLProgram.java:685)
at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.getKernel(CLKernelExecutor.java:353)
at net.haesleinhuepf.clij.clearcl.util.CLKernelExecutor.enqueue(CLKernelExecutor.java:229)
at net.haesleinhuepf.clij2.CLIJ2.lambda$executeSubsequently$0(CLIJ2.java:236)
at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:97)
at net.haesleinhuepf.clij.clearcl.util.ElapsedTime.measure(ElapsedTime.java:28)
at net.haesleinhuepf.clij2.CLIJ2.executeSubsequently(CLIJ2.java:227)
at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.lambda$execute$1(DefaultGpuApi.java:57)
at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.handleOutOfMemoryException(DefaultGpuApi.java:65)
at net.imglib2.trainable_segmention.gpu.api.DefaultGpuApi.execute(DefaultGpuApi.java:56)
at net.imglib2.trainable_segmention.gpu.api.GpuScope.execute(GpuScope.java:42)
at net.imglib2.trainable_segmention.gpu.algorithms.GpuSeparableOperation.run(GpuSeparableOperation.java:76)
at net.imglib2.trainable_segmention.gpu.algorithms.GpuKernelConvolution.convolve(GpuKernelConvolution.java:61)
at net.imglib2.trainable_segmention.gpu.algorithms.GpuKernelConvolution.apply(GpuKernelConvolution.java:47)
at net.imglib2.trainable_segmention.gpu.algorithms.GpuConcatenatedNeighborhoodOperation.apply(GpuConcatenatedNeighborhoodOperation.java:49)
at net.imglib2.trainable_segmention.gpu.compute_cache.GpuGaussContent.load(GpuGaussContent.java:54)
at net.imglib2.trainable_segmention.gpu.compute_cache.GpuComputeCache$CacheEntry.get(GpuComputeCache.java:90)
at net.imglib2.trainable_segmention.gpu.compute_cache.GpuComputeCache.get(GpuComputeCache.java:51)
at net.imglib2.trainable_segmention.gpu.GpuFeatureInput.gauss(GpuFeatureInput.java:41)
at net.imglib2.trainable_segmention.pixel_feature.filter.gauss.SingleGaussianBlurFeature.apply(SingleGaussianBlurFeature.java:51)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.lambda$apply$2(FeatureJoiner.java:62)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.genericApply(FeatureJoiner.java:71)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.apply(FeatureJoiner.java:62)
at net.imglib2.trainable_segmention.pixel_feature.filter.AbstractGroupFeatureOp.apply(AbstractGroupFeatureOp.java:57)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.lambda$apply$2(FeatureJoiner.java:62)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.genericApply(FeatureJoiner.java:71)
at net.imglib2.trainable_segmention.pixel_feature.filter.FeatureJoiner.apply(FeatureJoiner.java:62)
at net.imglib2.trainable_segmention.pixel_feature.calculator.FeatureCalculator.applyUseGpu(FeatureCalculator.java:152)
at net.imglib2.trainable_segmention.classification.Segmenter.segmentGpu(Segmenter.java:129)
at net.imglib2.trainable_segmention.classification.Segmenter.segment(Segmenter.java:109)
at net.imglib2.trainable_segmention.performance.ParallelSegmentationTask.lambda$run$0(ParallelSegmentationTask.java:44)
at net.imglib2.cache.img.LoadedCellCacheLoader.get(LoadedCellCacheLoader.java:91)
at net.imglib2.cache.img.LoadedCellCacheLoader.get(LoadedCellCacheLoader.java:51)
at net.imglib2.cache.ref.SoftRefLoaderCache.get(SoftRefLoaderCache.java:101)
at net.imglib2.cache.util.LoaderCacheAsCacheAdapter.get(LoaderCacheAsCacheAdapter.java:30)
at net.imglib2.cache.util.CacheAsUncheckedCacheAdapter.get(CacheAsUncheckedCacheAdapter.java:28)
at net.imglib2.img.cell.LazyCellImg$LazyCells.get(LazyCellImg.java:104)
at net.imglib2.img.list.AbstractLongListImg$LongListRandomAccess.get(AbstractLongListImg.java:274)
at net.imglib2.img.cell.CellRandomAccess.getCell(CellRandomAccess.java:136)
at net.imglib2.img.cell.CellRandomAccess.updatePosition(CellRandomAccess.java:474)
at net.imglib2.img.cell.CellRandomAccess.<init>(CellRandomAccess.java:130)
at net.imglib2.img.cell.AbstractCellImg.randomAccess(AbstractCellImg.java:104)
at net.imglib2.trainable_segmention.Utils.lambda$populateCellImg$7(Utils.java:355)
at preview.net.imglib2.parallel.DefaultTaskExecutor.lambda$forEach$0(DefaultTaskExecutor.java:101)
at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
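For context on the two compiler errors: in OpenCL C, __read_only and __write_only are access qualifiers for image (and pipe) types, so a stricter compiler rejects them on a plain __global float* pointer, which is what the two macros expand to. Below is a minimal Java sketch of one possible workaround, assuming the kernel source is available as a string before it is compiled. The class and method names are hypothetical and not part of CLIJ or this repository; the sketch only drops a qualifier that directly precedes __global and leaves image parameters untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of CLIJ or imglib2-trainable-segmentation.
final class AccessQualifierStripper {

    // Matches "__read_only " or "__write_only " only when the next token is
    // "__global", i.e. when the qualifier sits on a buffer pointer.
    // Assumption: only the double-underscore spelling is used in the source.
    private static final Pattern BUFFER_ACCESS_QUALIFIER =
            Pattern.compile("__(?:read|write)_only\\s+(?=__global\\b)");

    private AccessQualifierStripper() {
    }

    /** Returns the OpenCL source with access qualifiers removed from buffer pointers. */
    static String stripBufferAccessQualifiers(String openClSource) {
        return BUFFER_ACCESS_QUALIFIER.matcher(openClSource).replaceAll("");
    }

    public static void main(String[] args) {
        String macro =
                "#define OUTPUT_IMAGE_PARAMETER __write_only __global float* output";
        System.out.println(stripBufferAccessQualifiers(macro));
    }
}
```

Editing the two #define lines by hand has the same effect; a const qualifier on the input pointer is the usual way to mark a buffer as read-only.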
When I remove __write_only and __read_only from these macros, the benchmark seems to hang at this point for quite some time (minutes):
net.imglib2.trainable_segmention.performance.GpuCpuComparisonBenchmark
[WARNING] 1 exceptions occurred during plugin discovery.
log4j:WARN No appenders could be found for logger (org.bushe.swing.event.EventService).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
GPU: Intel(R) UHD Graphics 620
Time copying: 107.304 ms
Edit: It's not "hanging" - it just takes a minute or so before it continues...
Note, I had to make the OPENCL_DEVICE_NAME configurable in order to test this. It might make sense to make this available to end users as well:
https://github.com/haesleinhuepf/imglib2-trainable-segmentation/commit/d32ab1af07d0bd48ab9bcac28736f55231b3f91a
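As a rough illustration of what "configurable" could mean here, the sketch below reads the device name from a Java system property and falls back to a default when the property is unset or blank. This is not the code from the commit linked above: the property key "opencl.device.name" and the class name are made up for this example, and how the resolved name is handed to the GPU context is left open.

```java
import java.util.Properties;

// Hypothetical sketch; the property key and class name are assumptions.
final class OpenClDeviceConfig {

    // Assumed property key, e.g. -Dopencl.device.name="Intel(R) UHD Graphics 620"
    static final String DEVICE_NAME_PROPERTY = "opencl.device.name";

    private OpenClDeviceConfig() {
    }

    /**
     * Returns the configured device name, or defaultName if the property
     * is missing or contains only whitespace.
     */
    static String resolveDeviceName(Properties properties, String defaultName) {
        String configured = properties.getProperty(DEVICE_NAME_PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            return defaultName;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // A null default stands for "let the library pick a device" in this sketch.
        String deviceName = resolveDeviceName(System.getProperties(), null);
        System.out.println("Requested OpenCL device: " + deviceName);
    }
}
```

Passing the Properties object in, rather than calling System.getProperty directly, keeps the lookup easy to test and would let a later settings dialog supply the same value.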
Let me know if I can help with further information.
Cheers,
Robert
THANK YOU for testing this!
- I removed the __read_only and __write_only qualifiers as you suggested.
- The "hanging" sometimes occurs for me too; not for a minute, but still noticeable. So I will try to fix that first, and then I will ask you whether the problem still occurs.
- Making the device name configurable would be great. Ideally it should be configurable in the UI. I'm undecided: What's the best place to do such Settings in ImageJ / Labkit?
> Making the device name configurable would be great. Ideally it should be configurable in the UI. I'm undecided: What's the best place to do such Settings in ImageJ / Labkit?
Does Labkit have its own config dialog? It would fit there, right? Next to the "Use GPU" checkbox ;-)
No, Labkit doesn't have a config dialog yet. Is there an "ImageJ / SciJava way" to do config dialogs?