GPUOpen-Archive/CodeXL

OpenCL error -30 (CL_INVALID_VALUE) for every kernel launched in CodeXL

Grumpy-Dwarf opened this issue · 6 comments

I'm on Ubuntu 18.04 with the amdgpu-pro 18.30 driver; I tested some other driver versions with the same result. I'm using the latest CodeXL release, 2.6.302. The system has three GPUs: Intel, Nvidia, and an AMD RX 580. Only the legacy headless OpenCL from amdgpu-pro is installed. I'm trying to view an OpenCL application timeline trace. The application itself runs flawlessly on its own, but when I run it from CodeXL, the first clEnqueueNDRangeKernel returns -30. I tested several applications, all with the same result.
If I instead run the application with /opt/CodeXL_2.6-302/rcprof -t, it generates a timeline trace, which I can put into a CodeXL session. CodeXL then recognizes it and I can see the timeline trace. Everything works as expected, without errors.

What can I do? rcprof generates a correct .atp file when run by hand, but I always get CL_INVALID_VALUE when running from inside CodeXL. AFAIK CodeXL just runs rcprof, but it looks like things are more complicated.

The only thing I can think of is a difference in the working directory when running from the UI. Can you check the "Working Directory" setting in the Project Settings?

Also, can you check the .atp file generated to see if there are any earlier failed API calls (or are the clEnqueueNDRangeKernel calls the only ones reporting an error)?

Also, the CodeXL log file should show the full command line passed to rcprof. It would be interesting to know if you also see this problem if using the same command line when running rcprof manually.

You may need to turn up the log level to see this (I can't remember off the top of my head). You can adjust the log level and find the location of the log file in the Tools->Options dialog box.

Thanks a lot! I found out what options are passed to rcprof and the one which broke kernel launch was --occupancy. Unchecked "Generate occupancy information for each OpenCL or HSA kernels provided" and now API trace works.

Glad you have a workaround. And thanks for sharing this. I see what looks like a bug in the occupancy code that might explain why you are seeing this.

One additional thing that would help further: can you add the following snippet to your application and let me know what value is returned? It looks like a failure in this call may be the root cause of the failure you're seeing.

#ifndef CL_DEVICE_GFXIP_MAJOR_AMD
#define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A
#endif
cl_uint gfxIpMajor = 0;
cl_int retVal = clGetDeviceInfo(device, CL_DEVICE_GFXIP_MAJOR_AMD, sizeof(cl_uint), &gfxIpMajor, NULL);

It would be good to know what the return code is as well as the value of gfxIpMajor.

Just tested two cases - with occupancy enabled and occupancy disabled. retVal is zero, gfxIpMajor is 8 for both of them.
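
For anyone reading the numeric codes in this thread: in CL/cl.h, 0 is CL_SUCCESS and -30 is CL_INVALID_VALUE. Below is a minimal sketch of a helper that prints a status code together with its name, which can make reports like the ones above easier to read. The function names cl_error_name and report_cl_status are hypothetical (they are not part of OpenCL or CodeXL), and only a handful of codes are covered; it takes a plain int so it builds without the OpenCL headers.

```c
#include <stdio.h>

/* Hypothetical helper, not part of OpenCL or CodeXL: maps a small subset of
   OpenCL status codes to the names they have in CL/cl.h. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -48: return "CL_INVALID_KERNEL";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "UNKNOWN_CL_ERROR"; /* code not in this subset */
    }
}

/* Writes "<call> returned <code> (<name>)" to stderr. */
void report_cl_status(const char *call, int code)
{
    fprintf(stderr, "%s returned %d (%s)\n", call, code, cl_error_name(code));
}
```

Calling report_cl_status("clGetDeviceInfo", retVal) right after the query, or after clEnqueueNDRangeKernel, would log the call name, the raw code, and its symbolic name in one line.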