RobertLeahy/KinFu

Use Travis CI Again


CI was suspended due to issues documented in #180.

With the latest code, CI appears to work on my test branch for the Linux builds using GCC 5 and 6.

The OSX build succeeds, but the test suite fails to run. I have determined that the CPU OpenCL driver on OSX wants to use a work-group size of 1 for pose estimation. As a result, the current code fails with an Invalid Group Size error.
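One way to avoid hard-coding the work-group size is to derive it from what the driver reports. The C sketch below is a hypothetical helper, not code from this repository. `kernel_max` stands in for the value `clGetKernelWorkGroupInfo` returns for `CL_KERNEL_WORK_GROUP_SIZE`, and `preferred` is whatever size the code would otherwise use. The helper picks the largest power of two that fits under both limits and evenly divides the global size, which OpenCL 1.x requires of an explicit local size. On the OSX CPU driver it would therefore fall back to 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: choose an explicit local work size for a 1-D launch.
 *   global_size - total number of work-items (e.g. 640 * 480 pixels)
 *   kernel_max  - assumed to come from clGetKernelWorkGroupInfo(...,
 *                 CL_KERNEL_WORK_GROUP_SIZE, ...) for the target device
 *   preferred   - the size the code would use if the driver allowed it
 * Returns the largest power of two that is <= kernel_max, <= preferred,
 * and divides global_size evenly. Returns 1 if nothing larger fits.
 */
static size_t choose_local_size(size_t global_size, size_t kernel_max,
                                size_t preferred)
{
    assert(global_size > 0 && kernel_max > 0 && preferred > 0);
    size_t local = 1;
    /* Keep doubling while the next candidate still satisfies all limits. */
    while (local * 2 <= kernel_max && local * 2 <= preferred &&
           global_size % (local * 2) == 0) {
        local *= 2;
    }
    return local;
}
```

For example, `choose_local_size(307200, 1, 256)` returns 1, while `choose_local_size(307200, 1024, 256)` returns 256. A local size of 1 only avoids the launch error. It does not make any downstream reduction cheaper.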

As an experiment, I set the group size to 1 on OSX. The problem then became a segmentation fault / OUT OF RESOURCES error. At this point I switched to testing on my local machine, and I was able to track the problem down to the serial sum. Since the parallel sum is essentially doing no work, the serial sum still has a size of 307200. According to this, a kernel that takes too long to execute can throw the OUT OF RESOURCES error. The same error also appears in the surface prediction kernel when STEP_SIZE is set extremely small. Both cases involve long-running loops in the kernel. I am not yet sure how best to work around this, or whether we should partially abandon the CI process on OSX (i.e., build but don't test).
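The reduction described above can be modeled as a two-stage sum. The C code below is a hypothetical CPU-side model, not KinFu's kernel code, since the issue does not show how the kernels are laid out. It assumes each work-group of `group_size` items emits one partial sum, and that the partials are then added in a single serial loop (in this issue, apparently inside a kernel). The length of that serial loop is `n / group_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stage 1, modeled on the CPU: each work-group of group_size items
   reduces its slice to a single partial sum. In a real OpenCL kernel this
   would typically be a parallel tree reduction in local memory. */
static size_t stage1_partials(const double *in, size_t n, size_t group_size,
                              double *partials)
{
    size_t groups = n / group_size;
    for (size_t g = 0; g < groups; ++g) {
        double acc = 0.0;
        for (size_t i = 0; i < group_size; ++i)
            acc += in[g * group_size + i];
        partials[g] = acc;
    }
    return groups;
}

/* Stage 2: one loop adds up every partial. This is the "serial sum"; its
   length is n / group_size, so it grows as the group size shrinks. */
static double stage2_serial(const double *partials, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; ++i)
        total += partials[i];
    return total;
}

/* Runs both stages. If serial_len is non-NULL it receives the number of
   elements the serial stage had to add. */
static double two_stage_sum(const double *in, size_t n, size_t group_size,
                            size_t *serial_len)
{
    /* Preconditions: non-empty input and group_size dividing n exactly,
       mirroring the OpenCL 1.x rule for explicit local sizes. */
    assert(n > 0 && group_size > 0 && n % group_size == 0);
    double *partials = malloc((n / group_size) * sizeof *partials);
    if (partials == NULL)
        abort();
    size_t count = stage1_partials(in, n, group_size, partials);
    double total = stage2_serial(partials, count);
    free(partials);
    if (serial_len != NULL)
        *serial_len = count;
    return total;
}
```

With n = 307200 (640 × 480, presumably one value per pixel of a VGA depth frame), an illustrative group size of 256 leaves 1200 partials for the serial stage. A group size of 1 leaves all 307200, which matches the long-running loop described above.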

For now, I will mark the OSX build as allowed to fail in the .travis.yml, so that it does not fail the overall build. CI is still useful as a sanity check because the tests run on Linux.

Also note: this does appear to be a bug in the CPU OpenCL implementation on OSX. I'm not sure what value @RobertLeahy was able to get from CL_DEVICE_MAX_WORK_GROUP_SIZE, but I was able to confirm that the implementation wants to use a group size of 1 by using clGetKernelWorkGroupInfo with the CL_KERNEL_WORK_GROUP_SIZE parameter.

Opening a new issue to track the problems on OSX.

The result of running on OSX with the group size set to CL_KERNEL_WORK_GROUP_SIZE (= 1).
