Use Travis CI Again
CI was suspended due to issues documented in #180.
With the latest code, CI seems to be working on my test branch for Linux builds using GCC 5 and 6.
The OSX build succeeds, but the test suite fails to run. I have determined that the CPU OpenCL driver on OSX wants to use a group size of 1 for pose estimation. As a result, running the current code produces an Invalid Group Size error.
Experimentally, I set the group size to 1 for OSX. The problem then became a segmentation fault or OUT OF RESOURCES error. At this point I switched to testing on my local machine, where I was able to track the problem down to the serial sum. With a group size of 1, the parallel sum is essentially not running, so the serial sum remains at a size of 307200. According to this, a kernel that takes too long to execute can throw the OUT OF RESOURCES error. The same error has been seen with the surface prediction kernel when STEP_SIZE is set extremely small. Both cases have long-running loops in the kernel.
I am unsure at this point how best to work around this issue, or whether we should abandon parts of the CI process on OSX (i.e. build but don't test).
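To make the effect of the group size concrete, here is a minimal host-side C++ sketch of a two-stage sum. It is not the project's kernel code: the function names are hypothetical, and the group size of 256 used for comparison is an assumed value, not one taken from this issue. It only illustrates why the serial stage shrinks with larger groups and stays at the full 307200 elements when the group size is 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stage 1 (stands in for the parallel sum): each group of `group_size`
// elements is collapsed into a single partial sum.
std::vector<float> group_partial_sums(const std::vector<float>& input,
                                      std::size_t group_size) {
    assert(group_size > 0);
    std::vector<float> partials;
    partials.reserve((input.size() + group_size - 1) / group_size);
    for (std::size_t start = 0; start < input.size(); start += group_size) {
        const std::size_t end = std::min(start + group_size, input.size());
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = input.begin() + static_cast<std::ptrdiff_t>(end);
        partials.push_back(std::accumulate(first, last, 0.0f));
    }
    return partials;
}

// Stage 2 (stands in for the serial sum): a single loop over all partial sums.
float serial_sum(const std::vector<float>& partials) {
    return std::accumulate(partials.begin(), partials.end(), 0.0f);
}

// Number of elements the serial stage must walk for a given group size.
std::size_t serial_iterations(std::size_t element_count, std::size_t group_size) {
    assert(group_size > 0);
    return (element_count + group_size - 1) / group_size;
}
```

With 307200 inputs, an assumed group size of 256 leaves 1200 partial sums for the serial stage, while a group size of 1 leaves all 307200, which matches the long-running loop described above.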
For now, I will specify in the .travis.yml that the OSX build is allowed to fail without failing the overall build. CI is still useful as a sanity check because the tests run on Linux.
Also note: this does appear to be a bug in the CPU OpenCL implementation on OSX. I'm not sure what value @RobertLeahy was able to get from CL_DEVICE_MAX_WORK_GROUP_SIZE, but I was able to confirm that the implementation wants to use a group size of 1 by using clGetKernelWorkGroupInfo with the CL_KERNEL_WORK_GROUP_SIZE parameter.
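One possible way to avoid the Invalid Group Size error is to clamp the local work size to the per-kernel limit before enqueueing. The sketch below is only an illustration: `pick_local_size` is a hypothetical helper, not part of this project or of OpenCL, and it assumes the limit has already been read with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE. It also assumes OpenCL 1.x rules, where the global size must be evenly divisible by the local size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the largest local work size that
//   (a) does not exceed the size the caller asked for,
//   (b) does not exceed the per-kernel limit reported by the driver, and
//   (c) divides the global work size evenly.
std::size_t pick_local_size(std::size_t global_size,
                            std::size_t requested,
                            std::size_t kernel_limit) {
    assert(global_size > 0 && requested > 0 && kernel_limit > 0);
    std::size_t candidate = std::min({requested, kernel_limit, global_size});
    // Walk down until the candidate divides global_size; 1 always does.
    while (global_size % candidate != 0) {
        --candidate;
    }
    return candidate;
}
```

With a reported limit of 1, as on the OSX CPU driver, this always returns 1, so it removes the error but not the long serial sum that follows from it.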
Opening a new issue to track the problems on OSX.