PointCloudLibrary/pcl

ABI problem - PCL_ENABLE_SSE changes alignment of class members

andreaskoepf opened this issue · 7 comments

I noticed in a test program on Ubuntu 16.04, gcc version 5.4.0 20160609 with pcl 1.8 (HEAD version) that a simple CropBox filter generated completely unpredictable results. It took me some hours to track this problem down:

I had built PCL with the default option PCL_ENABLE_SSE but my application was missing the SSE build flags, e.g. specifically -march=native -msse4.2 -mfpmath=sse. I simply used FIND_PACKAGE(PCL 1.8 REQUIRED) but that currently does not set/configure any SSE related options.

The reason for CropBox behaving unpredictable was uninitialized memory access. The PCL shared library and the calling application generated a slightly different class layout.

I added two debug outputs to see sizes and pointer-locations .. one in the ctor of CropBox in pcl/filters/crop_box.h (seen by my test app) and the other in CropBox::applyFilter in crop_box.hpp (which gets compiled into libpcl_filters.so).

Here is the output that I got:

Inside CropBox ctor:

this:140720843586960

Size infos:
sizeof(CropBox):240
sizeof(FilterIndices):104
sizeof(Filter):104
sizeof(PCLBase):48
sizeof(Affine3f):64
sizeof(Eigen::Vector4f):16
sizeof(Eigen::Vector3f):12

pointer values:
&min_pt_: 140720843587072
&max_pt_: 140720843587088
&translation_: 140720843587104
&rotation_: 140720843587116
&transform_: 140720843587136

Inside applyFilter:

this:140720843586960

Sizes:
sizeof(CropBox):256                 (* different!)
sizeof(FilterIndices):104
sizeof(Filter):104
sizeof(PCLBase):48
sizeof(Affine3f):64
sizeof(transform_):64
sizeof(Eigen::Vector4f):16
sizeof(Eigen::Vector3f):12

Member addresses:
&min_pt_: 140720843587072
&max_pt_: 140720843587088
&translation_: 140720843587104
&rotation_: 140720843587116
&transform_: 140720843587152    (* different!)

It seems to be very important to specify 'compatible' compiler options for a PCL client-application.
I wonder whether this problem is new in Unbuntu 16.04? Otherwise I would expect that a lot of people would have seen problems with the CropBox filter.

As a solution maybe PCLConfig.cmake could set a variable which contains the SSE settings that were used during the PCL build (e.g. ${SSE_FLAGS})?

Potentially related:

timjr commented

I can confirm -- I experienced the same issue on ubuntu 16.04 after compiling trunk PCL. My application's cropbox was returning 0 points. Recompiling the application after adding the SSE_FLAGS from PCL to its CMAKE_CXX_FLAGS fixed the issue.

Apparently the addition of -march=native -msse4.2 -mfpmath=sse to the build options of a client application has more implications than I thought. While my PCL related code now runs I get segfaults in ROS/MoveIt! ... both together work when I build the PCL with -dPCL_ENABLE_SSE:BOOL=FALSE.
I have not yet found the real source of the problem. When I see a crash it happens inside the Registration class during an ICP call. I noticed that the IterativeClosestPoint class has no EIGEN_MAKE_ALIGNED_OPERATOR_NEW macro - since it is derived from Registration which has Eigen members I had expected that derived classes should have the marco in a public section too. I tried to add those macros and recompiled but unfortunately that did not solve my segfaults. I also added traces to the IncrementalRegistration ctor and printed the this-pointec .. it was always at 16 byte aligned memory addresses.
If somebody has a better understanding what the gcc SSE flags change I would be glad to learn about it.

I am further investigating the crash in InterativeClosestPoint which I can repreduce on three independent machines with 16.04.

The first thing I that I now realize is that I have to look of 32 byte alignment since -march=native defines the __AVX__ macro which causes Eigen to use 32 byte alignment.

e.g. gcc -march=native -dM -E - < /dev/null | egrep "SSE|AVX" | sort output is on my machine:

#define __AVX__ 1
#define __AVX2__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSSE3__ 1

Whether I add -msse4.2 or leave it away does not change these definitions - -march=native seems to be stronger.
Currently I suspect that the missing EIGEN_MAKE_ALIGNED_OPERATOR_NEW in icp.h can cause non 32 byte alignments. I can immediately trigger the segfault with a placement new. Unfortunately in a simple test program only allocating IterativeClosestPoint objects and calling align never caused the problem - new always returned a 32 byte aligned pointer. I will now try to figure out if there are situations (which I believe I saw before) where new does not return a 32 byte aligned pointer.

I found the root in my ICP crash: My shared library that contained the PCL ICP code was build without -fvisibility-inlines-hidden or -fvisibility=hidden (see visibility in gcc wiki). This caused the aligned_malloc function of Eigen to be exported as a "weak symbol" and its implementation was replaced by a different library that was not built with -march=native and returned only 16 byte aligned memory blocks. I guess adding at least -fvisibility-inlines-hidden to shared libs is a a good idea.

BTW I have to correct my comment about missing EIGEN_MAKE_ALIGNED_OPERATOR_NEW defines for the IterativeClosestPoint class. I was not aware that operator new is inherited in C++ - so everything is fine with the impl there.

Hi Andreas,

that sounds like an interesting problem 😃. To check if I understood the problems correctly:

Initially, the layout of a CropBox object in your binary was different than the layout of the CropBox object in the PCL library due to another alignment of the transform member. After recompiling PCL without SSE support, because there were issues compiling your program with -march=native -msse4.2 -mfpmath=sse in combination with other dependencies, this error was gone but you ended up with segfaults in the ICP class. The error disappeared because no special operations were used by Eigen, which in turn didn't require any fancy alignment.

Then you started using -march=nativeagain, which enabled AVX instructions and caused 32-bit alignment in Eigen allocators. And since you built PCL and your code with the same compiler commands, the error in CropBox didn't appear again, but therefore the ICP segfault.

Now, the cause for the ICP segfault is supposed to be PCL having been built with default visibility which allows aligned_malloc to be replaced by another library's aligned_malloc. I don't get this point, though. Why does default visibility make the Eigen symbol a weak symbol? I have to admit, that I just started reading about weak symbols and symbol visibility in gcc.

Given that, the fixes would be

  1. Include the SSE specific compiler flags to PCL_DEFINITIONS
  2. Take care of symbol visibility.

Actually, I had the impression by grepping through the CMake files, that PCL doesn't add definitions itself, but only adds the definitions of its dependencies.

I found a comment on symbol visibility in pcl_macros.h, but it seems unused. And I'm afraid that changing this would lead to major ABI changes because of disappearing symbols. So it would technically require a major version release.

I really think that someone else should have another look at this, like @taketwo or @SergioRAgostinho.

Best,
Stefan

+1. I just spent a couple of hours trying to debug a seemingly inexplicable segfault in pcl::FPFHEstimation with trunk PCL built using default settings. Adding "-march=native -msse4.2 -mfpmath=sse" to the compile flags of my application seem to have fixed the problem.

As @stefanbuettner noted before, this issue has two parts: a) SSE compiler flags; b) symbol visibility. The first part was already addressed by #1917 to a large extent. Now with #2100 we are making this even less error prone, so I consider that a) is solved. As for the b) part, we have a separate issue (#1913) to track it. I would thus propose to close this one.