dji-sdk/Guidance-SDK-ROS

segmentation fault when running the ROS package

wuqibh opened this issue · 8 comments

A segmentation fault occurs about 2 seconds after the ROS node starts running, and I have to restart the GUIDANCE before I can run the ROS node again. What is the problem? Does anyone know?

Not sure how that happens. I just tried the package on a 32-bit Ubuntu 14.04 and it works fine. Please try the new library again (remember to update the repo first, as I fixed a small bug, which should be unrelated) and post any useful messages if you still have the problem.

I am getting a segfault too, though after a random amount of time, somewhere between 30 seconds and a minute (64-bit, 14.04).
Running in gdb via roslaunch says:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd67fc700 (LWP 16534)]
__memcpy_sse2_unaligned ()
        at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:33
33         ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.

Update: I am trying to find the exact cause in gdb, but no luck yet.

I think this is related to this answer: http://stackoverflow.com/questions/17158740/memcpy-creates-segmentation-fault
and to these lines (probably the first one?):
https://github.com/dji-sdk/Guidance-SDK-ROS/blob/master/src/GuidanceNode.cpp#L53-L58

image_data data;
memcpy((char*)&data, content, sizeof(data));
memcpy(g_greyscale_image_left.data, data.m_greyscale_image_left[CAMERA_ID], IMAGE_SIZE);
memcpy(g_greyscale_image_right.data, data.m_greyscale_image_right[CAMERA_ID], IMAGE_SIZE);
memcpy(g_depth.data, data.m_depth_image[CAMERA_ID], IMAGE_SIZE * 2);

I am not completely certain here, but I think (char*)&data is the problem.
(char*)&data is a pointer to char, but it points at an image_data, which is a struct declared here:

/**
*@struct image_data
*@brief Define image data structure. For each direction of stereo camera pair, the depth image aligns with the left greyscale image.
*/
typedef struct _image_data
{
unsigned int frame_index; /**< frame index */
unsigned int time_stamp; /**< time stamp of image captured in ms */
char *m_greyscale_image_left[CAMERA_PAIR_NUM]; /**< greyscale image of left camera */
char *m_greyscale_image_right[CAMERA_PAIR_NUM]; /**< greyscale image of right camera */
char *m_depth_image[CAMERA_PAIR_NUM]; /**< depth image in meters */
char *m_disparity_image[CAMERA_PAIR_NUM]; /**< disparity image in pixels */
}image_data;

But then there are similar typecasts in the SDK everywhere, so...

https://github.com/dji-sdk/Guidance-SDK/blob/2752333a183cb4454f7a4f016a24f48599b6059c/examples/usb_example/DJI_guidance_example/main.cpp#L69-L90 does a different typecast, though, and its memcpy sits inside an if block that checks the pointer first:

image_data* data = (image_data* )content;
...
if ( data->m_greyscale_image_left[d] ){
  g_greyscale_image_left[d] = Mat::zeros(HEIGHT,WIDTH,CV_8UC1);
  memcpy( g_greyscale_image_left[d].data, data->m_greyscale_image_left[d], IMAGE_SIZE );
 }

http://www.cplusplus.com/doc/tutorial/typecasting/ says such typecasts can lead to run-time errors and unexpected behaviour.

Could this be a cause?
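
To make the difference concrete, here is a minimal sketch of the guarded copy that the usb_example uses, written without OpenCV or the SDK headers. Everything in it is an assumption for illustration: ImageDataSketch only mirrors part of the image_data layout quoted above, and kCameraPairNum, kWidth, kHeight, copy_if_present and demo are hypothetical names and values, not taken from the SDK. It also assumes that a null pointer in the struct means no image was delivered for that camera pair, which is what the if check in the example suggests but which I have not confirmed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for CAMERA_PAIR_NUM, WIDTH and HEIGHT.
constexpr int kCameraPairNum = 5;
constexpr int kWidth = 320;
constexpr int kHeight = 240;
constexpr std::size_t kImageSize = kWidth * kHeight;

// Partial mirror of the image_data struct quoted above (illustration only).
// Note that it stores pointers to pixel buffers, not the pixels themselves.
struct ImageDataSketch {
    unsigned int frame_index;
    unsigned int time_stamp;
    char* m_greyscale_image_left[kCameraPairNum];
};

// Copies one greyscale frame into dst only when the source pointer is set.
// Returns true if a copy was made, false if the pointer was null.
bool copy_if_present(const char* src, std::vector<unsigned char>& dst) {
    if (src == nullptr) {
        return false;  // assumed: no image for this camera pair
    }
    dst.resize(kImageSize);
    std::memcpy(dst.data(), src, kImageSize);
    return true;
}

// Builds a struct in which only camera pair 0 has an image, then tries
// every pair and counts how many copies actually happened.
int demo() {
    std::vector<char> frame(kImageSize, 7);
    ImageDataSketch data{};  // value-initialised: all pointers are null
    data.m_greyscale_image_left[0] = frame.data();

    std::vector<unsigned char> out;
    int copied = 0;
    for (int d = 0; d < kCameraPairNum; ++d) {
        if (copy_if_present(data.m_greyscale_image_left[d], out)) {
            ++copied;
        }
    }
    return copied;
}
```

Without the null check, the loop in demo() would pass a null source to memcpy for pairs 1 to 4, which is undefined behaviour and would typically crash inside memcpy, much like the gdb trace above. Whether that is what actually happens in the ROS node I can't say for sure.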

The problematic memcpy((char*)&data, content, sizeof(data)); is now removed from the ROS package.

Hey, thanks for the support.
Is it? I don't see any commits. Am I missing something, or are you pushing soon?

This can be closed now I think. Was resolved ~2 weeks ago. bc793aa