ADLINK-IST/opensplice

pthread_create failed in code combining DDS and CUDA


Hello

I wrote a program that uses both DDS and CUDA. It relies on Vortex OpenSplice (6.9.2p1 HDE) and the OpenCV cudaarithm module (CUDA 11.8, OpenCV 4.6.0; OpenCV is built against CUDA 11.8). My platform is Ubuntu 22.04.3 LTS (jammy), 64-bit.
Compilation succeeds, but execution terminates as soon as the first DDS object is created. Here is an example from the ospl error log:

========================================================================================
Context     : dds:domain::DomainParticipant::DomainParticipant
Date        : 2023-11-22T14:56:21+0100
Node        : safran-Nuvo-8108GC-Series
Process     : CUDA_DDS_exec <1109311>
Thread      : main thread 1533ec82a000
Internals   : DomainParticipantDelegate.cpp/90/6.9.2p1/283dd7d/1ad4fef/-1
----------------------------------------------------------------------------------------
Report      : Error: Failed to create DomainParticipant
Internals   : org::opensplice::domain::DomainParticipantDelegate::DomainParticipantDelegate/DomainParticipantDelegate.cpp/90/1/1700661381.537102916
----------------------------------------------------------------------------------------
Report      : Unable to connect to domain id = 0.
              The most common causes of this error are an incorrect configuration file or
              that OpenSpliceDDS is not running (when using shared memory mode).
Internals   : u_participantNew/u_participant.c/234/773/1700661381.537121441
----------------------------------------------------------------------------------------
Report      : Failed to start Spliced for domain 'ospl_sp_ddsi' within 0 seconds, result = U_RESULT_INTERNAL_ERROR

Internals   : user::u_domain::startSpliceThread/u_domain.c/1251/773/1700661381.537133310
----------------------------------------------------------------------------------------
Report      : Error starting thread for 'spliced'

Internals   : user::u_domain::startSplicedInProcess/u_domain.c/736/773/1700661381.537145367
----------------------------------------------------------------------------------------
Report      : pthread_create failed with error 22 (spliced)
Internals   : os_threadCreate/os_thread.c/543/2/1700661381.537156696

The code works if the cv::cuda call is commented out (no change to the associated CMake file is needed): the domain participant is then created correctly.

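The code is very simple:

#include <cstdint>
#include <dds/dds.hpp>
#include <opencv2/cudaarithm.hpp>

// Unused wrapper: it is never called, yet commenting it out makes the
// DomainParticipant creation below succeed.
static void wrap_substract(cv::InputArray i_src1, cv::InputArray i_src2, cv::OutputArray o_dst, cv::InputArray i_mask, int i_dtype){
    cv::cuda::subtract(i_src1, i_src2, o_dst, i_mask, i_dtype);
}

int main(int argc, char **argv){
    uint32_t nbDomainParticipant = 0;  // domain id, matching the ospl config XML
    // Fails here with "pthread_create failed with error 22 (spliced)".
    dds::domain::DomainParticipant dParticpant_(nbDomainParticipant);
    return 0;
}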

Of course, the domain ID is valid with respect to the ospl config XML. Commenting out "wrap_substract" is enough to make the code work.
I attached a compressed archive with the code and the associated CMake file:
CUDA_DDS.zip

Unfortunately, this code is hard to build because it needs a complete CUDA and OpenCV setup on the platform; installing these packages and then reproducing the issue can take hours or days. What I am really hoping for is any idea about the origin of this failure.
I have already tested the following ideas:

  • If the wrapper around the cv::cuda call is declared inline, execution succeeds.
  • If OpenCV is built as a static library, the code works.

However, neither workaround helps in a more complex code base (which I can't share), and I need a sustainable solution for my larger project.
The issue seems to be linked to heap memory, as if a global variable in the CUDA library were corrupting memory. But I don't know how to reveal that kind of interference, or whether my idea is correct.

Thanks.

Hi @cvdcvd, this project is no longer actively maintained; please take a look at its successor, CycloneDDS.