microsoft/onnxruntime-extensions

Ort::SessionOptions.EnableOrtCustomOps() causes memory leak

ns-wxin opened this issue · 5 comments

Hi, I'm getting a memory leak in an ASAN build when enabling custom ops.
The code is straightforward. Any help on how to release the memory properly would be appreciated:

m_sessionOptions.SetIntraOpNumThreads(m_modelThreads);
m_sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
m_sessionOptions.DisableCpuMemArena();

m_sessionOptions.EnableOrtCustomOps();
m_modelSession = Ort::Session(m_env, modelFile.c_str(), m_sessionOptions);

Here are the two leaks:

Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x7fbfdb814758 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:95
#1 0x7fbfcccd7d42 in OrtApis::CreateCustomOpDomain(char const*, OrtCustomOpDomain**) (/opt/3p/lib/libonnxruntime.so.1.12.1+0x624d42)
#2 0x7fbfcf3cbb16 in RegisterCustomOps (/opt/3p/lib/libonnxruntime.so.1.12.1+0x2d18b16)
#3 0x7fbfccce24f0 in OrtApis::EnableOrtCustomOps(OrtSessionOptions*) (/opt/3p/lib/libonnxruntime.so.1.12.1+0x62f4f0)
#4 0x605318 in Ort::SessionOptions::EnableOrtCustomOps() /opt/3p/include/onnxruntime/core/session/onnxruntime_cxx_inline.h:452
#5 0x605318 in fingerprint::TextMLFingerprintGenerator::loadModel(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) ../libs/dlp/fp20svc/src/TextMLFingerprintGenerator.cpp:88
#6 0x606d9a in fingerprint::TextMLFingerprintGenerator::TextMLFingerprintGenerator(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int) ../libs/dlp/fp20svc/src/TextMLFingerprintGenerator.cpp:64
#7 0x458605 in void std::_Construct<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(fingerprint::TextMLFingerprintGenerator*, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/stl_construct.h:119
#8 0x458605 in void std::allocator_traits<std::allocator >::construct<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(std::allocator&, fingerprint::TextMLFingerprintGenerator*, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/alloc_traits.h:635
#9 0x458605 in std::_Sp_counted_ptr_inplace<fingerprint::TextMLFingerprintGenerator, std::allocator, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [54], int>(std::allocator, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:604
#10 0x458605 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<fingerprint::TextMLFingerprintGenerator, std::allocator, char const (&) [54], int>(fingerprint::TextMLFingerprintGenerator*&, std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:971
#11 0x458605 in std::__shared_ptr<fingerprint::TextMLFingerprintGenerator, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator, char const (&) [54], int>(std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:1712
#12 0x458605 in std::shared_ptr<fingerprint::TextMLFingerprintGenerator>::shared_ptr<std::allocator, char const (&) [54], int>(std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr.h:464
#13 0x458605 in std::shared_ptr<fingerprint::TextMLFingerprintGenerator> std::make_shared<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr.h:1010
#14 0x44506d in __static_initialization_and_destruction_0 ../libs/dlp/fp20svc/test/src/DlpFp20Test.cpp:39
#15 0x6cc3dc in __libc_csu_init (/home/wxin/src/ns/dataplane/obj-asan/dlp_fp20_lib_test+0x6cc3dc)

Indirect leak of 256 byte(s) in 1 object(s) allocated from:
#0 0x7fbfdb814758 in operator new(unsigned long) ../../../../libsanitizer/asan/asan_new_delete.cpp:95
#1 0x7fbfcccfa1b2 in void std::vector<OrtCustomOp const*, std::allocator<OrtCustomOp const*> >::_M_realloc_insert<OrtCustomOp const*&>(__gnu_cxx::__normal_iterator<OrtCustomOp const**, std::vector<OrtCustomOp const*, std::allocator<OrtCustomOp const*> > >, OrtCustomOp const*&) (/opt/3p/lib/libonnxruntime.so.1.12.1+0x6471b2)
#2 0x7fbfcccfa72c in OrtApis::CustomOpDomain_Add(OrtCustomOpDomain*, OrtCustomOp const*) (/opt/3p/lib/libonnxruntime.so.1.12.1+0x64772c)
#3 0x7fbfcf3cc48f in RegisterCustomOps (/opt/3p/lib/libonnxruntime.so.1.12.1+0x2d1948f)
#4 0x7fbfccce24f0 in OrtApis::EnableOrtCustomOps(OrtSessionOptions*) (/opt/3p/lib/libonnxruntime.so.1.12.1+0x62f4f0)
#5 0x605318 in Ort::SessionOptions::EnableOrtCustomOps() /opt/3p/include/onnxruntime/core/session/onnxruntime_cxx_inline.h:452
#6 0x605318 in fingerprint::TextMLFingerprintGenerator::loadModel(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) ../libs/dlp/fp20svc/src/TextMLFingerprintGenerator.cpp:88
#7 0x606d9a in fingerprint::TextMLFingerprintGenerator::TextMLFingerprintGenerator(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int) ../libs/dlp/fp20svc/src/TextMLFingerprintGenerator.cpp:64
#8 0x458605 in void std::_Construct<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(fingerprint::TextMLFingerprintGenerator*, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/stl_construct.h:119
#9 0x458605 in void std::allocator_traits<std::allocator >::construct<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(std::allocator&, fingerprint::TextMLFingerprintGenerator*, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/alloc_traits.h:635
#10 0x458605 in std::_Sp_counted_ptr_inplace<fingerprint::TextMLFingerprintGenerator, std::allocator, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<char const (&) [54], int>(std::allocator, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:604
#11 0x458605 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<fingerprint::TextMLFingerprintGenerator, std::allocator, char const (&) [54], int>(fingerprint::TextMLFingerprintGenerator*&, std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:971
#12 0x458605 in std::__shared_ptr<fingerprint::TextMLFingerprintGenerator, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator, char const (&) [54], int>(std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr_base.h:1712
#13 0x458605 in std::shared_ptr<fingerprint::TextMLFingerprintGenerator>::shared_ptr<std::allocator, char const (&) [54], int>(std::_Sp_alloc_shared_tag<std::allocator >, char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr.h:464
#14 0x458605 in std::shared_ptr<fingerprint::TextMLFingerprintGenerator> std::make_shared<fingerprint::TextMLFingerprintGenerator, char const (&) [54], int>(char const (&) [54], int&&) /opt/gcc-12/include/c++/12.2.0/bits/shared_ptr.h:1010
#15 0x44506d in __static_initialization_and_destruction_0 ../libs/dlp/fp20svc/test/src/DlpFp20Test.cpp:39
#16 0x6cc3dc in __libc_csu_init (/home/wxin/src/ns/dataplane/obj-asan/dlp_fp20_lib_test+0x6cc3dc)

SUMMARY: AddressSanitizer: 312 byte(s) leaked in 2 allocation(s).
FAIL dlp_fp20_lib_test (exit status: 1)

The memory should be released here:

ort_api_->ReleaseCustomOpDomain(domain);

This happens in a static variable, which is freed when the shared library or executable is unloaded.
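
The release-at-unload behaviour described above can be sketched without ONNX Runtime. This is a minimal illustration, not the extensions library's actual code: `FakeDomain`, `ReleaseFakeDomain`, `MakeDomain`, and `GetSharedDomain` are hypothetical stand-ins, with `ReleaseFakeDomain` playing the role of `ort_api_->ReleaseCustomOpDomain(domain)`.

```cpp
#include <memory>

// Hypothetical stand-in for OrtCustomOpDomain; not the real ONNX Runtime type.
struct FakeDomain {
    const char* name;
};

// Counts releases so the effect of the deleter can be observed.
int g_released = 0;

// Hypothetical stand-in for ort_api_->ReleaseCustomOpDomain(domain).
void ReleaseFakeDomain(FakeDomain* domain) {
    ++g_released;
    delete domain;
}

// A unique_ptr whose deleter is the release function.
using DomainPtr = std::unique_ptr<FakeDomain, void (*)(FakeDomain*)>;

DomainPtr MakeDomain(const char* name) {
    return DomainPtr(new FakeDomain{name}, &ReleaseFakeDomain);
}

// Returns the process-wide domain. The function-local static is destroyed
// during static destruction (process exit or shared-library unload), and
// that is the point at which ReleaseFakeDomain runs for it.
FakeDomain* GetSharedDomain() {
    static DomainPtr domain = MakeDomain("hypothetical.domain");
    return domain.get();
}
```

With this shape, a leak checker sees the allocation as freed only if the static's destructor runs before the checker takes its snapshot. If the release call is missing altogether, the allocation is reported regardless of exit order.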

@wenbingl I assume you mean that this memory should be released when the shared library or application exits? Our ASAN build should have taken care of that when running under gTest. Just to be clear, we shouldn't need to call session->release() explicitly, right? How can we debug this further?

By the way, we're using default-constructed session options, which I don't think should be a problem:

// ORT environment object
Ort::Env m_env = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_INFO, "textML Encoder");

// ORT session option object
Ort::SessionOptions m_sessionOptions;

// ORT session object
Ort::Session m_modelSession{nullptr};

// allocator used in model session
Ort::AllocatorWithDefaultOptions m_allocator;

Here's the code snippet that does the inference:

Ort::Value text_tensor = Ort::Value::CreateTensor(m_allocator,
                                                  m_inputDims.data(),
                                                  m_inputDims.size(),
                                                  ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING);
text_tensor.FillStringTensor(&inputText, 1U);

inputTensors.push_back(std::move(text_tensor));

outputTensors.emplace_back(Ort::Value::CreateTensor<float>(m_memoryInfo,
                                                           outputTensorValues.data(),
                                                           OutputTensorSize,
                                                           m_outputDims.data(),
                                                           m_outputDims.size()));

m_modelSession.Run(Ort::RunOptions{nullptr},
                   inputNames.data(),
                   inputTensors.data(),
                   1,
                   outputNames.data(),
                   outputTensors.data(),
                   1);

We just found out that our version is different from your code (ort_api_->ReleaseCustomOpDomain(domain);). We are using ONNX Runtime 1.12.1 in that repo; with "git submodule update" we pull in a compatible version of onnxruntime-extensions. Is there a way for us to get this "fix" without bumping the ONNX Runtime version?

With this parameter, https://github.com/microsoft/onnxruntime/blob/75f6861cb8d138ef7265bc7967ae665551a0bbaa/tools/ci_build/build.py#L443, you can compile ORT against any other version of ort-extensions, replacing the old one from the git submodules.

Thank you. That's very helpful. Let me give it a try.