catboost/catboost

mimalloc hijacks alloc/free outside of just catboost

mikekap opened this issue · 6 comments

Problem: The linked-in mimalloc replaces malloc/free for the entire process, not just for catboost. This is an issue when catboost is loaded dynamically into a host program, e.g. via JNI.
catboost version: 1.2.2
Operating System: OSX 14.3.1 x86_64
Java version:

openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode)

The specific problem I'm running into (though it's probably more prevalent): malloc/free calls AFTER catboost is loaded cause segfaults like this in the JVM:

Current thread (0x00007fbbdb009800):  JavaThread "main" [_thread_in_native, id=9987, stack(0x0000700002b47000,0x0000700002cbe000)]

Stack: [0x0000700002b47000,0x0000700002cbe000],  sp=0x0000700002cbb268,  free space=1488k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libcatboost4j-prediction10020838049295877552.dylib+0x202a61]  mi_free_generic+0x91
C  [libasyncProfiler.so+0x19155]  std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char>>::~basic_ostringstream()+0x45
C  [libasyncProfiler.so+0x18e82]  Java_one_profiler_AsyncProfiler_execute0+0x5a2

You'll notice a completely unrelated library calling into mimalloc under catboost. This happens consistently - any other native code library that calls free ends up with a similar stack trace ending in mimalloc. Would there be a way to disable mimalloc for the JNI (and maybe all shared library) builds?
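The pattern described above (a mimalloc frame inside the catboost library sitting directly on top of a frame from a different library) can be checked mechanically in a crash log. The following is a minimal sketch, not part of catboost or the JVM: the class name `NativeFrameCheck` and its methods are hypothetical, and it assumes that mimalloc symbols can be recognized by an `mi_` or `_mi_` prefix and that native frames follow the `C  [library+0xOFFSET]  symbol+0xOFFSET` layout seen in the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the "Native frames" section of an hs_err log.
final class NativeFrameCheck {
    // Matches native frames of the form: C  [library+0xOFFSET]  symbol+0xOFFSET
    private static final Pattern FRAME = Pattern.compile(
        "^C\\s+\\[(.+?)\\+0x[0-9a-fA-F]+\\]\\s+(.*?)(?:\\+0x[0-9a-fA-F]+)?$");

    // Heuristic (assumption): mimalloc functions are prefixed with "mi_" or "_mi_".
    static boolean isMimallocSymbol(String symbol) {
        return symbol.startsWith("mi_") || symbol.startsWith("_mi_");
    }

    // Reports each mimalloc frame whose direct native caller lives in another library.
    static List<String> findForeignCallers(List<String> lines) {
        List<String[]> frames = new ArrayList<>(); // each entry: {library, symbol}
        for (String line : lines) {
            Matcher m = FRAME.matcher(line.trim());
            if (m.matches()) {
                frames.add(new String[] {m.group(1), m.group(2)});
            }
        }
        List<String> findings = new ArrayList<>();
        // Frames are listed innermost first, so frame i + 1 is the caller of frame i.
        for (int i = 0; i + 1 < frames.size(); i++) {
            String[] callee = frames.get(i);
            String[] caller = frames.get(i + 1);
            if (isMimallocSymbol(callee[1]) && !callee[0].equals(caller[0])) {
                findings.add(callee[1] + " in " + callee[0] + " called from " + caller[0]);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // The three native frames from the crash log above.
        List<String> trace = List.of(
            "C  [libcatboost4j-prediction10020838049295877552.dylib+0x202a61]  mi_free_generic+0x91",
            "C  [libasyncProfiler.so+0x19155]  std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char>>::~basic_ostringstream()+0x45",
            "C  [libasyncProfiler.so+0x18e82]  Java_one_profiler_AsyncProfiler_execute0+0x5a2");
        findForeignCallers(trace).forEach(System.out::println);
    }
}
```

Interpreted (`j`) and compiled (`J`) Java frames are ignored, so only adjacent native frames are compared. A hit only shows that another library's call ended up in catboost's copy of mimalloc; it does not by itself explain why the free crashed.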

> malloc/free calls AFTER catboost is loaded cause segfaults like this in the JVM:

Do you mean that memory allocated by one allocator (the one the JVM process was using) is later freed by the mimalloc implementation that replaced free once the catboost-prediction dynamic library was loaded?

Also, can you provide the macOS version and CPU architecture (x86_64 or arm64, i.e. Apple Silicon)?

We'll look into it. In the meantime, you can try building the catboost-prediction JVM applier without the mimalloc allocator yourself: remove the lines from CMakeLists that are relevant to your CPU architecture here or here, and then use these instructions.

And also what JRE do you use?

Updated the first post with the version details (x86_64 & temurin 17).

I'm not sure if it's related to allocations before/after - my most consistent reproduction (the one above) loads two JNI/JVMTI libraries, but I load catboost first.

Let me try to compile the build myself without mimalloc.

I can confirm removing mimalloc fixes the reproduction I have. Here's a reproduction you can try at home (you'll need to install Clojure & the clj tool - see here):

$ clj -J-Djdk.attach.allowAttachSelf=true -Sdeps '{:deps {ai.catboost/catboost-prediction {:mvn/version "1.2.3"} com.clojure-goes-fast/clj-async-profiler {:mvn/version "1.0.5"}}}}'
Clojure 1.11.1
user=> ai.catboost.CatBoostModel ;; Load catboost JNI
ai.catboost.CatBoostModel
user=> (require '[clj-async-profiler.core :as prof])
nil
user=> (prof/profile (dotimes [i 10000] (reduce + (range i))))
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000013f9da245, pid=90163, tid=27915
...

I am facing the same issue.

CatBoost version: 1.2.2
Host: "MacBookPro17,1" arm64, 8 cores, 8G, Darwin 21.6.0, macOS 12.5.1 (21G83)
JVM:

java --version
openjdk 11.0.15 2022-04-19 LTS
OpenJDK Runtime Environment Zulu11.56+19-CA (build 11.0.15+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.56+19-CA (build 11.0.15+10-LTS, mixed mode)

Stack trace:

Stack: [0x000000016f6fc000,0x000000016f8ff000],  sp=0x000000016f8fdf30,  free space=2055k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libcatboost4j-prediction6819137113863618542.dylib+0x175610]  _mi_free_block_mt+0x78
C  [jnilib-18296746703685709260.tmp+0x109b2a8]  arrow::Schema::Impl::~Impl()+0x4c
C  [jnilib-18296746703685709260.tmp+0x106d56c]  arrow::Schema::~Schema()+0x2c
C  [jnilib-18296746703685709260.tmp+0xc5e68]  arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions)+0xe0
C  [jnilib-18296746703685709260.tmp+0x59b0]  Java_org_apache_arrow_dataset_jni_JniWrapper_inspectSchema+0x60
j  org.apache.arrow.dataset.jni.JniWrapper.inspectSchema(J)[B+0
j  org.apache.arrow.dataset.jni.NativeDatasetFactory.inspect()Lorg/apache/arrow/vector/types/pojo/Schema;+26
j  org.apache.arrow.dataset.jni.NativeDatasetFactory.finish()Lorg/apache/arrow/dataset/jni/NativeDataset;+2
j  org.apache.arrow.dataset.jni.NativeDatasetFactory.finish()Lorg/apache/arrow/dataset/source/Dataset;+1