mimalloc hijacks alloc/free outside of just catboost
mikekap opened this issue · 6 comments
Problem: The linked-in mimalloc replaces malloc/free throughout the entire program you're running, not just inside catboost. This is a problem when catboost is loaded dynamically, e.g. via JNI.
catboost version: 1.2.2
Operating System: OSX 14.3.1 x86_64
Java version:
openjdk version "17.0.10" 2024-01-16
OpenJDK Runtime Environment Temurin-17.0.10+7 (build 17.0.10+7)
OpenJDK 64-Bit Server VM Temurin-17.0.10+7 (build 17.0.10+7, mixed mode)
The specific problem I'm running into (though it's probably more widespread): malloc/free calls made AFTER catboost is loaded cause segfaults like this in the JVM:
Current thread (0x00007fbbdb009800): JavaThread "main" [_thread_in_native, id=9987, stack(0x0000700002b47000,0x0000700002cbe000)]
Stack: [0x0000700002b47000,0x0000700002cbe000], sp=0x0000700002cbb268, free space=1488k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libcatboost4j-prediction10020838049295877552.dylib+0x202a61] mi_free_generic+0x91
C [libasyncProfiler.so+0x19155] std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char>>::~basic_ostringstream()+0x45
C [libasyncProfiler.so+0x18e82] Java_one_profiler_AsyncProfiler_execute0+0x5a2
Notice that a completely unrelated library (async-profiler) ends up calling into catboost's mimalloc. This happens consistently: any other native library that calls free ends up with a similar stack trace ending in mimalloc. Is there a way to disable mimalloc for the JNI (and perhaps all shared-library) builds?
> malloc/free calls AFTER catboost is loaded cause segfaults like this in the JVM:
Do you mean that memory allocated by one allocator (the one the JVM process had been using) is later freed by the mimalloc implementation that replaced free once the catboost-prediction dynamic library was loaded?
Also, can you provide the macOS version and CPU architecture (x86_64 or arm64/Apple Silicon)?
We'll look into it. Meanwhile, you can try building the catboost-prediction JVM applier without the mimalloc allocator yourself: remove the lines from CMakeLists that are relevant to your CPU architecture here or here, and then use these instructions.
And which JRE are you using?
Updated the first post with the version details (x86_64 & Temurin 17).
I'm not sure it's related to allocations made before/after loading: my most consistent reproduction (the one above) loads two JNI/JVMTI libraries, but I load catboost first.
Let me try building it myself without mimalloc.
I can confirm that removing mimalloc fixes the reproduction I have. Here's a reproduction you can try at home (you'll need to install Clojure and the clj tool, see here):
$ clj -J-Djdk.attach.allowAttachSelf=true -Sdeps '{:deps {ai.catboost/catboost-prediction {:mvn/version "1.2.3"} com.clojure-goes-fast/clj-async-profiler {:mvn/version "1.0.5"}}}'
Clojure 1.11.1
user=> ai.catboost.CatBoostModel ;; Load catboost JNI
ai.catboost.CatBoostModel
user=> (require '[clj-async-profiler.core :as prof])
nil
user=> (prof/profile (dotimes [i 10000] (reduce + (range i))))
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000013f9da245, pid=90163, tid=27915
...
I am facing the same issue.
CatBoost version: 1.2.2
Host: "MacBookPro17,1" arm64, 8 cores, 8G, Darwin 21.6.0, macOS 12.5.1 (21G83)
JVM:
java --version
openjdk 11.0.15 2022-04-19 LTS
OpenJDK Runtime Environment Zulu11.56+19-CA (build 11.0.15+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.56+19-CA (build 11.0.15+10-LTS, mixed mode)
Stack trace:
Stack: [0x000000016f6fc000,0x000000016f8ff000], sp=0x000000016f8fdf30, free space=2055k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libcatboost4j-prediction6819137113863618542.dylib+0x175610] _mi_free_block_mt+0x78
C [jnilib-18296746703685709260.tmp+0x109b2a8] arrow::Schema::Impl::~Impl()+0x4c
C [jnilib-18296746703685709260.tmp+0x106d56c] arrow::Schema::~Schema()+0x2c
C [jnilib-18296746703685709260.tmp+0xc5e68] arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions)+0xe0
C [jnilib-18296746703685709260.tmp+0x59b0] Java_org_apache_arrow_dataset_jni_JniWrapper_inspectSchema+0x60
j org.apache.arrow.dataset.jni.JniWrapper.inspectSchema(J)[B+0
j org.apache.arrow.dataset.jni.NativeDatasetFactory.inspect()Lorg/apache/arrow/vector/types/pojo/Schema;+26
j org.apache.arrow.dataset.jni.NativeDatasetFactory.finish()Lorg/apache/arrow/dataset/jni/NativeDataset;+2
j org.apache.arrow.dataset.jni.NativeDatasetFactory.finish()Lorg/apache/arrow/dataset/source/Dataset;+1
This looks like a problem with the two-level namespace in catboost.dylib (see Apple's dynamic library documentation):
https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/DynamicLibraries/000-Introduction/Introduction.html