Make rustc use jemalloc through #[global_allocator]
SimonSapin opened this issue · 16 comments
Once the #[global_allocator] attribute and the GlobalAlloc trait are stable (#49668), we plan to make the system allocator the default for executables instead of jemalloc, and remove jemalloc from the standard library: #36963 / #27389.
Presumably, we want rustc to keep using jemalloc, since it often performs better for rustc’s typical workload. Like other programs, it can do so using #[global_allocator] and the jemallocator crate or similar. (We would also need to make the jemalloc symbols unprefixed so that they get picked up by LLVM too, but I don’t foresee any difficulty adding a Cargo feature to jemallocator for this.)
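For readers unfamiliar with the mechanism, here is a minimal sketch of how a program opts into a custom allocator through #[global_allocator]. It does not use jemallocator itself: the `Counting` type, the `ALLOC_CALLS` counter and the `allocs_during` helper are hypothetical stand-ins that forward to the system allocator, so that the example needs nothing outside the standard library. With the jemallocator crate, the static would have type `jemallocator::Jemalloc` instead.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for `jemallocator::Jemalloc`: forwards every
/// request to the system allocator and counts calls to `alloc`.
pub struct Counting;

/// Number of `alloc` calls seen so far (illustration only).
pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is what routes `Box`, `Vec`, `String`, ... to `Counting`.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Runs `f` and returns how many `alloc` calls happened in the meantime.
pub fn allocs_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    f();
    ALLOC_CALLS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = allocs_during(|| {
        let v: Vec<u8> = Vec::with_capacity(64);
        // Keep the buffer observable so the allocation is not optimized out.
        std::hint::black_box(v.as_ptr());
    });
    println!("allocator calls observed: {n}");
}
```

Only one #[global_allocator] may exist in a program's crate graph, which is why the choice has to be made in rustc's own crates rather than in the standard library.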
However, per #45966 (comment) this might not work because rustc itself is compiled with -C prefer-dynamic and links to the standard library dynamically.
@alexcrichton, are -C prefer-dynamic and #[global_allocator] fundamentally incompatible, or is this something we can fix? Is it possible to make rustc not use -C prefer-dynamic?
CC @gnzlbg, @rust-lang/compiler
This is not how it is implemented in Rust AFAIK, but my model coming from C++ would be that:
- There is one global extern function per method in the GlobalAlloc trait
- Everything uses the global allocator through these functions and is compiled against them (the std lib, rustc, ...)
- A crate containing a #[global_allocator] contains these functions (e.g. generated by the compiler) which just call the trait methods of the global allocator in that crate
If there is a #[global_allocator] in the dependency graph, we can just assume that the user wanted to link an allocator statically.
If there is no #[global_allocator] present in the dependency graph, what happens depends on how the user wants to link the allocator:
- statically: we add the system allocator crate (or some other crate) to the dependency graph, such that the required extern functions are present. LTO should allow inlining through the extern functions in this case.
- dynamically: we do nothing. A dynamic library with those symbols present must be linked at run-time.
And that's it? If the functions are present in the binary, they can be optimized through LTO, and if they aren't, the binary must be dynamically linked to something that has them or it won't run.
This adds a couple of constraints on the GlobalAlloc trait, for example that it cannot provide generic methods, because those cannot be dynamically linked, but I think that's alright.
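The model above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in liballoc: `demo_alloc`, `demo_dealloc`, `CHOSEN` and `roundtrip` are hypothetical names, and a real implementation would give the shim functions fixed exported symbol names so that other crates and dynamic libraries could link against them.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::mem::{align_of, size_of};

/// The allocator selected by the crate that carries `#[global_allocator]`.
/// (Hypothetical: the system allocator stands in for any choice.)
static CHOSEN: System = System;

/// Shim for `GlobalAlloc::alloc`. It is not generic, so it could be
/// exported from, and resolved against, a dynamic library.
/// Callers must pass a non-zero `size`.
pub unsafe extern "C" fn demo_alloc(size: usize, align: usize) -> *mut u8 {
    match Layout::from_size_align(size, align) {
        Ok(layout) => unsafe { CHOSEN.alloc(layout) },
        Err(_) => std::ptr::null_mut(),
    }
}

/// Shim for `GlobalAlloc::dealloc`; `size` and `align` must match the
/// values that were passed to `demo_alloc`.
pub unsafe extern "C" fn demo_dealloc(ptr: *mut u8, size: usize, align: usize) {
    if let Ok(layout) = Layout::from_size_align(size, align) {
        unsafe { CHOSEN.dealloc(ptr, layout) }
    }
}

/// A "client" that knows only the shim functions, never the allocator type.
pub fn roundtrip(value: u32) -> u32 {
    let (size, align) = (size_of::<u32>(), align_of::<u32>());
    unsafe {
        let p = demo_alloc(size, align) as *mut u32;
        assert!(!p.is_null(), "allocation failed");
        p.write(value);
        let out = p.read();
        demo_dealloc(p as *mut u8, size, align);
        out
    }
}

fn main() {
    println!("round-tripped value: {}", roundtrip(7));
}
```

Because the client depends only on the two function signatures, the definitions can live in the same binary (where LTO may inline them) or in a separate dynamic library.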
This is not how it is implemented in Rust AFAIK
That's exactly how it's implemented in Rust: https://github.com/rust-lang/rust/blob/master/src/liballoc/alloc.rs#L25-L38
The technical problem here is, yes, that libstd is built as a dynamic library for the compiler. This dynamic library, especially on Windows, needs to have all symbols resolved. This means that when we build libstd itself we must choose an allocator. Currently we aren't set up to do two separate builds of libstd, one for an rlib and one for a dylib.
So it's a hard technical requirement that the first dylib we compile must have all symbols resolved, and currently that means that it must select an allocator. We may be able to get away without compiling libstd as a dylib and just dealing with rustc, but that runs a high risk of being a breaking change.
A possible solution is basically just hacking around everything here in rustc... somehow. For example, adding hardcoded logic so that libstd's dylib links to jemalloc while the rlib explicitly doesn't link to jemalloc (or something like that).
Does this mean that #[global_allocator] is fundamentally incompatible with dynamic linking?
Well, on Windows, if the global allocator is in a .dll then you can just swap it out for a different .dll with the same ABI and same name but a different global allocator.
Historically the system allocator and jemalloc had different sets of symbols associated with them, and we chose "at the last minute" which to route the main allocation symbols to. Nowadays though, with #[global_allocator], I don't think this is necessary any more and #[global_allocator] can probably generate the symbols that liballoc expects.
In that sense I think this will be possible by building libstd.dylib with a dynamic dependency on these symbols. Ideally we'd do something like include the symbols in libstd.dylib but force all function calls to go through the dynamic linker still. That way we could still load jemalloc but it wouldn't be required. I'm not sure how plausible that is though for all platforms.
It just occurred to me that rustc is already not using jemalloc on Windows, so only Unix-like platforms are relevant to this issue.
I tried the "obvious" patch (on top of #52020):
diff --git a/src/librustc_driver/lib.rs b/src/librustc_driver/lib.rs
index 84f7b35d21..c7e9fc77ce 100644
--- a/src/librustc_driver/lib.rs
+++ b/src/librustc_driver/lib.rs
@@ -33,6 +33,8 @@ extern crate arena;
 extern crate getopts;
 extern crate graphviz;
 extern crate env_logger;
+#[cfg(not(windows))]
+extern crate jemallocator;
 #[cfg(unix)]
 extern crate libc;
 extern crate rustc_rayon as rayon;
@@ -118,6 +120,11 @@ pub mod driver;
 pub mod pretty;
 mod derive_registrar;
+#[cfg(not(windows))]
+#[cfg(not(stage0))]
+#[global_allocator]
+static A: jemallocator::Jemalloc = jemallocator::Jemalloc;
+
 pub mod target_features {
     use syntax::ast;
     use syntax::symbol::Symbol;
It seems to work perfectly on Linux. The executable and every .so file all have their own __rust_alloc symbol. Most of them call __rdl_alloc (system, the default); the ones in the executable and in librustc_driver*.so call __rg_alloc (jemallocator, through #[global_allocator]). I assume that the symbol in the executable "wins" somehow, since when running that rustc in gdb and breaking on __rust_alloc I always end up in code that then calls __rg_alloc.
It doesn’t work at all on macOS. The symbols present in various files look similar, but when running under lldb most calls to __rust_alloc seem to go to a version that calls __rdl_alloc. Only allocations made within the rustc_driver crate seem to be correctly routed through the #[global_allocator] attribute and __rg_alloc.
Based on these observations I supposed that current versions of rustc might have the same problem, because the symbol and linking setup is pretty much the same. Maybe rustc on macOS doesn’t actually use liballoc_jemalloc? But somehow that’s not the case, and __rust_alloc does go to __rde_alloc and jemalloc. I don’t understand why my branch is different.
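The Linux and macOS observations above can be restated as a toy model. This is only an encoding of the two behaviours seen in the debuggers, not a description of how either dynamic linker works, and the thread does not confirm the cause. All names here (`Backend`, `Image`, `flat_lookup`, `per_image_lookup`, `patched_rustc`, and the `other_dylib` entry) are hypothetical.

```rust
/// Which function an image's own `__rust_alloc` shim forwards to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    /// `__rdl_alloc`: the default, system allocator.
    Rdl,
    /// `__rg_alloc`: the `#[global_allocator]`, here jemallocator.
    Rg,
}

/// An executable or dynamic library that defines its own `__rust_alloc`.
#[derive(Clone, Copy, Debug)]
pub struct Image {
    pub name: &'static str,
    pub shim: Backend,
}

/// Behaviour seen on Linux: every caller ends up in one winning
/// definition, modelled as the first image in load order (the executable).
pub fn flat_lookup(load_order: &[Image]) -> Option<Backend> {
    load_order.first().map(|image| image.shim)
}

/// Behaviour seen on macOS: each caller ends up in the definition
/// that lives in its own image.
pub fn per_image_lookup(load_order: &[Image], caller: &str) -> Option<Backend> {
    load_order
        .iter()
        .find(|image| image.name == caller)
        .map(|image| image.shim)
}

/// Images of the patched rustc, as described in the comment above.
pub fn patched_rustc() -> Vec<Image> {
    vec![
        Image { name: "rustc", shim: Backend::Rg },
        Image { name: "librustc_driver", shim: Backend::Rg },
        Image { name: "libstd", shim: Backend::Rdl },
        Image { name: "other_dylib", shim: Backend::Rdl },
    ]
}

fn main() {
    let images = patched_rustc();
    for image in &images {
        println!(
            "{:<16} flat: {:?}  per-image: {:?}",
            image.name,
            flat_lookup(&images),
            per_image_lookup(&images, image.name)
        );
    }
}
```

Under the first rule every image allocates through `Rg`; under the second only `rustc` and `librustc_driver` do, which matches what was reported for each platform.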
jemalloc itself hooks up the system allocator on osx, so even if rustc ends up using malloc, it ends up using jemalloc.
@SimonSapin hm, while it may work on Linux, all dynamic libraries having __rust_alloc symbols sounds somewhat scary to me, in that it's a nasty bug in the making for down the road. It'd be disastrous, for example, for some dynamic libraries to accidentally use malloc where others use jemalloc on Linux.
I like @glandium's idea of just having jemalloc linked in on OSX to implement the malloc/free symbols with jemalloc. For our tier 1 platforms that just leaves us figuring out Linux as we're not using jemalloc on Windows.
For Linux I'm not really sure what the best option is here. I'd love to get to a point where we can simplify how the allocator symbols work out (avoid redirecting shims) and perhaps just explicitly leverage the dynamic linker shenanigans to get the job done. We just need to be careful here because it can in theory affect stable programs compiled against the libstd dynamic library, but I can't imagine there are many of those in existence...
while it may work on Linux all dynamic libraries having __rust_alloc symbols sounds somewhat scary to me in that it's a nasty bug in the making for down the road.
I agree, but isn’t this setup already the same today?
@SimonSapin ah indeed true! I think this is a bug though right now in that __rust_alloc should be a private dll-local symbol instead of an exported symbol (as it is today). Additionally all __rust_alloc definitions in all the dynamic libraries are the same, rather than having one different one that happens to trump the other ones.
@SimonSapin oh so here's an idea: One possibility would be to create something like a rustc_std crate in the distribution. This crate would be a dynamic library but would link everything statically (including libstd). Today libstd is the "base dynamic library" but for the compiler this would be the base dynamic library. All rustc dylibs would link to libstd through this library.
That way libstd's dll would default to the system allocator while librustc's "libstd" would link to jemalloc. I think that'd do the trick? That way we don't have to worry about duplicate symbols and such.
How would proc-macro crates fit into this? They’re compiled as dynamic libraries loaded in the same process as rustc, right?
Indeed yeah, but they currently link to libsyntax, which is what defines the allocator, and in the future we can just make sure that the proc-macro crate type uses the same allocator as rustc (and/or the same set of runtime libraries).
I'm gonna try to consolidate allocator and jemalloc/rustc related issues into #36963, as I think this is all basically enabled by one PR which would solve that issue. I'll be updating the OP there soon too.