Tracking issue for the GlobalAlloc trait and related APIs
SimonSapin opened this issue · 148 comments
PR #49669 adds a `GlobalAlloc` trait, separate from `Alloc`. This issue tracks the stabilization of this trait and some related APIs, to provide the ability to change the global allocator, as well as allocating memory without abusing `Vec::with_capacity` + `mem::forget`.
Defined in or reexported from the `std::alloc` module:
Update: below is the API proposed when this issue was first opened. The one being stabilized is at #49668 (comment).
/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
unsafe fn alloc(&self, layout: Layout) -> *mut Opaque;
unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout);
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
// Default impl: self.alloc() and ptr::write_bytes()
}
unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
// Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
}
// More methods with default impls may be added in the future
}
extern {
pub type Opaque;
}
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }
impl Layout {
pub fn from_size_align(size: usize, align: usize) -> Result<Self, LayoutError> {…}
pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Self {…}
pub fn size(&self) -> usize {…}
pub fn align(&self) -> usize {…}
pub fn new<T>() -> Self {…}
pub fn for_value<T: ?Sized>(t: &T) -> Self {…}
}
#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }
/// Forwards method calls to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
pub struct Global;
/// The operating system’s default allocator.
pub struct System;
impl GlobalAlloc for Global {…}
impl GlobalAlloc for System {…}
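As a usage sketch of the API above: the proposal returns `*mut Opaque`, but the signatures eventually stabilized use `*mut u8`, which this hedged example uses so it compiles on current Rust. It shows a hypothetical `Counting` allocator that forwards to `System` and is registered with `#[global_allocator]`:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Running total of bytes requested through the global allocator.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical allocator that delegates to `System` and counts bytes.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Register the `static` item as the process-wide allocator.
#[global_allocator]
static GLOBAL: Counting = Counting;

fn main() {
    let v = vec![0u8; 1024]; // allocates through `Counting`
    assert!(ALLOCATED.load(Ordering::Relaxed) >= 1024);
    drop(v);
    println!("ok");
}
```

Note that the trait is `unsafe` to implement, so the impl must be written `unsafe impl GlobalAlloc for …`.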
CC @rust-lang/libs, @glandium
Unresolved questions or otherwise blocking
- Wait for extern types #43467 to be stable in the language before stabilizing one in the standard library.
- Name of the `Global` type. `GlobalAllocator`?
- Name of the `Void` type. Lower-case `void` would match C (our `*mut void` is very close to C’s `void*` type), but violates the convention of camel-case for non-primitive types. But `Void` in camel-case is already the name of an empty enum (not an extern type like here) in a popular crate. (An extern type is desirable so that `<*mut _>::offset` cannot be called without first casting to another pointer type.) Maybe call it `Opaque`? `Unknown`? Renamed to `Opaque`.
- Rename again to something else?
- Taking `Layout` by reference, or making it `Copy` #48458. Made `Copy`: #50109
- `GlobalAlloc::owns`: #44302 proposes making it required for allocators to be able to tell if it “owns” a given pointer. Not to be required by this trait, since glibc and Windows do not support this.
- Settle semantics of layout "fit" and other safety conditions. Without a `usable_size` method (for now), the layout passed to dealloc must be the same as was passed to alloc. (Same as with `Alloc::usable_size`’s default impl returning `(layout.size(), layout.size())`.)
- An `oom` method is part of `GlobalAlloc` so that implementation-specific printing to stderr does not need to be part of liballoc and instead goes through `#[global_allocator]`, but this does not really belong in the `GlobalAlloc` trait. Should another mechanism like `#[global_allocator]` be added to have pluggable OOM handling? #50144
Not proposed (yet) for stabilization
- The `Alloc` trait
- The rest of the `Layout` methods
- The `AllocErr` type

We’re not ready to settle on a design for collections generic over the allocation type. Discussion of this use case should continue on the tracking issue for the `Alloc` trait: #32838.
I was initially confused by the name `Void`; I confused it with the aforementioned empty enum. Bikeshed: I wonder if it could be called `Allocation` or `Memory`?
Relevant discussions pre-PR:
- about `Void`: 0497b5e#r28405553
- about `Global`: c32a3de#commitcomment-28405600
But I don’t know if any platform’s system allocator (like libc’s malloc) supports this, let alone all of them.
The OSX system allocator supports that. Jemalloc does too, optionally. Glibc malloc and Windows heap allocator don't.
Taking Layout by reference, or making it Copy #48458
The fact that `realloc` now takes the new size as `usize` rather than a `Layout` strongly points towards just making `Layout` implement `Copy` and treating it as a simple (length, alignment) pair.
Has the `Heap` type been renamed to `Global`?
Naming bikesheds:
Given that the attribute is `global_allocator`, should the trait be called `GlobalAllocator` instead? `alloc` vs. `allocator` is inconsistent.
Also, I am not very happy with `Void`. `void` is probably the most misunderstood type in C. It is often described as "empty", but it actually is a unit type `()`, not the empty never type `!`. `void*` is NOT just a pointer to `()` though, but an independent concept. I'd rather we not step into the middle of this confusion.
From the alternatives proposed above, I prefer `Unknown`. Some more random suggestions: `Blob`, `Extern`, `Data`. I think from all of these, my personal favorite is `Blob`, which I have often seen used as a term for "not further specified binary data". But really, any of these is strictly better than `Void` IMHO.
If `GlobalAlloc` is renamed to `GlobalAllocator`, should `Alloc` be `Allocator`? And the `alloc` method `allocate`? (And similarly for other methods.)
Agreed that `Void` or `void` is probably best avoided, though I’m not sure what to pick instead.
I would personally keep the name "alloc" for traits and methods; the `global_allocator` is talking about a specific instance, whereas the methods/traits are actions/groups of types.
`std::alloc::Global` could also be something like `GlobalHeap` (I think suggested by @glandium?) or `DefaultHeap` or `DefaultAlloc`. I don't think we can choose the name `GlobalAlloc` without renaming the `GlobalAlloc` trait itself.
I don't think we should implement `owns`. The sheer fact that two tier-1 platforms don't implement it feels like a nail in that coffin.
The `GlobalAlloc` trait should require `Sync` as a supertrait, no?
Yeah, `Send` as well.
I don't think `Send` is actually necessary. A `GlobalAlloc` is never dropped and does not require a `&mut`.
My rule of thumb for `Send` is: "is it OK to drop this type in a different thread than the one it was created in?"
As @SimonSapin said, `Sync` is already required by `static` items, which are the only ones where `#[global_allocator]` can be used.
sticking it into a static will require both traits for sure.
That should only require `Sync`, as @SimonSapin said, but not `Send`, right? Only `&T` is shared across threads.
`from_size_align_unchecked` is not an unsafe method? Are there any invariants that `Layout` upholds?
Why `Void` and not `fn alloc<T>() -> *mut T`?
`from_size_align_unchecked` is unsafe; that was a mistake. See https://doc.rust-lang.org/nightly/core/heap/struct.Layout.html#method.from_size_align for invariants.
`alloc_one<T>()` is a convenience method on top of `alloc(Layout)`; you can’t always statically construct a type that has the desired layout.
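The distinction above can be sketched with `Layout`'s constructors: `Layout::new::<T>()` covers layouts of statically known types, while `from_size_align` handles layouts only known at run time (where no static `T` exists):

```rust
use std::alloc::Layout;

fn main() {
    // Statically known type: the layout is derived from T.
    assert_eq!(Layout::new::<u32>(), Layout::from_size_align(4, 4).unwrap());

    // Runtime-sized allocation (e.g. an array of n elements read from
    // input): for arbitrary n there is no static type with this layout,
    // so the Layout must be constructed from a computed size.
    let n = 3;
    let dynamic = Layout::from_size_align(n * 4, 4).unwrap();
    assert_eq!(dynamic.size(), 12);
    println!("ok");
}
```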
Every time I see this trait name I get it confused with `GlobalAlloc` the Windows API function: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366574
@retep998 What do you think would be a better name for the trait?
[Reposting in the right tracking issue this time...]
So, the name currently decided on by @SimonSapin for the allocated memory type seems to be Opaque. I concur that Void is a dubious name, but I don't think Opaque is great, mainly due to its complete lack of descriptiveness. I propose one of the following:
- `Blob`
- `Mem`
- `MemBlob`
- `MemChunk`
- `OpaqueMem` – not my favourite, but at least slightly more explicit than just `Opaque`
I'd probably lean towards Mem, since it's the most pithy, but the others are okay too.
I would add `Raw` to the list, since allocation returns raw untyped memory.
`RawMem` would work too, yeah. Just `Raw` is too vague though.
What word does OED define as the following perfect description of our use case?
Used to emphasize a lack of restriction in referring to any thing or amount, no matter what.
This is a serious suggestion by the way.
extern {
fn memcpy(dest: *mut Whatever, src: *const Whatever, n: size_t);
}
unsafe impl GlobalAlloc for BetterAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut Whatever {
/* ... */ as *mut Whatever
}
}
It’s not a regular noun though. (An interrogative/relative pronoun yes, but those are a bit different.)
What is the caller going to store in this memory that I allocated for them?
Well, whatever they want.
What type of data am I copying with this memcpy?
I don't know, ¯\_(ツ)_/¯ whatever happens to be there I guess.
IMO this captures the essence of `void *`.
@rfcbot fcp merge
The global allocator APIs and associated infrastructure have been around for a very long time at this point, and I think that it's high time to propose stabilization. With @SimonSapin's recent PR to refactor out a `GlobalAlloc` trait, I'd like to FCP to stabilize just that trait and associated items (plus a conservative API on `Layout`).
Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:
No concerns currently listed.
Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
As for the name of `Opaque`, I'm hoping we can decide during stabilization. I'm personally currently in favor of `*mut Whatever`.
There could be some more bikeshedding on the `Global` name too. (c32a3de#commitcomment-28405600)
I’d rather stick with `Opaque` than go with `Whatever`. Both are too vague, but especially `Whatever`, which is really nebulous.
As motivated in #49668 (comment), `*mut Whatever` conveys a pointer that points to whatever data happens to exist there, of whatever type. One can allocate memory intended to hold whatever the caller wants there, one may copy whatever data from one pointer to another, etc.
While I’m all for keeping momentum, there’s still a number of open issues to resolve before stabilization. (The semantics of layout fit is a subtle one that I don’t quite understand.)
One open question that came up during the all-hands is `Layout::for_value`. This does not need to actually dereference the pointer, so maybe it should be changed to a raw pointer. The trouble with making it a reference is that references (maybe?) assert that the value they point to is valid, while `for_value` is called with a reference to a dead value, e.g. by `Arc::drop`.
@RalfJung It calls `std::mem::size_of_val`, which takes `&T`.
@SimonSapin Good point. Probably we should have raw pointer versions of that (and `align_of_val`).
@dtolnay Okay, I see your point. The problem for me is still that `Whatever` doesn’t really work as a noun.
I would like to repeat @alexreg's suggestion of `RawMem`. Either that or keep `Opaque`; I think that `Whatever` feels too informal.
@SimonSapin indeed yeah there are some unresolved questions, but I'd consider them largely bikesheds by this point. (I also see our bikeshed for this as more of an airplane hangar it's got so many coats of paint)
- `extern type` - I am personally mostly a fan of `*mut u8`, but I realize that this isn't popular. If `extern type Foo` doesn't work out, though, I think we'll be forced to switch to `*mut u8` to get `.offset` working in terms of bytes. I wouldn't see this as the end of the world.
- `Global` naming - a bikeshed!
- "fit" semantics - if we're not stabilizing `usable_size`, I believe this is settled? That is, you must deallocate and reallocate with precisely the same size and alignment. (Alignment has our hand forced by Windows, I believe.)
- OOM methods - we discussed this a bit on IRC, but I think that we'll want to stabilize `oom` here.
Ok, sounds good. (FWIW I’m also fine with `*mut u8`.)
@alexcrichton why are we stabilizing `GlobalAlloc::oom`? I still do not understand why the allocator is the thing that decides how OOM is treated.
OOM in general to me feels pretty closely coupled with the global allocator (in that you often want to customize both). On a technical level it's pretty awkward for us to move OOM handling off the `GlobalAlloc` trait, and if we were to not stabilize it then customizing OOM wouldn't be possible, and you also wouldn't even get a nice message saying that OOM happened.
They're still on separable axes in that only the final crate decides the global allocator, and it's up to that crate to pick-and-choose implementations if it likes.
It's true you often want to customize both, but they're totally orthogonal customizations. If they're conflated, what does that actually look like? Are people expected to make their own `GlobalAlloc` implementation that wraps another and delegates all methods except for `oom`? How would `System::oom` and `Jemalloc::oom` differ?
Yes, if you want to customize OOM and not the allocator, you'd have to make your own implementation of the trait and dispatch all methods except one. That, however, doesn't seem like that much overhead to me (it's not really that common to override OOM?). I would imagine `Jemalloc::oom` is the same as `System::oom`, but we have to make sure that it's possible to define it as that.
So it seems like we're trying to hitch OOM customization to gobal allocator customization because we happen to already have the glue in place for it. What is the specific driver for stabilizing it in the first place? Is there concrete demand for customizing OOM? The "delegate n-1 methods" workflow seems really bad.
We somehow need to ensure, when you're using jemalloc, that OOM prints out a nice message like the system allocator would by default. Currently we're not really in a position to remove OOM handling from the trait to decouple it from the `GlobalAlloc` trait, which in my mind forces our hand on stabilizing `GlobalAlloc::oom`.
If we can figure out how to move the method off the trait in a good way, and have custom allocators by default route to "print an OOM message", then we should be able to avoid stabilizing it for sure.
That being said, I don't really see the ergonomic downside here as that bad; it's something you do vanishingly rarely, and we're about enabling workflows rather than making the workflows perfect right now.
I feel like I must be missing something because this doesn't seem like a hard problem to solve:
Option 1: Replace the code in liballoc that calls `the_global_allocator.oom()` with code that calls `rtabort!("out of memory")` or whatever `System::oom` does.
Option 2: Keep `GlobalAlloc::oom`, give it a default impl (which every implementation will want anyway, right?), and leave it unstable.
@sfackler Keep in mind that liballoc can't depend on any crates other than core, so the only implementation of `oom` that liballoc could provide would be to call `intrinsics::abort`. This isn't very user-friendly since it just crashes the program with an illegal instruction.
There are only really 2 options to provide a user-friendly OOM message:
- A trait method in `GlobalAlloc`, which allows allocator implementations to call libc to print a message and abort.
- A lang item, which allows libstd to provide an OOM handler. The downside of this is that it adds yet another lang item that we will need to stabilize later on to allow `no_std` programs to use liballoc (since liballoc relies on this lang item existing).
As far as the workflow being bad, if I want to pick what happens on OOM, I want to write something like
#[oom_handler]
fn oom() -> ! {
// log stuff!
}
not
use alloc::{Layout, GlobalAlloc, Opaque};
use system_alloc::SystemAllocator;
#[global_allocator]
static _ALLOC: OomHandlingAllocator<SystemAllocator> = OomHandlingAllocator(SystemAllocator);
struct OomHandlingAllocator<A>(A);
unsafe impl<A> GlobalAlloc for OomHandlingAllocator<A>
where
A: GlobalAlloc
{
unsafe fn alloc(&self, layout: Layout) -> *mut Opaque {
self.0.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout) {
self.0.dealloc(ptr, layout)
}
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
self.0.alloc_zeroed(layout)
}
unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
self.0.realloc(ptr, layout, new_size)
}
// hopefully I didn't forget to delegate anything...
fn oom(&self) -> ! {
// log stuff!
}
}
We should put the work into making this reasonable.
OOM and the global allocator seem to be two examples of "compiler-endorsed singletons" ("stable lang-items"?) or whatever one may want to call them; I think I've seen this brought up elsewhere where people wanted to use a similar mechanism for something else. Maybe it makes more sense to make them both instances of the same mechanism, rather than bolting them together in one trait?
@sfackler as @Amanieu mentioned, unfortunately we can't tweak the definition as-is today. The `GlobalAlloc` trait is defined in `core`, which has no access to primitives like `write`, so it can't define an implementation that either delegates to or implements the "reasonable expectation" by default (but libstd can do this).
We could perhaps move `GlobalAlloc`, but we could only move it up as far as `alloc`, which has the same issues: it doesn't have access to system resources, as it purely assumes global allocation.
We could, however, use an entirely new mechanism for defining OOM handlers like panic handlers and such, but I really do think that's overkill. What you mentioned is indeed not super ergonomic, but I don't think it's something we must fix. Your example could alternatively be written as:
extern crate some_crates_io_crate;
use std::alloc::System;
use some_crates_io_crate::{CustomOom, Oom};
#[global_allocator]
static ALLOC: CustomOom<System, MyOom> = CustomOom(System, MyOom);
struct MyOom;
impl Oom for MyOom {
fn oom() -> ! { ... }
}
I don't understand what you mean by overkill.
The amount of compiler support necessary for custom panic implementations, custom global allocators, custom panic runtimes, etc., is quite large and difficult to maintain. Adding "yet another one" is overkill in my mind (as I've implemented all of these in the compiler so far) compared to just having a crate on crates.io that allows customizing just OOM.
oom seems very similar to panic_fmt - why would it not be able to use the same logic as that but be pulled in through liballoc rather than libcore?
I think OOM is actually very similar to panic in the kinds of ways you'd want to control it. Setting something like this statically at build time is the least flexible way of doing things, and I think we'd want to end up in a place in std where we have something like `std::panic::set_hook` but for OOM. This allows you more control to easily change your OOM behavior based on runtime config.
@sfackler yeah, we can concoct anything we want to design an OOM handler into the system; my point is that I feel like it's all overkill. We've got a workable solution today, and it's not clear to me why we'd want to optimize so differently for various use cases.
It's always possible, yes, for `alloc` to have another "weak lang item" defined in libstd, and then libstd has infrastructure for saying "here's the real oom handler". That comes at the downside of adding another assumption to `alloc` (an OOM handler and a global allocator).
How would System::oom and Jemalloc::oom differ?
Today, `jemallocator::Jemalloc::oom` is defined as a call to `std::alloc::System::oom`.
Is there concrete demand for customizing OOM?
We have an accepted RFC #48043 for some way to opt into OOM causing a panic rather than an abort. The exact mechanism is not defined by the RFC and left up to the implementation, but this is not implemented yet. `fn oom() -> !` (whether it’s a method of `#[global_allocator]` or a free function with its own similar attribute) could be that mechanism.
🔔 This is now entering its final comment period, as per the review above. 🔔
We've got a workable solution today and it's not clear to me why we'd want to strong optimize differently for various use cases.
My point is that the current solution is bad API design. The fact that `jemallocator` depends on `alloc_system` just so that it doesn't need to copy all of this stuff is bad: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L370-L428. Dlmalloc just doesn't implement oom; not really sure what the deal is there: https://github.com/alexcrichton/dlmalloc-rs/blob/c99638dc2ecfc750cc1656f6edb2bd062c1e0981/src/global.rs#L123-L125.
The `GlobalAlloc` trait should be responsible for allocating memory, not allocating memory and figuring out how to print an error message on OOM. When someone in the future asks "why is OOM behavior configured through the global allocator?", what is the answer other than "we didn't have the energy to make a lang item and it's a convenient place to stick it"?
I am willing to do the work of making a `#[lang = "oom"]`. It's another lang item you need to define if you're working in alloc-but-no_std land, but I think that is a reasonable price to pay.
@sfackler thanks for doing the legwork to turn it into a lang item!
Now that #50144 has moved OOM to its own lang item, this may not be the right place to talk about this, so please redirect me to the right place if not.
In Firefox, we have the issue that we need to differentiate OOM crashes from other crashes, and we also need to collect the requested size that led to OOM when it happens. The problem is that the GlobalAlloc changes removed the oom function argument that could have allowed that, and the new oom lang item has kept the new signature.
The alternatives are not great:
- Moving OOM handling to the allocator function would mean turning it to an infallible allocator function, meaning code paths that actually want to handle allocation failures gracefully can't anymore.
- Making the allocator record the size of allocations that fail, so that the OOM handler can check it out afterwards is not reliable, because a caller may handle the failure gracefully, and you'd have to reset the oom record for a subsequent crash not to look like an oom. It would be possible, though, to store the oom record in a thread local variable, and then use that thread local variable from the oom handler to feed the crash reporter. That seems convoluted. It could work in Firefox, because we control the allocator as well (although that's not true for ASAN builds), but that's not a general purpose solution.
How does the C side of the codebase deal with this?
The C side has different function calls for fallible and infallible allocations. Fallible allocations return NULL when they fail, and infallible allocations panic and handle recording the OOM requested size.
How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.
Since you control the allocator, you could have the allocation functions directly abort like the C versions do?
How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.
The old oom function was taking an AllocErr, which contained the requested Layout.
Since you control the allocator, you could have the allocation functions directly abort like the C versions do?
As I said, that would mean making all rust allocations infallible, even for code that does want to handle allocation failure.
Ah sure - the AllocErr parameter was removed a bit before it turned into a separate function and I had forgotten about that.
It seems reasonable to me for oom to take a Layout as a parameter.
Shall I just go ahead with a PR doing that?
Sure!
Should it be `Layout` or just `usize` for the allocation size?
Since checking for null and calling `oom(…)` might be a very common pattern, it might make sense to provide convenience API(s) for it. But to avoid users having to pass the layout (or size) twice, it’d have to be a whole set of parallel APIs for `alloc`, `alloc_zeroed`, and `realloc`. With `Alloc::*_excess`, it’s now two axes where we potentially want all combinations.
Considering alloc might fail because of wrong alignment requirements, an oom handler may want to check that too.
(Thank you for reminding me that I wanted to add `alloc_zeroed_excess`.)
The final comment period is now complete.
Another problem with the oom lang item: it doesn't allow overriding the one from std:
error[E0152]: duplicate lang item found: `oom`.
= note: first defined in crate `std`.
Presumably, this would be fixed by making it an actual weak lang item.
Continuation of my earlier comment about passing Layout to oom being painful, it's particularly painful in
Line 555 in 9e8f683
where the patch looks like:
@@ -552,7 +553,7 @@ impl<T, A: Alloc> RawVec<T, A> {
pub fn reserve(&mut self, used_cap: usize, needed_extra_cap: usize) {
match self.try_reserve(used_cap, needed_extra_cap) {
Err(CapacityOverflow) => capacity_overflow(),
- Err(AllocErr) => oom(),
+ Err(AllocErr) => oom(Layout::array::<T>(self.amortized_new_size(used_cap, needed_extra_cap).unwrap()).unwrap()),
Ok(()) => { /* yay */ }
}
}
It ends up similarly painful in:
- rust/src/libstd/collections/hash/table.rs, line 773 in 9e8f683
- rust/src/libstd/collections/hash/table.rs, line 812 in 9e8f683
- rust/src/libstd/collections/hash/map.rs, line 787 in 9e8f683

where the patch similarly needs to call a function.
Actually, the ones for collections are just plain horrible, especially the one for HashMap, which I haven't implemented because of the error-proneness. At this point, I'm tempted to say infallibility should be handled by the allocator itself. Adding methods is not incredibly attractive, so maybe a parameter to the alloc functions?
Another problem with the oom lang item: it doesn't allow to override the one from std:
The thing to do here is to make an API like `std::panic::set_hook`. It is already a weak lang item.
`set_hook` or a lang item – either sounds like a good way to handle OOM for the `GlobalAlloc`. Is there any reason for preferring one over the other?
- Allowing people to configure something at runtime is strictly more flexible than only allowing them to configure it at compile time.
- Lang items can only be defined once, and std defines the oom lang item.
Oh, I didn't know about restriction 2. As for 1, it can be circumvented easily as you have a lang item. :-)
I noticed that the compiler will accept both a normal and an `extern` function for the `oom` lang item:
// This is how rust_oom is defined in std
#[lang = "oom"]
extern fn rust_oom() -> ! { loop {} }
// The compiler also accepts this
#[lang = "oom"]
fn rust_oom() -> ! { loop {} }
Is this intentional? Does rustc automatically pick the correct ABI for the lang item?
OTOH, global_allocator is set at compile time, and you get one by default if you don't set one yourself. Why should oom handling be different?
I don't understand. Why would we want to be less flexible here?
Why would you not want global_allocator to be similarly more flexible, then?
Anyways, this discussion also doesn't address the problem I raised in my last comments, where trying to pass a layout to the oom function is difficult in some places. As mentioned by @SimonSapin, this can be addressed by adding helper functions to the `Alloc` trait, but the multiplication of functions is not helping.
I'm now thinking of something like:
struct Infallible<A: Alloc>(A);
impl<A: Alloc> Alloc for Infallible<A> { ... }
I was thinking that the infallible APIs could maybe return `NonNull`, but that only works if they’re separate APIs.
Why would you not want global_allocator to be similarly more flexible, then?
Because we can’t. `panic::set_hook` requires allocating, and the runtime allocates before main.
Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.
That's a fair reason. OTOH, I'm not convinced the flexibility for oom would be of any use. And "because lang items can be defined only once" doesn't seem like a compelling reason when, as mentioned earlier, the problem doesn't exist for `#[global_allocator]`. IOW, I don't see why oom can't be handled more similarly to `#[global_allocator]`.
(or `#[panic_implementation]`, for that matter)
I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.
Callers are not calling the `GlobalAlloc` trait methods directly, though. And the `Alloc` methods already return `NonNull`, albeit in a `Result` with an `AllocErr`. It would sure be better if an infallible API returned `!` instead of `AllocErr`. In the `Infallible<A>` scheme I suggested above, that could be done by adding an `Error` associated type to the `Alloc` trait.
Something like `panic_hook` requires dynamic dispatch, which is not a cost we wanted to pay for allocators. Panics are the extra-slow path, like OOM, so they don't need to be optimized at all.
So I gave `Infallible<A>` a shot, and while it does help with many places calling `oom`, it doesn't help with all the places involving `CollectionAllocErr`. This would be equally true with infallible methods rather than `Infallible<A>`. It seems, though, it could be reasonable to add the `Layout` to `CollectionAllocErr::AllocErr`, although that would kill `From<AllocErr> for CollectionAllocErr`.
Ok, I have a working PoC for `Infallible<A>` and `oom(Layout)`, and I haven't touched anything related to `GlobalAlloc`, so I'll move the discussion to #48043 and #32838 and will file a separate issue for OOM itself.
Refocusing on `GlobalAlloc` itself, there doesn't seem to have been a conclusion on the bikeshedding.
cc @SimonSapin
I just went through #49669 which does not offer much rationale about some of the changes but points to this issue as the correct place for discussion, and also mentions the All-Hands but I wasn't there (maybe someone that was there can answer my questions).
It looks like #49669 takes many steps in the opposite direction that I at least thought that the allocator design was going, but I don't see these issues being mentioned here, and they are IMO fundamental.
I thought that the plan was to:
- have one crate per allocator
- have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.) so that they can provide specializations when available
- eventually replace the allocator crates in rust-lang/rust like liballoc_jemalloc with the rust-lang-nursery crates like jemallocator, so that users that choose to include a separate allocator crate get the same allocator that they would have gotten when this was used through rust. For example, a crate compiled with system malloc as global allocator that wants to use Rust jemalloc for a particular thing could just include jemallocator and get the rust version of jemalloc
Before #49669 we were already at the step where we had the different allocator crates implementing their own traits, so I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery.
However, #49669 changes all these crates to not implement any allocator traits anymore, instead implementing `GlobalAlloc` for the allocators in `liballoc/alloc.rs` and implementing `Alloc` on top of the more restricted `GlobalAlloc` API.
This prevents allocators from providing more efficient implementations of the trait methods. For example, `liballoc_jemalloc` used to provide more efficient implementations of `Alloc::shrink_in_place` and `Alloc::grow_in_place` using `xallocx`, but this is no longer the case. Also, `Alloc::usable_size` used to return the correct available size for jemalloc, but now it returns the size that was requested, which I'd consider a bug.
- So why were these changes made?
- Is it still the plan to replace the rust-lang/rust allocators with the ones in the nursery? If not, why not?
- Is it still the plan to allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits? If not, why not? And if yes, why did that PR introduce this regression?
> have one crate per allocator
Once upon a time, customizing the global allocator was done through an "allocator crate" with some dedicated crate-level attributes (and I think some free functions expected to be at the root?) Maybe you’re thinking of this?
Now we have the `#[global_allocator]` attribute that goes on a `static` item whose type implements `GlobalAlloc`, but that type can be defined anywhere. Crates are not special there; the same crate could have multiple types that implement that trait.
> have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.)
I don’t think there ever was a requirement that any type implement both traits. `jemallocator` does so for the convenience of its users.
> get the rust version of jemalloc
When `System` becomes the default, I expect the `alloc_jemalloc` crate to be removed, and I’d say there will no longer be a "rust version of jemalloc". Unless you consider the nursery to be part of "the rust project", but I view it more as crates that happen to be maintained by some of the same people as `std`.
> I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery
As far as I know there never was a plan for `jemallocator` to be a submodule of this repository and be distributed together with `std`. Rather, users can opt to add it to their `[dependencies]` in `Cargo.toml` and define a `static` themselves with `#[global_allocator]`.
> `Alloc::shrink_in_place`, `Alloc::grow_in_place`, `Alloc::usable_size`

These methods are not part of the initial stabilization.
The high-level idea that led to #49669 is that we want to stabilize `#[global_allocator]` and the trait that supports it, but we’re not ready to stabilize the `Alloc` trait. From in-person discussion (sorry if I didn’t explain the reasoning enough here) we settled on having a separate trait with only "libc-equivalent functionality", minimizing the API surface in order to minimize the number of blockers to initial stabilization.
> allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits?
`jemallocator` already does.
@SimonSapin thanks for taking the time to go over all that on IRC.
To summarize my thoughts: after understanding the next steps of unifying `__rde_`/`__rdl_`, removing `liballoc_jemalloc`, etc., the general direction looks to me to be the right one.
The only thing I think might be better done differently is the implementation details of the `liballoc_system` crate. This is a nitpick, but on some platforms, like FreeBSD, NetBSD, and newer Android versions, jemalloc is the system allocator, and I think it would be cool if on these systems `liballoc_system` had feature parity with the `jemallocator` crate. For example, I'd expect that on these systems the specializations of the `Alloc` trait for jemalloc, which are available in the `jemallocator` crate, are also implemented in the impl of `Alloc` for the `System` allocator and for the `Global` allocator when `Global == System`.
Yeah, seems reasonable for alloc_system to become more tuned for individual platforms over time.
To summarize the discussion in internals a bit, these are the changes that I would make to the global allocator trait. It is a fallible API that always returns the "Excess" and would look like this (where `Excess` is now called `Allocation`):
```rust
/// Represents a memory allocation, that is, a combination of a starting address
/// and the allocation layout.
pub struct Allocation(*mut Opaque, Layout);

impl Allocation {
    /// Instantiates a memory allocation from a pointer and a layout.
    pub fn new(ptr: *mut Opaque, layout: Layout) -> Self;
    /// Starting address of the memory allocation.
    pub fn ptr(&self) -> *mut Opaque;
    /// Layout of the allocation.
    pub fn layout(&self) -> Layout;
}

/// The `AllocErr` error specifies that an allocation failed.
///
/// This can only happen due to resource exhaustion or because the allocator
/// does not support the provided `Layout`.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct AllocErr(Layout);

impl AllocErr {
    /// Layout of the request that failed to allocate.
    pub fn layout(&self) -> Layout;
}

/// Reallocation error.
///
/// This can happen due to resource exhaustion, because the allocator
/// does not support the provided `Layout`, because the `Allocation` was
/// allocated with a different allocator, etc.
///
/// It contains the `AllocErr` with the provided `Layout` and the
/// `Allocation` that failed to reallocate.
pub struct ReallocErr(AllocErr, Allocation);

impl ReallocErr {
    /// Layout of the reallocation request.
    fn layout_request(&self) -> Layout;
    /// Allocation that failed to reallocate.
    fn allocation(self) -> Allocation;
}

/// A memory allocator that can be registered as the one backing `std::alloc::Global`
/// through the `#[global_allocator]` attribute.
pub unsafe trait GlobalAlloc {
    /// Allocates memory as described by the given `layout`.
    fn alloc(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Deallocates the memory `allocation`.
    ///
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator, and its size
    /// and alignment must be either the requested or the returned ones.
    unsafe fn dealloc(&self, allocation: Allocation);

    /// Allocates zeroed memory as described by the given `layout`.
    fn alloc_zeroed(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Shrinks or grows a memory `allocation` to the given `new_layout`.
    ///
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator, and its size
    /// and alignment must be either the requested or the returned ones.
    unsafe fn realloc(&self, allocation: Allocation,
                      new_layout: Layout) -> Result<Allocation, ReallocErr>;
}
```
A couple of notes on the changes:
- `Excess` is renamed to `Allocation` and contains a pointer to the allocation and a layout. The pointer can be null, and the layout can have size zero. As was the case before, if that is a valid allocation, calling deallocate with it should succeed.
- `alloc` and `alloc_zeroed` are not `unsafe fn`s. They either succeed or return an error. If an allocator implementation returns `Ok`, then that is a valid allocation that can be passed to `dealloc` and `realloc`.
- Functions taking an `Allocation` are still unsafe because that `Allocation` could come from a different allocator, and thus cause undefined behavior. It is a precondition of these methods that the `Allocation`s provided come from calling one of the allocation methods of this allocator.
- Because these functions consume the allocation, and `Allocation` is neither `Copy` nor `Clone`, double frees cannot happen without the user using unsafe code to duplicate an `Allocation`. `realloc` consumes the allocation and, if there was an error, returns it.
- The wording of `# Safety` needs to be tuned so that collections like `Vec` and `Box` can recompute the `Allocation` from the capacity and alignment of the type, where the capacity might be the size requested or the size returned, and the same applies to the alignment. That is, recomputing an `Allocation` from the pointer, the size returned, and the alignment requested should work out.
I can try to send a working implementation of this as a PR over the weekend; that way we can see how much this changes the design, and do a perf run to see the impact on performance. I expect that suitable `#[inline]` annotations reduce the impact to close to zero when jemalloc is not used, since the only thing we would do is access the pointer and drop the `Layout`.
Since `AllocErr` now contains the layout, it should be possible to just pass it to `oom` without contortions, @glandium. An alternative is to provide an `oom()` method on `AllocErr` and `ReallocErr` that does this for you. The only thing we would need to allow this, AFAICT, is a `Display` impl for `Layout`.
Off the top of my head, at a quick glance:
Using a pointer instead of `NonNull` makes `Result`s larger than they should be. Also, how can a null pointer be an `Ok` result when there's `AllocErr`? Are callers supposed to check both now? There's still the question of whether a zero-size `Layout` should be valid or not, both for codegen reasons (see the last paragraph of the first message in https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487) and for consistency reasons (many things assume ZSTs have an address in the null page; allowing the allocator to do whatever it pleases with that sounds dangerous).
Returning the layout as part of the `AllocErr` actually makes things awful from a codegen perspective, because using that value to call `oom` (the only reason it's there would be to feed it there) means things need to be moved between registers. See #32838 (comment) (beware, it's in the "hidden items"). Using the `Layout` the caller already has is actually better in that regard, and I have a PR underway that deals with the `oom` thing without too many contortions already; I just needed to adjust some methods to make things easier.
One thing that is nice-ish about `Allocation` is that there could be:

```rust
impl<T> From<*mut T> for Allocation {
    fn from(ptr: *mut T) -> Self {
        Allocation::new(ptr as *mut Opaque, Layout::new::<T>())
    }
}
```
I'm not particularly convinced by `ReallocErr`.