rust-lang/rust

Tracking issue for the GlobalAlloc trait and related APIs

SimonSapin opened this issue · 148 comments

PR #49669 adds a GlobalAlloc trait, separate from Alloc. This issue tracks the stabilization of this trait and some related APIs: the ability to change the global allocator, as well as allocating memory without abusing Vec::with_capacity + mem::forget.

Defined in or reexported from the std::alloc module:

Update: below is the API proposed when this issue was first opened. The one being stabilized is at #49668 (comment).

/// #[global_allocator] can only be applied to a `static` item that implements this trait
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut Opaque;
    unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout);

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque { 
        // Default impl: self.alloc() and ptr::write_bytes()
    }
    unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
        // Default impl: self.alloc() and ptr::copy_nonoverlapping() and self.dealloc()
    }
    
    // More methods with default impls may be added in the future
}

extern {
    pub type Opaque;
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct Layout { /* private */ }

impl Layout {
    pub fn from_size_align(size: usize, align: usize) -> Result<Self, LayoutError> {}
    pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Self {}
    pub fn size(&self) -> usize {}
    pub fn align(&self) -> usize {}
    pub fn new<T>() -> Self {}
    pub fn for_value<T: ?Sized>(t: &T) -> Self {}
}

#[derive(Copy, Clone, Debug)]
pub struct LayoutError { /* private */ }

/// Forwards method calls to the `static` item registered
/// with `#[global_allocator]` if any, or the operating system’s default.
pub struct Global;

/// The operating system’s default allocator.
pub struct System;

unsafe impl GlobalAlloc for Global {}
unsafe impl GlobalAlloc for System {}
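To make the API above concrete, here is a minimal sketch of a wrapper allocator registered with #[global_allocator]. Note that it uses the *mut u8 signatures that were eventually stabilized rather than the *mut Opaque ones proposed above; the Counting type and ALLOCATED counter are illustrative names, not part of any API.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper allocator that counts live bytes and delegates
// to System. Methods not overridden (alloc_zeroed, realloc) fall back
// to the trait's default implementations, which call self.alloc().
struct Counting;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

// The attribute goes on a static item whose type implements the trait.
#[global_allocator]
static GLOBAL: Counting = Counting;
```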

CC @rust-lang/libs, @glandium

Unresolved questions or otherwise blocking

  • Wait for extern types #43467 to be stable in the language before stabilizing one in the standard library.
  • Name of the Global type. GlobalAllocator?
  • Name of the Void type. Lower-case void would match C (our *mut void is very close to C’s void* type), but violates the convention of camel-case for non-primitive types. But Void in camel-case is already the name of an empty enum (not an extern type like here) in a popular crate. (An extern type is desirable so that <*mut _>::offset cannot be called without first casting to another pointer type.) Maybe call it Opaque? Unknown? Renamed to Opaque.
    • Rename again to something else?
  • Taking Layout by reference, or making it Copy #48458. Copy: #50109
  • GlobalAlloc::owns: #44302 proposes making it required for allocators to be able to tell if it “owns” a given pointer. Not to be required by this trait since glibc and Windows do not support this.
  • Settle semantics of layout "fit" and other safety conditions. Without a usable_size method (for now), the layout passed to dealloc must be the same as the one passed to alloc. (Same as with Alloc::usable_size’s default impl returning (layout.size(), layout.size()).)
  • An oom method is part of GlobalAlloc so that implementation-specific printing to stderr does not need to be part of liballoc and instead goes through #[global_allocator], but this does not really belong in the GlobalAlloc trait. Should another mechanism like #[global_allocator] be added to have pluggable OOM handling? #50144
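The "fit" rule in the list above can be illustrated with a small sketch, using the free alloc/dealloc functions that were later stabilized in std::alloc (the roundtrip helper is an illustrative name, not an API):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Sketch of the "fit" rule: the layout passed to dealloc must be
// exactly the one that was passed to alloc (same size, same align).
fn roundtrip(size: usize) -> bool {
    let layout = Layout::from_size_align(size, 8).unwrap();
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            return false; // allocation failure: caller decides what to do
        }
        ptr.write_bytes(0, layout.size());
        let first_byte_zero = *ptr == 0;
        dealloc(ptr, layout); // same layout, as the "fit" rule requires
        first_byte_zero
    }
}
```

This is also the pattern that replaces the Vec::with_capacity + mem::forget workaround mentioned at the top of the issue.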

Not proposed (yet) for stabilization

  • The Alloc trait
  • The rest of Layout methods
  • The AllocErr type

We’re not ready to settle on a design for collections generic over the allocation type. Discussion of this use case should continue on the tracking issue for the Alloc trait: #32838.

I was initially confused by the name Void. I confused it with the aforementioned empty enum. Bikeshed: I wonder if it could be called Allocation or Memory?

Relevant discussions pre-PR:

But I don’t know if any platform’s system allocator (like libc’s malloc) supports this, let alone all of them.

The OSX system allocator supports that. Jemalloc does too, optionally. Glibc malloc and Windows heap allocator don't.

Taking Layout by reference, or making it Copy #48458

The fact that realloc now takes the new size as usize rather than a Layout strongly points towards just making Layout implement Copy and treating it as a simple (length, alignment) pair.

Has the Heap type been renamed to Global?

@alexreg Yes, as described in the PR message of #49669.

Naming bikesheds:

Given that the attribute is global_allocator, should the trait be called GlobalAllocator instead? alloc vs. allocator is inconsistent.

Also, I am not very happy with Void. void is probably the most misunderstood type in C. It is often described as "empty", but it actually is a unit type (), not the empty never type !. void* is NOT just a pointer to () though, but an independent concept. I'd rather we do not step into the middle of this confusion.
From the alternatives proposed above, I prefer Unknown. Some more random suggestions: Blob, Extern, Data. I think from all of these, my personal favorite is Blob, which I have often seen used as a term for "not further specified binary data". But really, any of these is strictly better than Void IMHO.

If GlobalAlloc is renamed to GlobalAllocator, should Alloc be Allocator? And the alloc method allocate? (And similarly other methods.)

Agreed that Void or void is probably best avoided, though I’m not sure what to pick instead.

I would personally keep the name "alloc" for traits and methods, the global_allocator is talking about a specific instance whereas the methods/traits are actions/groups of types.

std::alloc::Global could also be something like GlobalHeap (I think suggested by @glandium?) or DefaultHeap or DefaultAlloc. I don't think we can choose the name GlobalAlloc without renaming the trait GlobalAlloc itself.

I don't think we should implement owns. The sheer fact that two tier-1 platforms don't implement it feels like a nail in that coffin to me.

The GlobalAlloc trait should require Sync as a super trait, no?

Yeah, Send as well.

I don't think Send is actually necessary. A GlobalAlloc is never dropped and does not require a &mut.

My rule of thumb for Send is: "is it OK to drop this type in a different thread than the one it was created in?"

@fitzgen @sfackler I don't think either trait is necessarily required per se, though sticking it into a static will inherently require both traits for sure. I think the main benefit may be being informative early on, but we've tended to shy away from those types of trait bounds.

As Alex said, Sync is already required by static items which are the only ones where #[global_allocator] can be used.

@alexcrichton

sticking it into a static will require both traits for sure.

That should only require Sync, as @SimonSapin said, but not Send, right? Only &T is shared across threads.
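The point about Sync vs Send can be demonstrated with a sketch: a static is only ever accessed through &'static references, so its type must be Sync, while Send never comes into play because the value is never moved to or dropped on another thread (bump_from_threads is an illustrative helper, not an API):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// AtomicUsize is Sync, so shared &'static access from many threads is fine.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn bump_from_threads(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            thread::spawn(|| {
                // Each thread only holds a &'static COUNTER.
                COUNTER.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    COUNTER.load(Ordering::Relaxed)
}
```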

from_size_align_unchecked is not an unsafe method? Are there any invariants that Layout upholds?

Why Void and not fn alloc<T>() -> *mut T?

from_size_align_unchecked is unsafe, that was a mistake. See https://doc.rust-lang.org/nightly/core/heap/struct.Layout.html#method.from_size_align for invariants.

alloc_one<T>() is a convenience method on top of alloc(Layout); you can’t always statically construct a type that has the desired layout.

Every time I see this trait name I get it confused with GlobalAlloc the Windows API function: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366574

@retep998 What do you think would be a better name for the trait?

[Reposting in the right tracking issue this time...]

So, the name currently decided on by @SimonSapin for the allocated memory type seems to be Opaque. I concur that Void is a dubious name, but I don't think Opaque is great, mainly due to its complete lack of descriptiveness. I propose one of the following:

  • Blob
  • Mem
  • MemBlob
  • MemChunk
  • OpaqueMem – not my favourite, but at least slightly more explicit than just Opaque
    I'd probably lean towards Mem, since it's the most pithy, but the others are okay too.

I would add Raw to the list, since allocation returns raw untyped memory.

RawMem would work too, yeah. Just Raw is too vague though.

What word does OED define as the following perfect description of our use case?

Used to emphasize a lack of restriction in referring to any thing or amount, no matter what.

Answer: *mut Whatever

This is a serious suggestion by the way.

extern {
    fn memcpy(dest: *mut Whatever, src: *const Whatever, n: size_t);
}

unsafe impl GlobalAlloc for BetterAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut Whatever {
        /* ... */ as *mut Whatever
    }
}

It’s not a regular noun though. (An interrogative/relative pronoun yes, but those are a bit different.)

What is the caller going to store in this memory that I allocated for them?

Well, whatever they want.

What type of data am I copying with this memcpy?

I don't know, ¯\_(ツ)_/¯ whatever happens to be there I guess.

IMO this captures the essence of void *.

@rfcbot fcp merge

The global allocator APIs and associated infrastructure have been around for a very long time at this point and I think that it's high-time to propose stabilization. With @SimonSapin's recent PR to refactor out a GlobalAlloc trait I'd like to FCP to stabilize just that trait and associated items (plus a conservative API on Layout)

Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

As for the name of Opaque, I'm hoping we can decide during stabilization. I'm personally currently in favor of *mut Whatever

There could be some more bikeshedding on the Global name too. (c32a3de#commitcomment-28405600)

I’d rather stick with Opaque than go with Whatever. Both are too vague, but especially Whatever, which is really nebulous.

As motivated in #49668 (comment), *mut Whatever conveys a pointer that points to whatever data happens to exist there, of whatever type. One can allocate memory intended to hold whatever the caller wants there, one may copy whatever data from one pointer to another, etc.

While I’m all for keeping momentum, there’s still a number of open issues to resolve before stabilization. (The semantics of layout fit is a subtle one that I don’t quite understand.)

One open question that came up during the all-hands is Layout::for_value. This does not need to actually dereference the pointer, so maybe it should be changed to a raw pointer. The trouble with making it a reference is that references (maybe?) assert that the value they point to is valid, while for_value is called with a reference to a dead value e.g. by Arc::drop.

Cc @nikomatsakis

@RalfJung It calls std::mem::size_of_val which takes &T

@SimonSapin Good point. Probably we should have raw pointer versions of that (and align_of_val).
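For reference, this is the current behavior being discussed: Layout::for_value reads the size and alignment through a reference (via mem::size_of_val / mem::align_of_val), which is why it takes &T rather than a raw pointer today (layout_of_slice is an illustrative helper):

```rust
use std::alloc::Layout;

// For a slice, for_value picks up the dynamic size from the fat pointer.
fn layout_of_slice(s: &[u32]) -> Layout {
    Layout::for_value(s)
}
```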

@dtolnay Okay, I see your point. The problem for me is still that Whatever doesn’t really work as a noun.

I would like to repeat @alexreg's suggestion of RawMem. Either that or keep Opaque, I think that Whatever feels too informal.

@SimonSapin indeed yeah there are some unresolved questions, but I'd consider them largely bikesheds by this point. (I also see our bikeshed for this as more of an airplane hangar it's got so many coats of paint)

  • extern type - I am personally mostly a fan of *mut u8, but I realize that this isn't popular. If extern type Foo doesn't work out though I think we'll be forced to switch to *mut u8 to get .offset working in terms of bytes. I wouldn't see this as the end of the world.
  • Global naming - a bikeshed!
  • "fit" semantics - if we're not stabilizing usable_size, I believe this is settled? That is, you must deallocate and reallocate with precisely the same size and alignment. (alignment has our hand forced by Windows I believe)
  • OOM methods - we discussed this a bit on IRC but I think that we'll want to stabilize oom here.

Ok, sounds good. (FWIW I’m also fine with *mut u8.)

@alexcrichton why are we stabilizing GlobalAlloc::oom? I still do not understand why the allocator is the thing that decides how OOM is treated.

OOM in general to me feels pretty closely coupled with the global allocator (in that you often want to customize both). On a technical level it's pretty awkward for us to move OOM handling off the GlobalAlloc trait and if we were to not stabilize it then customizing OOM wouldn't be possible and you also wouldn't even get a nice message saying that OOM happened.

They're still on separable axes in that only the final crate decides the global allocator, and it's up to that crate to pick-and-choose implementations if it likes.

It's true you often want to customize both but they're totally orthogonal customizations. If they're conflated, what does that actually look like? Are people expected to make their own GlobalAlloc implementation that wraps another and delegates all methods except for oom? How would System::oom and Jemalloc::oom differ?

Yes if you want to customize OOM and not the allocator you'd have to make your own implementation of the trait and dispatch all methods except one. That, however, doesn't seem like that much overhead to me (it's not really that common to override OOM?). I would imagine Jemalloc::oom is the same as System::oom, but we have to make sure that it's possible to define it as that.

So it seems like we're trying to hitch OOM customization to gobal allocator customization because we happen to already have the glue in place for it. What is the specific driver for stabilizing it in the first place? Is there concrete demand for customizing OOM? The "delegate n-1 methods" workflow seems really bad.

We somehow need to guarantee that when you're using jemalloc, OOM prints out a nice message like the system allocator would by default. Currently we're not really in a position to remove OOM handling from the trait to decouple it from GlobalAlloc, which in my mind forces our hand on stabilizing GlobalAlloc::oom.

If we can figure out a good way to move the method off the trait and have custom allocators by default route to "print an OOM message", then we should be able to avoid stabilizing it for sure.

That being said, I don't really see the ergonomic downside here as that bad; it's something you do vanishingly rarely, and we're about enabling workflows rather than making the workflows perfect right now.

I feel like I must be missing something because this doesn't seem like a hard problem to solve:

Option 1: Replace the code in liballoc that calls the_global_allocator.oom() with code that calls rtabort!("out of memory") or whatever System::oom does.
Option 2: Keep GlobalAlloc::oom, give it a default impl (which every implementation will want anyway, right?), and leave it unstable.

@sfackler Keep in mind that liballoc can't depend on any crates other than core, so the only implementation of oom that liballoc could provide would be to call intrinsics::abort. This isn't very user-friendly since it just crashes the program with an illegal instruction.

There are only really 2 options to provide a user-friendly OOM message:

  • A trait method in GlobalAlloc, which allows allocator implementations to call libc to print a message and abort.
  • A lang item, which allows libstd to provide an OOM handler. The downside of this is that it adds yet another lang item that we will need to stabilize later on to allow no_std programs to use liballoc (since liballoc relies on this lang item existing).

As far as the workflow being bad, if I want to pick what happens on OOM, I want to write something like

#[oom_handler]
fn oom() -> ! {
    // log stuff!
}

not

use alloc::{Layout, GlobalAlloc, Opaque};
use system_alloc::SystemAllocator;

#[global_allocator]
static _ALLOC: OomHandlingAllocator<SystemAllocator> = OomHandlingAllocator(SystemAllocator);

struct OomHandlingAllocator<A>(A);

unsafe impl<A> GlobalAlloc for OomHandlingAllocator<A>
where
    A: GlobalAlloc
{
    unsafe fn alloc(&self, layout: Layout) -> *mut Opaque {
        self.0.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut Opaque, layout: Layout) {
        self.0.dealloc(ptr, layout)
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut Opaque {
        self.0.alloc_zeroed(layout)
    }

    unsafe fn realloc(&self, ptr: *mut Opaque, layout: Layout, new_size: usize) -> *mut Opaque {
        self.0.realloc(ptr, layout, new_size)
    }

    // hopefully I didn't forget to delegate anything...

    fn oom(&self) -> ! {
        // log stuff!
    }
}

We should put the work into making this reasonable.

OOM and the global allocator seem to be two examples of "compiler-endorsed singletons" ("stable lang-items"?) or whatever one may want to call them; I think I've seen this brought up elsewhere where people wanted to use a similar mechanism for something else. Maybe it makes more sense to make them both instances of the same mechanism, rather than bolting them together in one trait?

@sfackler as @Amanieu mentioned, unfortunately we can't tweak the definition as-is today. The GlobalAlloc trait is defined in core, which has no access to primitives like write, so it can't define an implementation that either delegates to or implements the "reasonable expectation" by default (but libstd can do this).

We could perhaps move GlobalAlloc, but we could only move it up as far as alloc, which has the same issue: it doesn't have access to system resources, as it purely assumes global allocation.

We could, however, use an entirely new mechanism for defining OOM handlers like panic handlers and such, but I really do think that's overkill. What you mentioned is indeed not super ergonomic, but I don't think it's something we must fix. Your example could alternatively be written as:

extern crate some_crates_io_crate;

use std::alloc::System;
use some_crates_io_crate::{CustomOom, Oom};

#[global_allocator]
static ALLOC: CustomOom<System, MyOom> = CustomOom(System, MyOom);

struct MyOom;

impl Oom for MyOom {
    fn oom() -> ! { ... }
}

I don't understand what you mean by overkill.

The amount of compiler support necessary for custom panic implementations, custom global allocators, custom panic runtimes, etc., is quite large and difficult to maintain. Adding "yet another one" is overkill in my mind (as I've implemented all of these in the compiler so far) compared to just having a crate on crates.io that allows customizing just OOM.

oom seems very similar to panic_fmt - why would it not be able to use the same logic as that but be pulled in through liballoc rather than libcore?

I think OOM is actually very similar to panic in the kinds of ways you'd want to control it. Setting something like this statically at build time is the least flexible way of doing things, and I think we'd want to end up in a place in std where we have something like std::panic::set_hook but for OOM. This allows you more control to easily change your OOM behavior based on runtime config.

@sfackler yeah we can concoct anything we want to design an OOM handler into the system; my point is that I feel like it's all overkill. We've got a workable solution today and it's not clear to me why we'd want to optimize so strongly for the various use cases.

It's always possible, yes, for alloc to have another "weak lang item" defined in libstd, and then libstd has infrastructure for saying "here's the real oom handler". That comes at the downside of adding another assumption to alloc (an OOM handler and a global allocator).

How would System::oom and Jemalloc::oom differ?

Today, jemallocator::Jemalloc::oom is defined as a call to std::alloc::System::oom.

Is there concrete demand for customizing OOM?

We have an accepted RFC #48043 for some way to opt into OOM causing a panic rather than an abort. The exact mechanism is not defined by the RFC and left up to the implementation, but this is not implemented yet. fn oom() -> ! (whether it’s a method of #[global_allocator] or a free function with its own similar attribute) could be that mechanism.

🔔 This is now entering its final comment period, as per the review above. 🔔

We've got a workable solution today and it's not clear to me why we'd want to optimize so strongly for the various use cases.

My point is that the current solution is bad API design. The fact that jemallocator depends on alloc_system just so that it doesn't need to copy all of this stuff is bad: https://github.com/rust-lang/rust/blob/master/src/liballoc_system/lib.rs#L370-L428. Dlmalloc just doesn't implement oom, not really sure what the deal is there: https://github.com/alexcrichton/dlmalloc-rs/blob/c99638dc2ecfc750cc1656f6edb2bd062c1e0981/src/global.rs#L123-L125.

The GlobalAlloc trait should be responsible for allocating memory, not allocating memory and figuring out how to print an error message on OOM. When someone in the future asks "why is OOM behavior configured through the global allocator?", what is the answer other than "we didn't have the energy to make a lang item and it's a convenient place to stick it"?

I am willing to do the work of making a #[lang="oom"]. It's another lang item you need to define if you're working in alloc-but-no_std-land, but I think that is a reasonable price to pay.

@sfackler thanks for doing the legwork to turn it into a lang item!

Now that #50144 has moved OOM to its own lang item, this may not be the right place to talk about this, so please redirect me to the right place if not.

In Firefox, we have the issue that we need to differentiate OOM crashes from other crashes, and we also need to collect the requested size that led to OOM when it happens. The problem is that the GlobalAlloc changes removed the oom function argument that could have allowed that, and the new oom lang item has kept the new signature.

The alternatives are not great:

  • Moving OOM handling to the allocator function would mean turning it to an infallible allocator function, meaning code paths that actually want to handle allocation failures gracefully can't anymore.
  • Making the allocator record the size of allocations that fail, so that the OOM handler can check it afterwards, is not reliable: a caller may handle the failure gracefully, and you'd have to reset the OOM record so a subsequent crash doesn't look like an OOM. It would be possible, though, to store the OOM record in a thread-local variable, and then use that thread-local variable from the OOM handler to feed the crash reporter. That seems convoluted. It could work in Firefox, because we control the allocator as well (although that's not true for ASAN builds), but it's not a general-purpose solution.

How does the C side of the codebase deal with this?

The C side has different function calls for fallible and infallible allocations. Fallible allocations return NULL when they fail, and infallible allocations panic and handle recording the OOM requested size.
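The C-side split described here can be sketched in Rust as a fallible entry point plus an infallible wrapper (try_alloc and infallible_alloc are hypothetical names, and the wrapper simply prints the size where a real implementation would feed the crash reporter):

```rust
use std::alloc::{alloc, Layout};
use std::ptr::NonNull;

// Fallible allocation: callers that can recover get a None on failure.
fn try_alloc(layout: Layout) -> Option<NonNull<u8>> {
    NonNull::new(unsafe { alloc(layout) })
}

// Infallible allocation: record the requested size, then abort.
fn infallible_alloc(layout: Layout) -> NonNull<u8> {
    try_alloc(layout).unwrap_or_else(|| {
        // A real implementation would stash layout.size() somewhere the
        // crash reporter can find it before aborting.
        eprintln!("OOM: {} bytes requested", layout.size());
        std::process::abort()
    })
}
```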

How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.

Since you control the allocator, you could have the allocation functions directly abort like the C versions do?

How would the old oom function have supported what you want, though? It's still a separate function that's called after the function that failed, so you'd still need thread local tracking of what went wrong.

The old oom function was taking an AllocErr, which contained the requested Layout.

Since you control the allocator, you could have the allocation functions directly abort like the C versions do?

As I said, that would mean making all rust allocations infallible, even for code that does want to handle allocation failure.

Ah sure - the AllocErr parameter was removed a bit before it turned into a separate function and I had forgotten about that.

It seems reasonable to me for oom take a Layout as a parameter.

Shall I just go ahead with a PR doing that?

Sure!

Should it be Layout or just usize for the allocation size?

Since checking for null and calling oom(…) might be a very common pattern, it might make sense to provide convenience API(s) for it. But to avoid users having to pass the layout (or size) twice, it’d have to be a whole set of parallel APIs for alloc, alloc_zeroed, and realloc. With Alloc::*_excess, it’s now two axes where we potentially want all combinations.

Considering alloc might fail because of wrong alignment requirements, an oom handler may want to check that too.

(Thank you for reminding me that I wanted to add alloc_zeroed_excess)

So the downside is that, with AllocErr now being a ZST, there are places in raw_vec where it's actually not convenient to pass a Layout to oom:

Err(_) => oom(),

Err(AllocErr) => oom(),

The final comment period is now complete.

Another problem with the oom lang item: it doesn't allow overriding the one from std:

error[E0152]: duplicate lang item found: `oom`.
   = note: first defined in crate `std`.

Presumably, this would be fixed by making it an actual weak lang item.

Continuing my earlier comment about passing Layout to oom being painful: it's particularly painful in

Err(AllocErr) => oom(),

where the patch looks like:

@@ -552,7 +553,7 @@ impl<T, A: Alloc> RawVec<T, A> {
     pub fn reserve(&mut self, used_cap: usize, needed_extra_cap: usize) {
         match self.try_reserve(used_cap, needed_extra_cap) {
             Err(CapacityOverflow) => capacity_overflow(),
-            Err(AllocErr) => oom(),
+            Err(AllocErr) => oom(Layout::array::<T>(self.amortized_new_size(used_cap, needed_extra_cap).unwrap()).unwrap()),
             Ok(()) => { /* yay */ }
          }
      }

It ends up similarly painful in

Err(CollectionAllocErr::AllocErr) => oom(),

where the patch similarly needs to call a function.

Actually, the ones for collections are just plain horrible, especially the one for HashMap, which I haven't implemented because of the error-proneness. At this point, I'm tempted to say infallibility should be handled by the allocator itself. Adding methods is not incredibly attractive, so maybe a parameter to the alloc functions?

Another problem with the oom lang item: it doesn't allow overriding the one from std:

The thing to do here is to make an API like std::panic::set_hook. It is already a weak lang item.
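A set_hook-style API could be sketched as a single atomic slot holding a function pointer, set at runtime (all names here are hypothetical; this is not what std provides):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// One global slot for the user-installed OOM hook.
static OOM_HOOK: AtomicPtr<()> = AtomicPtr::new(ptr::null_mut());

fn set_oom_hook(hook: fn(usize)) {
    OOM_HOOK.store(hook as *mut (), Ordering::SeqCst);
}

// Called by the allocation machinery with the requested size.
fn run_oom_hook(size: usize) {
    let raw = OOM_HOOK.load(Ordering::SeqCst);
    if raw.is_null() {
        // Default behavior when no hook is registered.
        eprintln!("memory allocation of {} bytes failed", size);
    } else {
        // Safety: the slot is only ever stored from a fn(usize)
        // in set_oom_hook above.
        let hook: fn(usize) = unsafe { std::mem::transmute(raw) };
        hook(size);
    }
}
```

The dynamic-dispatch cost mentioned later in the thread applies only on the OOM path itself, which is already the slow path.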

@glandium want to open a separate tracking issue for customizing OOM behavior? (through an API like @sfackler is thinking)

set_hook or a lang item – either sounds like a good way to handle OOM for the GlobalAlloc. Is there any reason for preferring one over the other?

  1. Allowing people to configure something at runtime is strictly more flexible than only allowing them to configure it at compile time.
  2. Lang items can only be defined once, and std defines the oom lang item.

Oh, I didn't know about restriction 2. As for 1, it can be circumvented easily as you have a lang item. :-)

I noticed that the compiler will accept both a normal and an extern function for the oom lang item:

// This is how rust_oom is defined in std
#[lang = "oom"]
extern fn rust_oom() -> ! { loop {} }

// The compiler also accepts this
#[lang = "oom"]
fn rust_oom() -> ! { loop {} }

Is this intentional? Does rustc automatically pick the correct ABI for the lang item?

OTOH, global_allocator is set at compile time, and you get one by default if you don't set one yourself. Why should oom handling be different?

I don't understand. Why would we want to be less flexible here?

Why would you not want global_allocator to be similarly more flexible, then?

Anyways, this discussion also doesn't address the problem I raised in my last comments, where trying to pass a layout to the oom function is difficult in some places. As mentioned by @SimonSapin this can be addressed by adding helper functions to the Alloc trait, but the multiplication of functions is not helping.
I'm now thinking of something like:

struct Infallible<A: Alloc>(A);
impl<A: Alloc> Alloc for Infallible<A> { ... }
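Expanding that sketch: since the real Alloc trait is unstable, the version below uses a simplified stand-in with a single fallible method, plus an example implementation backed by the global allocator (Sys, oom, and the trait shape here are all illustrative, not the real APIs):

```rust
use std::alloc::Layout;
use std::ptr::NonNull;

// Stand-in for the unstable Alloc trait, reduced to one method.
trait Alloc {
    fn alloc(&mut self, layout: Layout) -> Result<NonNull<u8>, ()>;
}

// Example implementation backed by the global allocator.
struct Sys;

impl Alloc for Sys {
    fn alloc(&mut self, layout: Layout) -> Result<NonNull<u8>, ()> {
        NonNull::new(unsafe { std::alloc::alloc(layout) }).ok_or(())
    }
}

// Stand-in OOM handler that receives the failed Layout.
fn oom(layout: Layout) -> ! {
    panic!("allocation of {} bytes failed", layout.size())
}

// Infallible wraps any Alloc and turns failure into a call to oom,
// so call sites no longer have to reconstruct the Layout themselves.
struct Infallible<A: Alloc>(A);

impl<A: Alloc> Infallible<A> {
    fn alloc(&mut self, layout: Layout) -> NonNull<u8> {
        match self.0.alloc(layout) {
            Ok(ptr) => ptr,
            Err(()) => oom(layout),
        }
    }
}
```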

I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.

Why would you not want global_allocator to be similarly more flexible, then?

Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.

Because we can't. panic::set_hook requires allocating, and the runtime allocates before main.

That's a fair reason. OTOH, I'm not convinced the flexibility for oom would be any useful. And "because lang items can be defined only once" doesn't seem like a compelling reason, when, as mentioned earlier, the problem doesn't exist for #[global_allocator]. IOW, I don't see why oom can't be handled more similarly to #[global_allocator].

(or #[panic_implementation], for that matter)

@Amanieu lang items aren't type checked.

@glandium It is way easier to implement a panic-hook style interface than another thing like global_allocator. panic_implementation behaves identically to oom.

I was thinking that the infallible APIs could maybe return NonNull, but that only works if they’re separate APIs.

Callers are not calling the GlobalAlloc trait methods directly, though. And the Alloc methods already return NonNull, albeit in a Result with an AllocErr. It would sure be better if an infallible API returned ! instead of AllocErr. In the Infallible<A> scheme I suggested above, that could be done by adding an Error associated type to the Alloc trait.

Something like panic_hook requires dynamic dispatch, which is not a cost we wanted to pay for allocators. Panics are the extra-slow path, like OOM, so don't need to be optimized at all.

So I gave Infallible<A> a shot, and while it does help with many places calling oom, it doesn't help with all the places involving CollectionAllocErr. This would be equally true with infallible methods rather than Infallible<A>. It seems, though, it could be reasonable to add the Layout to CollectionAllocErr::AllocErr, although that would kill From<AllocErr> for CollectionAllocErr.

Ok, I have a working PoC for Infallible<A> and oom(Layout), and I haven't touched anything related to GlobalAlloc, so I'll move the discussion to #48043 and #32838 and will file a separate issue for OOM itself.

Refocusing on GlobalAlloc itself, there doesn't seem to have been a conclusion on the bikeshedding.

cc @SimonSapin

I just went through #49669 which does not offer much rationale about some of the changes but points to this issue as the correct place for discussion, and also mentions the All-Hands but I wasn't there (maybe someone that was there can answer my questions).

It looks like #49669 takes many steps in the opposite direction that I at least thought that the allocator design was going, but I don't see these issues being mentioned here, and they are IMO fundamental.

I thought that the plan was to:

  • have one crate per allocator
  • have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.) so that they can provide specializations when available
  • eventually replace the allocator crates in rust-lang/rust like liballoc_jemalloc with the rust-lang-nursery crates like jemallocator, so that users that choose to include a separate allocator crate get the same allocator that they would have gotten when this was used through rust. For example, a crate compiled with system malloc as global allocator that wants to use Rust jemalloc for a particular thing could just include jemallocator and get the rust version of jemalloc

Before #49669 we were already at the step where we had the different allocator crates implementing their own traits, so I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery.

However, #49669 changes all these crates to not implement any allocator traits anymore, instead implementing GlobalAlloc for the allocators in liballoc/alloc.rs and implementing Alloc on top of the more restricted GlobalAlloc API.

This prevents allocators from providing more efficient implementations of the trait methods. For example, liballoc_jemalloc used to provide more efficient implementations of Alloc::shrink_in_place and Alloc::grow_in_place using xallocx, but this is no longer the case. Also, Alloc::usable_size used to return the correct available size for jemalloc, but now it returns the size that was requested, which I'd consider a bug.

  • So why were these changes made?
  • Is it still a plan to replace the rust-lang/rust allocators with the ones in the nursery? If not, why not?
  • Is it still a plan to allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits? If not why not? And if yes, why did that PR introduce this regression?

have one crate per allocator

Once upon a time, customizing the global allocator was done through an "allocator crate" with some dedicated crate-level attributes (and I think some free functions expected to be at the root?). Maybe you’re thinking of this?

Now we have the #[global_allocator] attribute that goes on a static item whose type implements GlobalAlloc, but that type can be defined anywhere. Crates are not special there, the same crate could have multiple types that implement that trait.
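For reference, registering a custom type looks like this — a minimal sketch forwarding to `System`, using the eventually-stabilized `*mut u8` signatures rather than the `*mut Opaque` ones shown above:

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// Any type implementing `GlobalAlloc` can be registered; crates are not
// special, and the same crate could define several such types.
struct MyAlloc;

unsafe impl GlobalAlloc for MyAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Exactly one `static` in the program may carry this attribute.
#[global_allocator]
static A: MyAlloc = MyAlloc;

fn main() {
    // All heap allocations in the program now go through `A`.
    let v = vec![1u8, 2, 3];
    assert_eq!(v.len(), 3);
}
```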

have each allocator implement the traits themselves (GlobalAlloc, Alloc, etc.)

I don’t think there ever was a requirement that any type implements both traits. jemallocator does for convenience of users.

get the rust version of jemalloc

When System becomes the default, I expect the alloc_jemalloc crate to be removed and I’d say there will no longer be a "rust version of jemalloc". Unless you consider the nursery to be part of "the rust project" but I view it more as crates that happen to be maintained by some of the same people as std.

I thought the next steps towards the allocator system structure would be to just replace them with submodules pointing to the crates in the nursery

As far as I know there never was a plan for jemallocator to be a submodule of this repository and be distributed together with std. Rather, users can opt to add it to their [dependencies] in Cargo.toml, and define themselves a static with #[global_allocator].

Alloc::shrink_in_place, Alloc::grow_in_place, Alloc::usable_size

These methods are not part of the initial stabilization:

The high level idea that lead to #49669 is that we want to stabilize #[global_allocator] and the trait that supports it, but we’re not ready to stabilize the Alloc trait. From in-person discussion (sorry if I didn’t explain the reasoning enough here) we settled on having a separate trait with only "libc-equivalent functionality", minimizing the API surface in order to minimize the number of blockers to initial stabilization.

allow allocators, including the global one, to provide more specialized implementations of the GlobalAlloc and the Alloc traits?

jemallocator already does.

@SimonSapin thanks for taking the time to go over all that on IRC.

To summarize my thoughts. I think that after understanding the next steps of unifying __rde_/__rdl_, removing liballoc_jemalloc, ... the general direction looks to me to be the right one.

The only thing I think might be better to do differently is the implementation details of the liballoc_system crate. This is a nitpick, but on some platforms like FreeBSD, NetBSD, and newer Android versions, jemalloc is the system allocator, and I think it would be cool if on these systems liballoc_system had feature parity with the jemallocator crate. For example, I'd expect that on these systems the specializations of the Alloc trait for jemalloc that are available in the jemallocator crate are also implemented in the impl of Alloc for the System allocator, and for the Global allocator when Global == System.

Yeah, seems reasonable for alloc_system to become more tuned for individual platforms over time.

To summarize a bit the discussion in internals, these are the changes that I would do to the global allocator trait. It is a fallible API that always returns the "Excess" and would look like this (where Excess is now called Allocation):

/// Represents a memory allocation, that is, a combination of a starting address
/// and the allocation layout.
pub struct Allocation(*mut Opaque, Layout);

impl Allocation {
    /// Instantiates a memory allocation from a pointer and a layout.
    pub fn new(ptr: *mut Opaque, layout: Layout) -> Self;
    /// Starting address of the memory allocation.
    pub fn ptr(&self) -> *mut Opaque; 
    /// Layout of the allocation.
    pub fn layout(&self) -> Layout;
}

/// The `AllocErr` error specifies that an allocation failed.
///
/// This can only happen due to resource exhaustion or because the allocator
/// does not support the provided `Layout`.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct AllocErr(Layout);

impl AllocErr {
    /// Layout of the allocation request that failed.
    pub fn layout(&self) -> Layout;
}

/// Reallocation error.
///
/// This can happen due to resource exhaustion, because the allocator
/// does not support the provided `Layout`, because the `Allocation` was
/// allocated with a different allocator, etc.
///
/// It contains the `AllocErr` containing the provided `Layout` and the
/// `Allocation` that failed to reallocate.
pub struct ReallocErr(AllocErr, Allocation);

impl ReallocErr {
    /// Layout of the reallocation request.
    fn layout_request(&self) -> Layout;
    /// Allocation that failed to reallocate.
    fn allocation(self) -> Allocation;
}

/// A memory allocator that can be registered as the one backing `std::alloc::Global`
/// through the `#[global_allocator]` attribute.
pub unsafe trait GlobalAlloc {
    /// Allocates memory as described by the given `layout`.
    fn alloc(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Deallocates the memory `allocation`.
    ///
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator, and its size
    /// and alignment must be either the requested or the returned ones.
    unsafe fn dealloc(&self, allocation: Allocation);

    /// Allocates zeroed memory as described by the given `layout`.
    fn alloc_zeroed(&self, layout: Layout) -> Result<Allocation, AllocErr>;

    /// Shrinks or grows a memory `allocation` to the given `new_layout`.
    ///
    /// # Safety
    ///
    /// The `allocation` pointer must come from this allocator, and its size
    /// and alignment must be either the requested or the returned ones.
    unsafe fn realloc(&self, allocation: Allocation,
                      new_layout: Layout) -> Result<Allocation, ReallocErr>;
}

A couple of notes on the changes:

  • Excess is renamed to Allocation and contains a pointer to the allocation and a layout. The pointer can be null, and the layout can have size zero. As was already the case, if that is a valid allocation, calling dealloc with it should succeed.
  • alloc and alloc_zeroed are not unsafe fns. They either succeed or return an error. If an allocator implementation returns Ok, then that's a valid allocation that can be passed to dealloc and realloc.
  • functions taking an Allocation are still unsafe because that Allocation could come from a different allocator, and thus cause undefined behavior. It is a precondition on these methods that the Allocations provided come from calling one of the allocation methods of these allocators.
  • Because these functions consume the allocation, and Allocation is neither Copy nor Clone, double frees cannot happen without the user using unsafe code to duplicate an Allocation. realloc consumes the allocation and, if there was an error, returns it.
  • The wording of # Safety needs to be tuned such that collections like Vec and Box can recompute the Allocation from the capacity and alignment of the type, where the capacity might be the size requested, or the size returned, and the same applies to the alignment. That is, recomputing an Allocation from the pointer, the size returned, and the alignment requested, should work out.

I can try to send a working implementation of this as a PR over the weekend; that way we can see how much this changes the design, and do a perf run to see the impact on performance. I expect that suitable #[inline] annotations reduce the impact to close to zero when jemalloc is not used, since the only thing we would do is access the pointer and drop the Layout.

Since AllocErr now contains the layout, it should be possible to just pass it to oom without contortions @glandium. An alternative is to provide an oom() method on AllocErr and ReallocErr that does this for you. The only thing we would need to allow this, AFAICT, is a Display impl for Layout.
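A sketch of that convenience, with toy stand-in types (`oom_message` and the `Display` format are made up for illustration; the real helper would call the OOM handler with the captured `Layout`):

```rust
use std::fmt;

// Toy stand-in for `std::alloc::Layout`, which would need a `Display` impl.
#[derive(Debug, Clone, Copy)]
struct Layout { size: usize, align: usize }

impl fmt::Display for Layout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "size: {}, align: {}", self.size, self.align)
    }
}

// The error owns the `Layout` that failed to allocate.
struct AllocErr(Layout);

impl AllocErr {
    // In the real proposal this would invoke the OOM handler with `self.0`;
    // here we just build the message it would carry.
    fn oom_message(&self) -> String {
        format!("memory allocation failed ({})", self.0)
    }
}

fn main() {
    let e = AllocErr(Layout { size: 4096, align: 8 });
    assert!(e.oom_message().contains("size: 4096"));
}
```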

From the top of my head, at quick glance:

Using a pointer instead of NonNull makes Results larger than they should be. Also, how can a null pointer be an Ok result when there's AllocErr? Are callers supposed to check both now? There's still the question whether a zero-size Layout should be valid or not, both for codegen reasons (see the last paragraph of the first message in https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487) and consistency reasons (many things assume ZSTs have an address in the null page, and allowing the allocator to do whatever it pleases with that sounds dangerous).
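The size point is easy to check: `NonNull`'s non-null niche lets `Option` (and likewise `Result` with a zero-sized error) reuse the forbidden null value as the discriminant, while a plain raw pointer pays for an extra word:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // The null value serves as the `None` discriminant, so no extra space.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
    // A raw pointer may legitimately be null, so the discriminant needs room.
    assert!(size_of::<Option<*mut u8>>() > size_of::<*mut u8>());
}
```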

Returning the layout as part of the AllocErr actually makes things awful from a codegen perspective, because using that value to call oom (the only reason it's there would be to feed it there) means things need to be moved between registers. See #32838 (comment) (beware, it's in the "hidden items"). Using the Layout the caller already has is actually better in that regard, and I have a PR underway that deals with the oom thing without too much contortion already; I just needed to adjust some methods to make things easier.

One thing that is nice-ish about Allocation is that there could be:

impl From<*mut T> for Allocation {
    fn from(ptr: *mut T) -> Self {
        Allocation::new(ptr as *mut Opaque, Layout::new::<T>())
    }
}

I'm not particularly convinced by ReallocErr.