Allocator traits and std::heap
nikomatsakis opened this issue · 471 comments
📢 This feature has a dedicated working group, please direct comments and concerns to the working group's repo.
The remainder of this post is no longer an accurate summary of the current state; see that dedicated working group instead.
Old content
Original Post:
FCP proposal: #32838 (comment)
FCP checkboxes: #32838 (comment)
Tracking issue for rust-lang/rfcs#1398 and the std::heap
module.
- land
struct Layout
,trait Allocator
, and default implementations inalloc
crate (#42313) - decide where parts should live (e.g. default impls has dependency on
alloc
crate, butLayout
/Allocator
could be inlibcore
...) (#42313) - fixme from source code: audit default implementations (in
Layout
for overflow errors, (potentially switching to overflowing_add and overflowing_mul as necessary). - decide if
realloc_in_place
should be replaced withgrow_in_place
andshrink_in_place
(comment) (#42313) - review arguments for/against associated error type (see subthread here)
- determine what the requirements are on the alignment provided to
fn dealloc
. (See discussion on allocator rfc and global allocator rfc and traitAlloc
PR.)- Is it required to deallocate with the exact
align
that you allocate with? Concerns have been raised that allocators like jemalloc don't require this, and it's difficult to envision an allocator that does require this. (more discussion). @ruuda and @rkruppe look like they've got the most thoughts so far on this.
- Is it required to deallocate with the exact
- should
AllocErr
beError
instead? (comment) - Is it required to deallocate with the exact size that you allocate with? With the
usable_size
business we may wish to allow, for example, that you if you allocate with(size, align)
you must deallocate with a size somewhere in the range ofsize...usable_size(size, align)
. It appears that jemalloc is totally ok with this (doesn't require you to deallocate with a precisesize
you allocate with) and this would also allowVec
to naturally take advantage of the excess capacity jemalloc gives it when it does an allocation. (although actually doing this is also somewhat orthogonal to this decision, we're just empoweringVec
). So far @gankro has most of the thoughts on this. (@alexcrichton believes this was settled in #42313 due to the definition of "fits") - similar to previous question: Is it required to deallocate with the exact alignment that you allocated with? (See comment from 5 June 2017)
- OSX/
alloc_system
is buggy on huge alignments (e.g. an align of1 << 32
) #30170 #43217 - should
Layout
provide afn stride(&self)
method? (See also rust-lang/rfcs#1397, #17027 ) -
Allocator::owns
as a method? #44302
State of std::heap
after #42313:
pub struct Layout { /* ... */ }
impl Layout {
pub fn new<T>() -> Self;
pub fn for_value<T: ?Sized>(t: &T) -> Self;
pub fn array<T>(n: usize) -> Option<Self>;
pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;
pub fn size(&self) -> usize;
pub fn align(&self) -> usize;
pub fn align_to(&self, align: usize) -> Self;
pub fn padding_needed_for(&self, align: usize) -> usize;
pub fn repeat(&self, n: usize) -> Option<(Self, usize)>;
pub fn extend(&self, next: Self) -> Option<(Self, usize)>;
pub fn repeat_packed(&self, n: usize) -> Option<Self>;
pub fn extend_packed(&self, next: Self) -> Option<(Self, usize)>;
}
pub enum AllocErr {
Exhausted { request: Layout },
Unsupported { details: &'static str },
}
impl AllocErr {
pub fn invalid_input(details: &'static str) -> Self;
pub fn is_memory_exhausted(&self) -> bool;
pub fn is_request_unsupported(&self) -> bool;
pub fn description(&self) -> &str;
}
pub struct CannotReallocInPlace;
pub struct Excess(pub *mut u8, pub usize);
pub unsafe trait Alloc {
// required
unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
// provided
fn oom(&mut self, _: AllocErr) -> !;
fn usable_size(&self, layout: &Layout) -> (usize, usize);
unsafe fn realloc(&mut self,
ptr: *mut u8,
layout: Layout,
new_layout: Layout) -> Result<*mut u8, AllocErr>;
unsafe fn alloc_zeroed(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, AllocErr>;
unsafe fn realloc_excess(&mut self,
ptr: *mut u8,
layout: Layout,
new_layout: Layout) -> Result<Excess, AllocErr>;
unsafe fn grow_in_place(&mut self,
ptr: *mut u8,
layout: Layout,
new_layout: Layout) -> Result<(), CannotReallocInPlace>;
unsafe fn shrink_in_place(&mut self,
ptr: *mut u8,
layout: Layout,
new_layout: Layout) -> Result<(), CannotReallocInPlace>;
// convenience
fn alloc_one<T>(&mut self) -> Result<Unique<T>, AllocErr>
where Self: Sized;
unsafe fn dealloc_one<T>(&mut self, ptr: Unique<T>)
where Self: Sized;
fn alloc_array<T>(&mut self, n: usize) -> Result<Unique<T>, AllocErr>
where Self: Sized;
unsafe fn realloc_array<T>(&mut self,
ptr: Unique<T>,
n_old: usize,
n_new: usize) -> Result<Unique<T>, AllocErr>
where Self: Sized;
unsafe fn dealloc_array<T>(&mut self, ptr: Unique<T>, n: usize) -> Result<(), AllocErr>
where Self: Sized;
}
/// The global default allocator
pub struct Heap;
impl Alloc for Heap {
// ...
}
impl<'a> Alloc for &'a Heap {
// ...
}
/// The "system" allocator
pub struct System;
impl Alloc for System {
// ...
}
impl<'a> Alloc for &'a System {
// ...
}
I unfortunately wasn't paying close enough attention to mention this in the RFC discussion, but I think that realloc_in_place
should be replaced by two functions, grow_in_place
and shrink_in_place
, for two reasons:
- I can't think of a single use case (short of implementing
realloc
orrealloc_in_place
) where it is unknown whether the size of the allocation is increasing or decreasing. Using more specialized methods makes it slightly more clear what is going on. - The code paths for growing and shrinking allocations tend to be radically different - growing involves testing whether adjacent blocks of memory are free and claiming them, while shrinking involves carving off properly sized subblocks and freeing them. While the cost of a branch inside
realloc_in_place
is quite small, usinggrow
andshrink
better captures the distinct tasks that an allocator needs to perform.
Note that these can be added backwards-compatibly next to realloc_in_place
, but this would constrain which functions would be by default implemented in terms of which others.
For consistency, realloc
would probably also want to be split into grow
and split
, but the only advantage to having an overloadable realloc
function that I know of is to be able to use mmap
's remap option, which does not have such a distinction.
Additionally, I think that the default implementations of realloc
and realloc_in_place
should be slightly adjusted - instead of checking against the usable_size
, realloc
should just first try to realloc_in_place
. In turn, realloc_in_place
should by default check against the usable size and return success in the case of a small change instead of universally returning failure.
This makes it easier to produce a high-performance implementation of realloc
: all that is required is improving realloc_in_place
. However, the default performance of realloc
does not suffer, as the check against the usable_size
is still performed.
Another issue: The doc for fn realloc_in_place
says that if it returns Ok, then one is assured that ptr
now "fits" new_layout
.
To me this implies that it must check that the alignment of the given address matches any constraint implied by new_layout
.
However, I don't think the spec for the underlying fn reallocate_inplace
function implies that it will perform any such check.
- Furthermore, it seems reasonable that any client diving into using
fn realloc_in_place
will themselves be ensuring that the alignments work (in practice I suspect it means that the same alignment is required everywhere for the given use case...)
So, should the implementation of fn realloc_in_place
really be burdened with checking that the alignment of the given ptr
is compatible with that of new_layout
? It is probably better in this case (of this one method) to push that requirement back to the caller...
@gereeter you make good points; I will add them to the check list I am accumulating in the issue description.
(at this point I am waiting for #[may_dangle]
support to ride the train into the beta
channel so that I will then be able to use it for std collections as part of allocator integration)
I'm new to Rust, so forgive me if this has been discussed elsewhere.
Is there any thought on how to support object-specific allocators? Some allocators such as slab allocators and magazine allocators are bound to a particular type, and do the work of constructing new objects, caching constructed objects which have been "freed" (rather than actually dropping them), returning already-constructed cached objects, and dropping objects before freeing the underlying memory to an underlying allocator when required.
Currently, this proposal doesn't include anything along the lines of ObjectAllocator<T>
, but it would be very helpful. In particular, I'm working on an implementation of a magazine allocator object-caching layer (link above), and while I can have this only wrap an Allocator
and do the work of constructing and dropping objects in the caching layer itself, it'd be great if I could also have this wrap other object allocators (like a slab allocator) and truly be a generic caching layer.
Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?
I don't think this has been discussed yet.
You could write your own ObjectAllocator<T>
, and then do impl<T: Allocator, U> ObjectAllocator<U> for T { .. }
, so that every regular allocator can serve as an object-specific allocator for all objects.
Future work would be modifying collections to use your trait for their nodes, instead of plain ole' (generic) allocators directly.
(at this point I am waiting for #[may_dangle] support to ride the train into the beta channel so that I will then be able to use it for std collections as part of allocator integration)
I guess this has happened?
@Ericson2314 Yeah, writing my own is definitely an option for experimental purposes, but I think there'd be much more benefit to it being standardized in terms of interoperability (for example, I plan on also implementing a slab allocator, but it would be nice if a third-party user of my code could use somebody else's slab allocator with my magazine caching layer). My question is simply whether an ObjectAllocator<T>
trait or something like it is worth discussing. Although it seems that it might be best for a different RFC? I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...
Where would an object allocator type or trait fit into this proposal? Would it be left for a future RFC? Something else?
Yes, it would be another RFC.
I'm not terribly familiar with the guidelines for how much belongs in a single RFC and when things belong in separate RFCs...
that depends on the scope of the RFC itself, which is decided by the person who writes it, and then feedback is given by everyone.
But really, as this is a tracking issue for this already-accepted RFC, thinking about extensions and design changes isn't really for this thread; you should open a new one over on the RFCs repo.
@joshlf Ah, I thought ObjectAllocator<T>
was supposed to be a trait. I meant prototype the trait not a specific allocator. Yes that trait would merit its own RFC as @steveklabnik says.
@steveklabnik yeah now discussion would be better elsewhere. But @joshlf was also raising the issue lest it expose a hitherto unforeseen flaw in the accepted but unimplemented API design. In that sense it matches the earlier posts in this thread.
@Ericson2314 Yeah, I thought that was what you meant. I think we're on the same page :)
@steveklabnik Sounds good; I'll poke around with my own implementation and submit an RFC if it ends up seeming like a good idea.
@joshlf I don't any reason why custom allocators would go into the compiler or standard library. Once this RFC lands, you could easily publish your own crate that does an arbitrary sort of allocation (even a fully-fledged allocator like jemalloc could be custom-implemented!).
@alexreg This isn't about a particular custom allocator, but rather a trait that specifies the type of all allocators which are parametric on a particular type. So just like RFC 1398 defines a trait (Allocator
) that is the type of any low-level allocator, I'm asking about a trait (ObjectAllocator<T>
) that is the type of any allocator which can allocate/deallocate and construct/drop objects of type T
.
@alexreg See my early point about using standard library collections with custom object-specific allocators.
Sure, but I’m not sure that would belong in the standard library. Could easily go into another crate, with no loss of functionality or usability.
Yes but you probably want some standard library functionality to rely on it (such as what @Ericson2314 suggested).
I think you’d want to use standard-library collections (any heap-allocated value) with an arbitrary custom allocator; i.e. not limited to object-specific ones.
Ideally you'd want both - to accept either type of allocator. There are very significant benefits to using object-specific caching; for example, both slab allocation and magazine caching give very significant performance benefits - take a look at the papers I linked to above if you're curious.
But the object allocator trait could simply be a subtrait of the general allocator trait. It’s as simple as that, as far as I’m concerned. Sure, certain types of allocators can be more efficient than general-purpose allocators, but neither the compiler nor the standard really need to (or indeed should) know about this.
Ah, so the problem is that the semantics are different. Allocator
allocates and frees raw byte blobs. ObjectAllocator<T>
, on the other hand, would allocate already-constructed objects and would also be responsible for dropping these objects (including being able to cache constructed objects which could be handed out later in leu of constructing a newly-allocated object, which is expensive). The trait would look something like this:
trait ObjectAllocator<T> {
fn alloc() -> T;
fn free(t T);
}
This is not compatible with Allocator
, whose methods deal with raw pointers and have no notion of type. Additionally, with Allocator
s, it is the caller's responsibility to drop
the object being freed first. This is really important - knowing about the type T
allows ObjectAllocator<T>
to do things like call T
's drop
method, and since free(t)
moves t
into free
, the caller cannot drop t
first - it is instead the ObjectAllocator<T>
's responsibility. Fundamentally, these two traits are incompatible with one another.
@alexreg Ah yes, I was hoping so too :) Oh well - it'll have to wait for another RFC.
A crate for testing custom allocators would be useful.
Forgive me if I'm missing something obvious, but is there a reason for the Layout
trait described in this RFC not to implement Copy
as well as Clone
, since it's just POD?
I can't think of any.
Sorry to be bringing this up so late in the process, but...
Might it be worth adding support for a dealloc
-like function that isn't a method, but rather a function? The idea would be to use alignment to be able to infer from a pointer where in memory its parent allocator is and thus be able to free without needing an explicit allocator reference.
This could be a big win for data structures that use custom allocators. This would allow them to not keep a reference to the allocator itself, but rather only need to be parametric on the type of the allocator to be able to call the right dealloc
function. For example, if Box
is eventually modified to support custom allocators, then it'd be able to keep being only a single word (just the pointer) as opposed to having to be expanded to two words to store a reference to the allocator as well.
On a related note, it might also be useful to support a non-method alloc
function to allow for global allocators. This would compose nicely with a non-method dealloc
function - for global allocators, there'd be no need to do any kind of pointer-to-allocator inference since there'd only be a single static instance of the allocator for the whole program.
@joshlf The current design allows you to get that by just having your allocator be a (zero-sized) unit type - i.e. struct MyAlloc;
that you then implement the Allocator
trait on.
Storing references or nothing at all, always, is less general than storing the allocator by-value.
I could see that being true for a directly-embedded type, but what about if a data structure decies to keep a reference instead? Does a reference to a zero-sized type take up zero space? That is, if I have:
struct Foo()
struct Blah{
foo: &Foo,
}
Does Blah
have zero size?
Actually, even it's possible, you might not want your allocator to have zero size. For example, you might have an allocator with a non-zero size that you allocate from, but that has the ability to free objects w/o knowing about the original allocator. This would still be useful for making a Box
take only a word. You'd have something like Box::new_from_allocator
which would have to take an allocator as an argument - and it might be a nonzero-sized allocator - but if the allocator supported freeing without the original allocator reference, the returned Box<T>
could avoid storing a reference to the allocator that was passed in the original Box::new_from_allocator
call.
For example, you might have an allocator with a non-zero size that you allocate from, but that has the ability to free objects w/o knowing about the original allocator.
I recall long, long, ago proposing factoring out separate allocator and deallocator traits (with associate types connecting the two) for basically this reason.
Is/should the compiler be allowed to optimize away allocations with these allocators?
Is/should the compiler be allowed to optimize away allocations with these allocators?
@Zoxc What do you mean?
I recall long, long, ago proposing factoring out separate allocator and deallocator traits (with associate types connecting the two) for basically this reason.
For posterity, let me clarify this statement (I talked to @Ericson2314 about it offline): The idea is that a Box
could be parametric just on a deallocator. So you could have the following implementation:
trait Allocator {
type D: Deallocator;
fn get_deallocator(&self) -> Self::D;
}
trait Deallocator {}
struct Box<T, D: Deallocator> {
ptr: *mut T,
d: D,
}
impl<T, D: Deallocator> Box<T, D> {
fn new_from_allocator<A: Allocator>(x: T, a: A) -> Box<T, A::D> {
...
Box {
ptr: ptr,
d: a.get_deallocator()
}
}
}
This way, when calling new_from_allocator
, if A::D
is a zero-sized type, then the d
field of Box<T, A::D>
takes up zero size, and so the size of the resulting Box<T, A::D>
is a single word.
Is there a timeline for when this will land? I'm working on some allocator stuff, and it'd be nice if this stuff were there for me to build off of.
If there's interest, I'd be happy to lend some cycles to this, but I'm relatively new to Rust, so that might just create more work for the maintainers in terms of having to code review a newbie's code. I don't want to step on anybody's toes, and I don't want to make more work for people.
Ok we've recently met to evaluate the state of allocators and I think there's some good news for this as well! It looks like support has not yet landed in libstd for these APIs, but everyone's still happy with them landing at any time!
One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues, but regardless of that it seems like a good idea to land the Allocator
trait and the Layout
type in the proposed std::heap
module for experimentation elsewhere in the ecosystem!
@joshlf if you'd like to help out here I think that'd be more than welcome! The first piece will likely be landing the basic type/trait from this RFC into the standard library, and then from there we can start experimenting and playing around with collections in libstd as well.
@alexcrichton I think your link is broken? It points back here.
One thing we discussed is that changing over all the libstd types may be a bit premature due to possible inference issues
Adding the trait is a good first step, but without refactoring existing APIs to use it they won't see much usage. In #27336 (comment) I propose we can refactor the crates behind the facade immediately, but add newtype wrappers in std
. Annoying to do, but allows us to make progress.
@alexcrichton What would be the process for getting object allocators in? My experiments so far (soon to be public; I can add you to the private GH repo if you're curious) and the discussion here have led me to believe that there's going to be a near-perfect symmetry between the allocator traits and object allocator traits. E.g., you'll have something like (I changed Address
to *mut u8
for symmetry with *mut T
from ObjectAllocator<T>
; we'd probably end up with Address<T>
or something like that):
unsafe trait Allocator {
unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr>;
unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
}
unsafe trait ObjectAllocator<T> {
unsafe fn alloc(&mut self) -> Result<*mut T, AllocErr>;
unsafe fn dealloc(&mut self, ptr: *mut T);
}
Thus, I think experimenting with both allocators and object allocators at the same time could be useful. I'm not sure if this is the right place for that, though, or whether there should be another RFC or, at the very least, a separate PR.
It sounds like @alexcrichton wants a PR that provides the Allocator
trait and Layout
type, even if its not integrated into any collection in libstd
.
If I understand that correctly, then I can put up a PR for that. I had not done so because I keep trying to get at least integration with RawVec
and Vec
prototyped. (At this point I have RawVec
done, but Vec
is a bit more challenging due to the many other structures that build off of it, like Drain
and IntoIter
etc...)
actually, my current branch seems like it might actually build (and the one test for integration withRawVec
passed), so I went ahead and posted it: #42313
@hawkw asked:
Forgive me if I'm missing something obvious, but is there a reason for the Layout trait described in this RFC not to implement Copy as well as Clone, since it's just POD?
The reason I made Layout
only implement Clone
and not Copy
is that I wanted to leave open the possibility of adding more structure to the Layout
type. In particular, I still am interested in trying to have the Layout
attempt to track any type structure used to construct it (e.g. 16-array of struct { x: u8, y: [char; 215] }
), so that allocators would have the option of exposing instrumentation routines that report on what types their current contents are composes from.
This would almost certainly have to be an optional feature, i.e. it seems like the tide is firmly against forcing developers to to use the type-enriched Layout
constructors. so any instrumentation of this form would need to include something like an "unknown memory blocks" category to handle allocations that do not have the type information.
But nonetheless, features like this were the main reason why I did not opt to make Layout
implement Copy
; I basically figured that an implementation of Copy
would be a premature constraint on Layout
itself.
Looks like @pnkfelix has this one covered, and that PR is getting traction, so let's just go with that. I'm looking over it and making comments now, and it looks great!
Currently the signature for Allocator::oom
is:
fn oom(&mut self, _: AllocErr) -> ! {
unsafe { ::core::intrinsics::abort() }
}
It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like Option<Layout>
for why OOM is happening.
@alexcrichton
Might it be worth having either multiple oom_xxx
variants or an enum of different argument types? There are a couple of different signatures for methods that could fail (e.g., alloc
takes a layout, realloc
takes a pointer, an original layout, and a new layout, etc), and there might be cases in which an oom
-like method would want to know about all of them.
@joshlf that's true, yes, but I'm not sure if it's useful. I wouldn't want to just add features because we can, they should continue to be well motivated.
A point for stabilization here is also "determine what the requirements are on the alignment provided to fn dealloc
", and the current implementation of dealloc
on Windows uses align
to determine how to correctly free. @ruuda you may be interested in this fact.
A point for stabilization here is also "determine what the requirements are on the alignment provided to
fn dealloc
", and the current implementation ofdealloc
on Windows usesalign
to determine how to correctly free.
Yes, I think this is how I initially ran into this; my program crashed on Windows because of this. As HeapAlloc
makes no alignment guarantees, allocate
allocates a bigger region and stores the original pointer in a header, but as an optimization this is avoided if the alignment requirements would be satisfied anyway. I wonder if there is a way to convert HeapAlloc
into an alignment-aware allocator that does not require alignment on free, without losing this optimization.
As
HeapAlloc
makes no alignment guarantees
It does provide a minimum alignment guarantee of 8 bytes for 32bit or 16 bytes for 64bit, it just doesn't provide any way to guarantee alignment higher than that.
The _aligned_malloc
provided by the CRT on Windows can provide allocations of higher alignment, but notably it must be paired with _aligned_free
, using free
is illegal. So if you don't know whether an allocation was done via malloc
or _aligned_malloc
then you're stuck in the same conundrum that alloc_system
is in on Windows if you don't know the alignment for deallocate
. The CRT does not provide the standard aligned_alloc
function which can be paired with free
, so even Microsoft hasn't been able to solve this problem. (Although it is a C11 function and Microsoft doesn't support C11 so that's a weak argument.)
Do note that deallocate
only cares about the alignment to know whether it is overaligned, the actual value itself is irrelevant. If you wanted a deallocate
that was truly alignment independent, you could simply treat all allocations as overaligned, but you'd waste a lot of memory on small allocations.
Currently the signature for
Allocator::oom
is:fn oom(&mut self, _: AllocErr) -> ! { unsafe { ::core::intrinsics::abort() } }It was brought to my attention, though, that Gecko at least likes to know the allocation size as well on OOM. We may wish to consider this when stabilizing to perhaps add context like
Option<Layout>
for why OOM is happening.
The AllocErr
already carries the Layout
in the AllocErr::Exhausted
variant. We could just add the Layout
to the AllocErr::Unsupported
variant as well, which I think would be simplest in terms of client expectations. (It does have the drawback of silghtly increasing the side of the AllocErr
enum itself, but maybe we shouldn't worry about that...)
Oh I suspect that's all that's needed, thanks for the correction @pnkfelix!
I'm going to start repurposing this issue for the tracking issue for std::heap
in general as it will be after #42727 lands. I'll be closing a few other related issues in favor of this.
Is there a tracking issue for converting collections? Now that the PRs are merged, I'd like to
- Discuss the associated error type
- Discuss converting collections to use any local allocator, (especially leveraging associated error type)
I've opened #42774 to track integration of Alloc
into std collections. With historical discussion in the libs team that's likely to be on a separate track of stabilization than an initial pass of the std::heap
module.
While reviewing allocator-related issues I also came across #30170 which @pnkfelix awhile back. It looks like the OSX system allocator is buggy with high alignments and when running that program with jemalloc it's segfaulting during deallocation on Linux at least. Worth considering during stabilization!
I've opened #42794 as a place to discuss the specific question of whether zero-sized allocations need to match their requested alignment.
(oh wait, zero-sized allocations are illegal in user allocators!)
Since the alloc::heap::allocate
function and friends are now gone in Nightly, I’ve updated Servo to use this new API. This is part of the diff:
-use alloc::heap;
+use alloc::allocator::{Alloc, Layout};
+use alloc::heap::Heap;
- let ptr = heap::allocate(req_size as usize, FT_ALIGNMENT) as *mut c_void;
+ let layout = Layout::from_size_align(req_size as usize, FT_ALIGNMENT).unwrap();
+ let ptr = Heap.alloc(layout).unwrap() as *mut c_void;
I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.
- Would it make sense to have a convenience method for
allocator.alloc(Layout::from_size_align(…))
? - Would it make sense to make
<Heap as Alloc>::_
methods available as free functions or inherent methods? (To have one fewer item to import, theAlloc
trait.)
Alternatively, could the Alloc
trait be in the prelude or is it too niche of a use case?
@SimonSapin IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.
I feel the ergonomics are not great. We went from importing one item to importing three from two different modules.
I had the exact same feeling with my codebase - it's pretty clunky now.
Would it make sense to have a convenience method for
allocator.alloc(Layout::from_size_align(…))?
Do you mean in the Alloc
trait, or just for Heap
? One thing to consider here is that there's now a third error condition: Layout::from_size_align
returns an Option
, so it could return None
in addition to the normal errors you can get when allocating.
Alternatively, could the
Alloc
trait be in the prelude or is it too niche of a use case?
IMO there isn't much of a point in optimizing the ergonomics of such a low-level API.
I agree that it's probably too low-level to put in the prelude, but I still think that there's value in optimizing the ergonomics (selfishly, at least - that was a really annoying refactor 😝 ).
@SimonSapin did you not handle OOM before? Also in std
all three types are available in the std::heap
module (they're supposed to be in one module). Also did you not handle the case of overflowing sizes before? Or zero-sized types?
did you not handle OOM before?
When it existed the alloc::heap::allocate
function returned a pointer without a Result
and did not leave a choice in OOM handling. I think it aborted the process. Now I’ve added .unwrap()
to panic the thread.
they're supposed to be in one module
I see now that heap.rs
contains pub use allocator::*;
. But when I clicked Alloc
in the impl listed on the rustdoc page for Heap
I was sent to alloc::allocator::Alloc
.
As to the rest, I haven’t looked into it. I’m porting to a new compiler a big pile of code that was written years ago. I think these are callbacks for FreeType, a C library.
When it existed the alloc::heap::allocate function returned a pointer without a Result and did not leave a choice in OOM handling.
It did give you a choice. The pointer it returned could have been a null pointer which would indicate the heap allocator failed to allocate. This is why I'm so glad it switched to Result
so people don't forget to handle that case.
Oh well, maybe the FreeType ended up doing a null check, I don’t know. Anyway, yes, returning a Result is good.
Given #30170 and #43097, I am tempted to resolve the OS X issue with ridiculously big alignments by simply specifying that users cannot request alignments >= 1 << 32
.
One very easy way to enforce this: Change the Layout
interface so that align
is denoted by a u32
instead of a usize
.
@alexcrichton do you have thoughts on this? Should I just make a PR that does this?
@pnkfelix Layout::from_size_align
would still take usize
and return an error on u32
overflow, right?
@SimonSapin what reason is there to have it continue taking usize
align, if a static precondition is that it is unsafe to pass a value >= 1 << 32
?
and if the answer is "well some allocators might support an alignment >= 1 << 32
", then we're back to the status quo and you can disregard my suggestion. The point of my suggestion is basically a "+1" to comments like this one
Because std::mem::align_of
returns usize
@SimonSapin ah, good old stable API's... sigh.
@pnkfelix limiting to 1 << 32
seems reasonable to me!
@rfcbot fcp merge
Ok this trait and its types have bake for awhile now and also been the underlying implementation of the standard collections since its inception. I would propose starting out with a particularly conservative initial offering, namely only stabilizing the following interface:
pub struct Layout { /* ... */ }
extern {
pub type void;
}
impl Layout {
pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;
pub fn size(&self) -> usize;
pub fn align(&self) -> usize;
}
pub unsafe trait Alloc {
unsafe fn alloc(&mut self, layout: Layout) -> *mut void;
unsafe fn alloc_zeroed(&mut self, layout: Layout) -> *mut void;
unsafe fn dealloc(&mut self, ptr: *mut void, layout: Layout);
// all other methods are default and unstable
}
/// The global default allocator
pub struct Heap;
impl Alloc for Heap {
// ...
}
impl<'a> Alloc for &'a Heap {
// ...
}
/// The "system" allocator
pub struct System;
impl Alloc for System {
// ...
}
impl<'a> Alloc for &'a System {
// ...
}
Original proposal
pub struct Layout { /* ... */ }
impl Layout {
pub fn from_size_align(size: usize, align: usize) -> Option<Layout>;
pub unsafe fn from_size_align_unchecked(size: usize, align: usize) -> Layout;
pub fn size(&self) -> usize;
pub fn align(&self) -> usize;
}
// renamed from AllocErr today
pub struct Error {
// ...
}
impl Error {
pub fn oom() -> Self;
}
pub unsafe trait Alloc {
unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, Error>;
unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout);
// all other methods are default and unstable
}
/// The global default allocator
pub struct Heap;
impl Alloc for Heap {
// ...
}
impl<'a> Alloc for &'a Heap {
// ...
}
/// The "system" allocator
pub struct System;
impl Alloc for System {
// ...
}
impl<'a> Alloc for &'a System {
// ...
}
Notably:
- Only stabilizing
alloc
,alloc_zeroed
, anddealloc
methods on theAlloc
trait for now. I think this solves the most pressing issue we have today, defining a custom global allocator. - Remove the
Error
type in favor of just using raw pointers. - Change the
u8
type in the interface tovoid
- A stripped down version of the
Layout
type.
There are still open questions such as what to do with dealloc
and alignment (precise alignment? fits? unsure?), but I'm hoping we can resolve them during FCP as it likely won't be an API-breaking change.
+1 to getting something stabilized!
Renames
AllocErr
toError
and moves the interface to be a bit more conservative.
Does this eliminate the option for allocators to specify Unsupported
? At the risk of harping more on something I've been harping on a lot, I think that #44557 is still an issue.
Layout
It looks like you've removed some of the methods from Layout
. Did you mean to have the ones you left out actually removed, or just left as unstable?
impl Error { pub fn oom() -> Self; }
Is this a constructor for what’s today AllocErr::Exhausted
? If so, shouldn’t it have a Layout
parameter?
Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams:
- @BurntSushi
- @Kimundi
- @alexcrichton
- @aturon
- @cramertj
- @dtolnay
- @eddyb
- @nikomatsakis
- @nrc
- @pnkfelix
- @sfackler
- @withoutboats
Concerns:
*mut u8resolved by #32838 (comment)
Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
I'm really excited about getting to stabilize some of this work!
One question: in the above thread, @joshlf and @Ericson2314 raised an interesting point about the possibility of separating the Alloc
and Dealloc
traits in order to optimize for cases in which alloc
requires some data, but dealloc
requires no extra information, so the Dealloc
type can be zero-sized.
Was this question ever resolved? What are the disadvantages of separating the two traits?
Does this eliminate the option for allocators to specify Unsupported?
Yes and no, it would mean that such an operation is not supported in stable rust immediately, but we could continue to support it in unstable Rust.
Did you mean to have the ones you left out actually removed, or just left as unstable?
Indeed! Again though I'm just proposing a stable API surface area, we can leave all the other methods as unstable. Over time we can continue to stabilize more of the functionality. I think it's best to start as conservative as we can.
Is this a constructor for what’s today AllocErr::Exhausted? If so, shouldn’t it have a Layout parameter?
Aha good point! I sort of wanted to leave the possibility though to make Error
a zero-sized type if we really needed it, but we can of course keep the layout-taking methods in unstable and stabilize them if necessary. Or do you think that the layout-preserving Error
should be stabilized in the first pass?
I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't personally see it as being worth it. Two traits is twice the boilerplate in general, as now everyone would have to type Alloc + Dealloc
in collections for example. I would expect that such a specialized use would not want to inform the interface all other users end up using, personally.
I hadn't personally seen such a question/concern yet (I think I missed it!), but I wouldn't personally see it as being worth it.
In general I agree that it's not worth it with one glaring exception: Box
. Box<T, A: Alloc>
would, given the current definition of Alloc
, have to be at least two words large (the pointer that it already has and a reference to an Alloc
at the very least) except in the case of global singletons (which can be implemented as ZSTs). A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.
as now everyone would have to type Alloc + Dealloc in collections for example
We could add something like this:
trait Allocator: Alloc + Dealloc {}
impl<T> Allocator for T where T: Alloc + Dealloc {}
A 2x (or more) blowup in the space required to store such a common and fundamental type
Only when you use a custom allocator that is not process-global. std::heap::Heap
(the default) is zero-size.
Or do you think that the layout-preserving Error should be stabilized in the first pass?
@alexcrichton I don’t really understand why this proposed first pass is at it is at all. There’s barely more than could already be done by abusing Vec
, and not enough for example to use https://crates.io/crates/jemallocator.
What still needs to be resolved to stabilize the whole thing?
Only when you use a custom allocator that is not process-global. std::heap::Heap (the default) is zero-size.
That seems like the primary use case of having parametric allocators, no? Imagine the following simple definition of a tree:
struct Node<T, A: Alloc> {
t: T,
left: Option<Box<Node<T, A>>>,
right: Option<Box<Node<T, A>>>,
}
A tree constructed from those with a 1-word Alloc
would have a ~1.7x blowup in size for the whole data structure compared to a ZST Alloc
. That seems pretty bad to me, and these kinds of applications are sort of the whole point of having Alloc
be a trait.
@glaebhoerl Yeah, but stabilization still seems a ways off since there isn't an implementation yet. If we disable manual impls of Allocator
I think we can switch to trait-aliases backwards-compatibly when they arrive ;)
A 2x (or more) blowup in the space required to store such a common and fundamental type is concerning to me.
I'd imagine all implementations today are just a zero-size type or a pointer large, right? Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)
We could add something like this:
Indeed! We've then taken one trait to three though. In the past we've never had a great experience with such traits. For example Box<Both>
does not cast to Box<OnlyOneTrait>
. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.
What still needs to be resolved to stabilize the whole thing?
I don't know. I wanted to start with the absolute smallest thing so there would be less debate.
I'd imagine all implementations today are just a zero-size type or a pointer large, right? Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)
Yeah, the idea is that, given a pointer to an object allocated from your your type of allocator, you can figure out which instance it came from (e.g., using inline metadata). Thus, the only information you need to deallocate is type information, not runtime information.
To come back to alignment on deallocate, I see two ways forward:
-
Stabilize as proposed (with alignment on deallocate). Giving away ownership of manually allocated memory would be impossible unless the
Layout
is included. In particular, it is impossible to build aVec
orBox
orString
or otherstd
container with a stricter alignment than required (for instance because you don't want the boxed element to straddle a cache line), without deconstructing and deallocating it manually later (which is not always an option). Another example of something that would be impossible, is filling aVec
using simd operations, and then giving it away. -
Do not require alignment on deallocate, and remove the small-allocation optimization from Windows’
HeapAlloc
-basedalloc_system
. Always store the alignment. @alexcrichton, as you committed that code, do you remember why it was put there in the first place? Do we have any evidence that it saves a significant amount of memory for real-world applications? (With microbenchmarks it is possible to make the results come out either way depending the allocation size — unlessHeapAlloc
was rounding up sizes anyway.)
In any case, this is a very difficult trade off to make; the memory and performance impact will depend highly on the type of application, and which one to optimize for is application-specific as well.
I think we may actually be Just Fine (TM). Quoting the Alloc
docs:
Some of the methods require that a layout fit a memory block.
What it means for a layout to "fit" a memory block means (or
equivalently, for a memory block to "fit" a layout) is that the
following two conditions must hold:
The block's starting address must be aligned to
layout.align()
.The block's size must fall in the range
[use_min, use_max]
, where:
use_min
isself.usable_size(layout).0
, and
use_max
is the capacity that was (or would have been)
returned when (if) the block was allocated via a call to
alloc_excess
orrealloc_excess
.Note that:
the size of the layout most recently used to allocate the block
is guaranteed to be in the range[use_min, use_max]
, anda lower-bound on
use_max
can be safely approximated by a call to
usable_size
.if a layout
k
fits a memory block (denoted byptr
)
currently allocated via an allocatora
, then it is legal to
use that layout to deallocate it, i.e.a.dealloc(ptr, k);
.
Note that last bullet. If I allocate with a layout with alignment a
, then it should be legal for me to deallocate with alignment b < a
because an object which is aligned to a
is also aligned to b
, and thus a layout with alignment b
fits an object allocated with a layout with alignment a
(and with the same size).
What this means is that you should be able to allocate with an alignment which is greater than the minimum alignment required for a particular type and then allow some other code to deallocate with the minimum alignment, and it should work.
Isn't the possible optimization that some pointer-size-types can be zero sized? (or something like that?)
There was an RFC for this recently and it seems very unlikely that it could be done due to compatibility concerns: rust-lang/rfcs#2040
For example
Box<Both>
does not cast toBox<OnlyOneTrait>
. I'm sure we could wait for language features to smooth all this out, but it seems like those are a long way off, at best.
Trait object upcasting on the other hand seems uncontroversially desirable, and mostly a question of effort / bandwidth / willpower to get it implemented. There was a thread recently: https://internals.rust-lang.org/t/trait-upcasting/5970
@ruuda I was the one who wrote that alloc_system
implementation originally. alexcrichton merely moved it around during the great allocator refactors of <time period>
.
The current implementation requires that you deallocate with the same alignment specified that you allocated a given memory block with. Regardless of what the documentation may claim, this is the current reality that everyone must abide by until alloc_system
on Windows is changed.
Allocations on Windows always use a multiple of MEMORY_ALLOCATION_ALIGNMENT
(although they remember the size you allocated them with to the byte). MEMORY_ALLOCATION_ALIGNMENT
is 8 on 32bit and 16 on 64bit. For overaligned types, because the alignment is greater than MEMORY_ALLOCATION_ALIGNMENT
, the overhead caused by alloc_system
is consistently the amount of alignment specified, so a 64byte aligned allocation would have 64 bytes of overhead.
If we decided to extend that overaligned trick to all allocations (which would get rid of the requirement to deallocate with the same alignment that you specified when allocating), then more allocations would have overhead. Allocations whose alignments are identical to MEMORY_ALLOCATION_ALIGNMENT
will suffer a constant overhead of MEMORY_ALLOCATION_ALIGNMENT
bytes. Allocations whose alignments are less than MEMORY_ALLOCATION_ALIGNMENT
will suffer an overhead of MEMORY_ALLOCATION_ALIGNMENT
bytes approximately half of the time. If the size of the allocation rounded up to MEMORY_ALLOCATION_ALIGNMENT
is greater than or equal to the size of the allocation plus the size of a pointer, then there is no overhead, otherwise there is. Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?
I personally feel that the implementation of alloc_system
today on Windows is a bigger benefit than having the ability to relinquish ownership of an allocation to another container like Vec
. AFAIK though there's no data to measure the impact of always padding with the alignment and not requiring an alignment on deallocation.
I think that comment is wrong, as pointed out alloc_system
on Windows relies on the same alignment being passed to deallocation as was passed on allocation.
Considering 99.99% of allocations will not be overaligned, do you really want to incur that sort of overhead on all those allocations?
It depends on the application whether the overhead is significant, and whether to optimize for memory or performance. My suspicion is that for most applications either is fine, but a small minority cares deeply about memory, and they really cannot afford those extra bytes. And another small minority needs control over alignment, and they really need it.
I think that comment is wrong, as pointed out
alloc_system
on Windows relies on the same alignment being passed to deallocation as was passed on allocation.
Doesn't that imply that alloc_system
on Windows doesn't actually properly implement the Alloc
trait (and thus maybe we ought to change the requirements of the Alloc
trait)?
If I'm reading your comment correctly, isn't that alignment overhead present for all allocations regardless of whether we need to be able to deallocate with a different alignment? That is, if I allocate 64 bytes with 64 byte alignment and I also deallocate with 64 byte alignment, the overhead you described is still present. Thus, it's not a feature of being able to deallocate with different alignments so much as it is a feature of requesting larger-than-normal alignments.
@joshlf The overhead caused by alloc_system
currently is due to requesting larger-than-normal alignments. If your alignment is less than or equal to MEMORY_ALLOCATION_ALIGNMENT
, then there is no overhead caused by alloc_system
.
However, if we changed the implementation to allow deallocating with different alignments, then the overhead would apply to nearly all allocations, regardless of alignment.
Ah I see; makes sense.
What is the meaning of implementing Alloc for both Heap and &Heap? In what cases would the user use one of those impls vs the other?
Is this the first standard library API in which *mut u8
would mean "pointer to whatever"? There is String::from_raw_parts but that one really does mean pointer to bytes. I am not a fan of *mut u8
meaning "pointer to whatever" -- even C does better. What are some other options? Maybe a pointer to opaque type would be more meaningful.
@rfcbot concern *mut u8
@dtolnay Alloc for Heap
is sort of "standard" and Alloc for &Heap
is like Write for &T
where the trait requires &mut self
but the implementation does not. Notably that means that types like Heap
and System
are threadsafe and do not need to be synchronized when allocating.
More importantly, though, usage of #[global_allocator]
requires that the static it's attached to , which has type T
, to have Alloc for &T
. (aka all global allocators must be threadsafe)
For *mut u8
I think that *mut ()
may be interesting, but I don't personally feel too compelled to "get this right" here per se.
The main advantage of *mut u8
is that it is very convenient to use .offset
with byte offsets.
For
*mut u8
I think that*mut ()
may be interesting, but I don't personally feel too compelled to "get this right" here per se.
If we go with *mut u8
in a stable interface, aren't we locking ourselves in? In other words, once we stabilize this, we won't have a chance to "get this right" in the future.
Also, *mut ()
seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.
The main advantage of
*mut u8
is that it is very convenient to use .offset with byte offsets.
True, but you could easily do let ptr = (foo as *mut u8)
and then go about your merry way. That doesn't seem like enough of a motivation to stick with *mut u8
in the API if there are compelling alternatives (which, to be fair, I'm not sure there are).
Also, *mut () seems a bit dangerous to me in case we ever make an optimization like RFC 2040 in the future.
That optimization will already probably never happen-- it'd break too much existing code. Even if it did, it would be applied to &()
and &mut ()
, not *mut ()
.