Add pooling for heap-allocated objects
hawkw opened this issue · 8 comments
some notes on object pooling

## requirements

the main use-case for adding pooling is to improve `tracing-subscriber`'s `Registry` performance by reusing the hashmaps that are used to store user-defined extensions. since `sharded_slab` is already used for storing per-span data, the ideal solution would be to store those hashmaps in the slab slot for each span, rather than introducing an additional crate for object pooling.
## goals

- **clear pooled data in place**: a number of standard library types provide a `clear` method — `String::clear`, `Vec::clear`, etc. — that removes all data from the type but retains any allocated capacity. we want to use that for clearing the pooled data. i imagine we would want to add a trait along the lines of this playground (a rough sketch follows this list) and require that pooled data implement it.
- **continue providing the current slab APIs without pooling**: we should continue to provide the existing API without adding pooled data. we might be able to do this by adding a type parameter for the pooled data to `Slab` and having it default to `()`, and then only exposing methods to interact with pooling when that type parameter is something that implements the trait for pooled data. this may not be possible, in which case i think we would want to provide a separate `Pool` API type that's like a `Slab` but with added pooled data.
- **reuse/share existing code**: i think we ought to be able to avoid writing (& maintaining!) two separate versions of the `Shard`, `Page`, and `Slot` types for slabs with and without pooling, if at all possible.
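for concreteness, here's a rough sketch of what such a trait could look like — hypothetical; the playground linked above is the actual reference, and its details may differ:

```rust
/// Types that can be cleared in place while retaining allocated capacity.
/// (sketch only)
pub trait Clear {
    /// Remove all data from `self`, keeping any allocation around for reuse.
    fn clear(&mut self);
}

impl<T> Clear for Vec<T> {
    fn clear(&mut self) {
        // `Vec::clear` drops the elements but keeps the backing allocation.
        Vec::clear(self)
    }
}

impl Clear for String {
    fn clear(&mut self) {
        String::clear(self)
    }
}

impl<K, V, S> Clear for std::collections::HashMap<K, V, S>
where
    K: std::hash::Hash + Eq,
    S: std::hash::BuildHasher,
{
    fn clear(&mut self) {
        std::collections::HashMap::clear(self)
    }
}
```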
## non-goals

- **fancy management of lifetimes for pooled data**: some object pools implement things like ref-counting for the pooled objects, so that when you check something out of the pool, you can clone references to it, pass them around, and then check it back in when those references are dropped. we don't need to do this — our pooled objects are checked out when their slot is `insert`ed into, and checked back in when their slot is `remove`d. it would be fairly straightforward to implement ref counting in a thin layer on top of that API, but `tracing-subscriber`'s use-case doesn't need it. if other people are interested in a ref-counted pool using `sharded-slab`, we could add a ref-counting API later, or even make a separate crate.
- **managing mutability of pooled data**: pooled objects will only be accessed as shared references. if users want to mutate them (which they probably will), they can bring their own strategy for managing mutability. if we implement the pooled object trait (which i called `Clear` in the playground i posted above) for `Mutex<T> where T: Clear` and `RwLock<T> where T: Clear`, users can bring their own locking strategy depending on the use-case (see the sketch after this list). the only time we will mutate a pooled object is when we actually clear it, which we can do after a slot has been released, so it will be safe to mutate.
- **running `Drop` impls for pooled types**: we don't want to drop pooled objects in place without actually deallocating them. most of the types we want to pool are going to just be for managing allocations, not anything fancy like connection pooling, and the only drop glue we expect is ensuring that the elements of a pooled collection (e.g. `Vec`/`HashMap`) are dropped when we clear the pooled collection they're in. this should happen automatically.
- **releasing memory**: for a first pass, we should punt on deallocating pooled objects or shrinking their allocations. the point of the object pool is to reuse preexisting allocations, so we don't want to do this too often. eventually, we'll want to add the ability to place bounds on the slab's memory use, including pooled objects, but that will be a separate feature.
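a sketch of the lock impls mentioned above, again assuming the hypothetical `Clear` trait shape rather than a finalized API:

```rust
use std::sync::{Mutex, RwLock};

// sketch: re-stating the hypothetical `Clear` trait.
pub trait Clear {
    fn clear(&mut self);
}

impl<T: Clear> Clear for Mutex<T> {
    fn clear(&mut self) {
        // `get_mut` is enough here: clearing only happens once the slot has
        // been released, so we have exclusive access and the lock can't be
        // contended. poisoned locks are simply skipped for brevity.
        if let Ok(inner) = self.get_mut() {
            inner.clear();
        }
    }
}

impl<T: Clear> Clear for RwLock<T> {
    fn clear(&mut self) {
        if let Ok(inner) = self.get_mut() {
            inner.clear();
        }
    }
}
```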
## implementation notes
a back-of-the-napkin description of how i expect this will work.
first, we add a trait for data that can be pooled. i proposed this playground in #2 (comment), and i imagine something close to that should work — feel free to tweak it as necessary.
each slot in a slab page is represented by the `Slot` struct (Lines 9 to 16 in 12306dc).
we will want to add an additional type parameter to `Slot` for a pooled object. the pooled data should live in a `CausalCell` so loom can enforce that it is not concurrently mutated; we could stick it in the `item` causal cell, or add a new one. i imagine something like this:
```rust
pub(crate) struct Slot<T, C, P = ()> {
    lifecycle: AtomicUsize,
    /// The offset of the next item on the free list.
    next: CausalCell<usize>,
    /// The data stored in the slot.
    item: CausalCell<(Option<T>, P)>,
    _cfg: PhantomData<fn(C)>,
}
```
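one option for keeping the default `P = ()` case uniform with pooled slots — just a sketch of one possible approach, not a settled design — is a no-op `Clear` impl for `()`:

```rust
// sketch: re-stating the hypothetical `Clear` trait from above.
pub trait Clear {
    fn clear(&mut self);
}

// with `P = ()` as the default type parameter, the existing non-pooled
// `Slab` keeps working unchanged; a no-op impl makes `()` a valid "pooled
// object" that costs nothing to create or clear.
impl Clear for () {
    fn clear(&mut self) {}
}
```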
when a slot is created, we'll create a new pooled object as well. in theory we could allow customizing the function that constructs the new pooled object, but that's another thing to thread through the whole API, so i think it's okay to just require the pooled objects to implement `Default`.

when we "remove" a slot, we would want to call the pooled object's `Clear` method, as well as emptying the `Option`; this would happen around here (Line 319 in 12306dc).
then, we need to thread access to the pooled data through `Page` and `Shard`, up to the public API. this is probably not hard, just annoying. then, we write docs for the APIs, and add tests, and we're done!
in re: my previous comment, it just occurs to me that there is a much simpler solution to providing both the `Slab` API as it exists currently and adding a pooling API. rather than having a concept of "per slot pooled data" and "non-pooled slot data", we could just implement the `Clear` trait for `Option<T>`, like this:
```rust
impl<T> Clear for Option<T> {
    fn clear(&mut self) {
        let _ = self.take();
    }
}
```
then, we can more or less leave the existing internal implementations the same, if we add `T: Clear` bounds on `Shard`, `Page`, and `Slot`, and we can implement separate `Slab<T>` and `Pool<T>` API types like:
```rust
pub struct Slab<T, C: cfg::Config = DefaultConfig> {
    shards: Box<[Shard<Option<T>, C>]>,
    _cfg: PhantomData<C>,
}

// ...

pub struct Pool<T: Clear + Default, C: cfg::Config = DefaultConfig> {
    shards: Box<[Shard<T, C>]>,
    _cfg: PhantomData<C>,
}
```
for the most part, they could expose similar APIs, but I think the `Slab::take` method (Lines 410 to 420 in 12306dc) only makes sense for a `Slab`; the `Pool` can't really implement this if we want to be able to guarantee that we'll retain allocations in place. The internal `Shard`/`Page`/`Slot` types can expose their `take` methods in separate impl blocks where the `T` type parameter is `Option<T>`.
@bIgBV since you were interested in working on this, i've left some comments explaining what i have in mind — hopefully, that's enough to get you started! please let me know if you have any more questions!
Okay, from reading your notes, I gather that the object pool is separate from the slab and works in conjunction with the slab. It's primarily for heap-allocated objects and to help ensure that allocations for the heap-allocated objects are reused. This means that when you want to instantiate a heap-allocated object, you need to do so through the slab, so that it can keep track of the allocations, correct?
The question now becomes how this works with the existing APIs. `Slab::insert` currently takes ownership of `T` and `Slab::take` returns the ownership back, so the caller is responsible for managing the heap-allocated object.

From what I can see, object pool APIs have you checking out an object, mutating it, and using it before checking the object back in. But this use-case is completely different from how the slab works, where we insert a value to be shared across multiple threads. I think we could provide an `Entry`-like API where the caller first checks out an entry and modifies it in place. Now this entry's memory is managed by the pool and it's available across threads as it's backed by the `Slab`.
EDIT: I wrote this up before your additional comments outlining a design and implementation. It looks like the design answers the questions I had.
> Okay, from reading your notes, I gather that the object pool is separate from the slab and works in conjunction with the slab. It's primarily for heap-allocated objects and to help ensure that allocations for the heap-allocated objects are reused. This means that when you want to instantiate a heap-allocated object, you need to do so through the slab, so that it can keep track of the allocations, correct?
yup, that's right. i was initially wondering about providing both a storage type that's moved in and out and a pooled allocation in the same `Slab` type, but i think that's too much effort — users can always implement it with something like this:
```rust
struct MyPooledThing {
    heap_data: Vec<u8>, // just using this as an example of a pooled heap allocation...
    small_struct: Option<SomeSmallStruct>,
}

impl Clear for MyPooledThing {
    fn clear(&mut self) {
        self.heap_data.clear();
        self.small_struct = None;
    }
}
```
> The question now becomes how this works with the existing APIs. `Slab::insert` currently takes ownership of `T` and `Slab::take` returns the ownership back, so the caller is responsible for managing the heap-allocated object.
yeah, that's right — as i mentioned in #2 (comment), i don't think `take` makes sense for a pool. it could be nice to have an `insert` API where you get mutable access to the slot before it's made available for other threads to reference, but if we want to punt on that, i think it would also be fine to have a `create` method or something that just returns an index, and doesn't move anything in; you would then lock the pooled data to modify it.

a way to access the slot mutably would be nicer, though; the issue is just that we would need to make sure any mutation is done before it's made visible to other threads through `Pool::get`... we could probably take a `FnOnce(&mut T)` and run it before giving back the slot's index, or have another RAII guard type...
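to make the closure idea concrete, here's a toy, single-threaded sketch — `create_with` and everything else here is hypothetical, and a real version would need the lock-free slot lifecycle rather than a `Vec`:

```rust
// toy, single-threaded sketch of a closure-based create API; none of these
// names exist in sharded-slab as written.
pub trait Clear {
    fn clear(&mut self);
}

pub struct Pool<T> {
    slots: Vec<T>,    // stand-in for the real shard/page/slot storage
    free: Vec<usize>, // indices of cleared, reusable slots
}

impl<T: Clear + Default> Pool<T> {
    pub fn new() -> Self {
        Pool { slots: Vec::new(), free: Vec::new() }
    }

    /// Check out a cleared object, let the caller initialize it mutably,
    /// and only then hand back its index — so mutation happens before the
    /// value is reachable through `get`.
    pub fn create_with(&mut self, init: impl FnOnce(&mut T)) -> usize {
        let idx = match self.free.pop() {
            Some(idx) => idx, // reuse a previous allocation
            None => {
                self.slots.push(T::default()); // grow the pool
                self.slots.len() - 1
            }
        };
        init(&mut self.slots[idx]);
        idx
    }

    pub fn get(&self, idx: usize) -> Option<&T> {
        self.slots.get(idx)
    }

    /// Clear the object in place and make its slot reusable.
    pub fn remove(&mut self, idx: usize) {
        self.slots[idx].clear();
        self.free.push(idx);
    }
}
```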
> a way to access the slot mutably would be nicer, though; the issue is just that we would need to make sure any mutation is done before it's made visible to other threads through `Pool::get`... we could probably take a `FnOnce(&mut T)` and run it before giving back the slot's index, or have another RAII guard type...
I think taking a `FnOnce(&mut T)` and running it in the block where a slot's contents are being re-written should theoretically work. But I think that would be additional work on top of the basic pooling APIs.
I've been thinking about a path forward for this. One idea I have is to add a new `Slot::get_used_block` method which returns a previously allocated and used slot without writing to it. It just sets up all the required state for a slot which has references to it and returns the slot.

We could possibly extract the following snippet into its own private method (Lines 162 to 164 in 12306dc).
This method will only be exposed when `T: Clear + Default`, so only the `Pool` type will have access to it.

We can now expose a new method on `Shard`, `Shard::get_allocated`, which calls `Slot::get_used_block` if the current page is allocated; if it isn't, we return `None` and the calling function will insert a new `T::default()`, which will follow the `Pool::create` code path.
One concern is that we might have to change `Shard::allocate` to allow allocating with an empty `T` in the slot, instead of just an empty slot.
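A toy, single-threaded sketch of that flow — the method names follow the proposal above, but the bodies are purely illustrative and ignore the real shard/page/slot machinery and its lock-free lifecycle:

```rust
// Toy sketch only: `get_allocated` / `get_used_block` are the proposed
// names, but everything else here is illustrative.
pub trait Clear {
    fn clear(&mut self);
}

struct Slot<T> {
    item: T,
    occupied: bool,
}

struct Shard<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Slot<T> {
    /// Mark a previously used slot as live again, leaving `item` untouched
    /// so its existing allocation is reused.
    fn get_used_block(&mut self) {
        self.occupied = true;
    }
}

impl<T: Clear + Default> Shard<T> {
    /// Reuse a previously allocated, now-free slot without overwriting the
    /// pooled object inside it; `None` means the caller falls back to the
    /// `Pool::create` path and inserts a fresh `T::default()`.
    fn get_allocated(&mut self) -> Option<usize> {
        let idx = self.slots.iter().position(|slot| !slot.occupied)?;
        self.slots[idx].get_used_block();
        Some(idx)
    }
}
```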