hawkw/sharded-slab

Add pooling for heap-allocated objects

hawkw opened this issue · 8 comments

hawkw commented

some notes on object pooling

requirements

the main use-case for adding pooling is to improve tracing-subscriber's Registry performance
by reusing the hashmaps that are used to store user-defined extensions. since sharded_slab is already used for storing per-span data, the ideal solution would be to store those hashmaps in the slab slot for each span, rather than introducing an additional crate for object pooling.

goals

  • clear pooled data in place: a number of standard library types provide a clear method — String::clear, Vec::clear, etc. — that removes all data from the type but retains any allocated capacity. we want to use that for clearing the pooled data. i imagine we would want to add a trait along the lines of this playground and require that pooled data implement it.

  • continue providing the current slab APIs without pooling: we should continue to provide the existing API without adding pooled data. we might be able to do this by adding a type parameter for the pooled data to Slab and having it default to (), and then only exposing methods that interact with pooling when that type parameter implements the trait for pooled data. this may not be possible, in which case I think we would want to provide a separate Pool API type that's like a Slab but with added pooled data.

  • reuse/share existing code: i think we ought to be able to avoid writing (& maintaining!) two separate versions of the Shard, Page, and Slot types for slabs with and without pooling, if at all possible.
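(the playground linked above isn't reproduced in this thread; as a rough sketch of what such a trait might look like, with impl targets and method names that are assumptions rather than the playground's actual contents:)

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical version of the trait described above: types that can be
/// emptied in place while retaining their allocated capacity.
pub trait Clear {
    fn clear(&mut self);
}

impl Clear for String {
    fn clear(&mut self) {
        // Delegates to the inherent method, which keeps capacity.
        String::clear(self)
    }
}

impl<T> Clear for Vec<T> {
    fn clear(&mut self) {
        Vec::clear(self)
    }
}

impl<K: Hash + Eq, V> Clear for HashMap<K, V> {
    fn clear(&mut self) {
        HashMap::clear(self)
    }
}

fn main() {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    buf.extend_from_slice(b"span extension data");
    let cap = buf.capacity();
    Clear::clear(&mut buf);
    assert!(buf.is_empty());
    assert_eq!(buf.capacity(), cap); // allocation retained
}
```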

non-goals

  • fancy management of lifetimes for pooled data: some object pools implement things like ref-counting for the pooled objects, so that when you check something out of the pool, you can clone references to it, pass them around, and then check it back in when those references are dropped. we don't need to do this — our pooled objects are checked out when their slot is inserted into, and checked back in when their slot is removed. it would be fairly straightforward to implement ref counting in a thin layer on top of that API, but tracing-subscriber's use-case doesn't need it. if other people are interested in a ref-counted pool using sharded-slab, we could add a ref-counting API later, or even make a separate crate.

  • managing mutability of pooled data: pooled objects will only be accessed as shared references. if users want to mutate them (which they probably will), they can bring their own strategy for managing mutability. if we implement the pooled object trait (which i called Clear in the playground i posted above) for Mutex<T> where T: Clear and RwLock<T> where T: Clear, users can bring their own locking strategy depending on the use-case. the only time we will mutate a pooled object is when we actually clear it, which we can do after a slot has been released, so it will be safe to mutate.

  • running Drop impls for pooled types: we don't want to drop pooled objects in place without actually deallocating them. most of the types we want to pool are going to just be for managing allocations, not anything fancy like connection pooling, and the only drop glue we expect is ensuring that the elements of a pooled collection (e.g. Vec/HashMap) are dropped when we clear the pooled collection they're in. this should happen automatically.

  • releasing memory: for a first pass, we should punt on deallocating pooled objects or shrinking their allocations. the point of the object pool is to reuse preexisting allocations, so we don't want to do this too often. eventually, we'll want to add the ability to place bounds on the slab's memory use, including pooled objects, but that will be a separate feature.
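(to illustrate the Mutex/RwLock point above: a hypothetical sketch of those impls, assuming the Clear trait from the earlier bullet; this is not sharded-slab's actual code. since we only clear after a slot has been released, we hold exclusive access and can bypass the lock entirely with get_mut:)

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical pooled-data trait (called `Clear` in the playground above).
pub trait Clear {
    fn clear(&mut self);
}

impl<T> Clear for Vec<T> {
    fn clear(&mut self) {
        Vec::clear(self)
    }
}

// Clearing only ever happens with `&mut self`, after the slot is
// released, so `get_mut` can skip locking entirely.
impl<T: Clear> Clear for Mutex<T> {
    fn clear(&mut self) {
        match self.get_mut() {
            Ok(data) => data.clear(),
            // A poisoned lock is fine here: we're about to clear the data.
            Err(poisoned) => poisoned.into_inner().clear(),
        }
    }
}

impl<T: Clear> Clear for RwLock<T> {
    fn clear(&mut self) {
        match self.get_mut() {
            Ok(data) => data.clear(),
            Err(poisoned) => poisoned.into_inner().clear(),
        }
    }
}

fn main() {
    let mut pooled = Mutex::new(vec![1u8, 2, 3]);
    pooled.clear();
    assert!(pooled.get_mut().unwrap().is_empty());
}
```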

hawkw commented

implementation notes

a back-of-the-napkin description of how i expect this will work.

first, we add a trait for data that can be pooled. i proposed this playground in #2 (comment), and i imagine something close to that should work — feel free to tweak it as necessary.

each slot in a slab page is represented by the Slot struct:

pub(crate) struct Slot<T, C> {
    lifecycle: AtomicUsize,
    /// The offset of the next item on the free list.
    next: CausalCell<usize>,
    /// The data stored in the slot.
    item: CausalCell<Option<T>>,
    _cfg: PhantomData<fn(C)>,
}

we will want to add an additional type parameter to Slot for a pooled object. the pooled data should live in a CausalCell so loom can enforce that it is not concurrently mutated; we could stick it in the item causal cell, or add a new one. i imagine something like this:

pub(crate) struct Slot<T, C, P = ()> {
    lifecycle: AtomicUsize,
    /// The offset of the next item on the free list.
    next: CausalCell<usize>,
    /// The data stored in the slot.
    item: CausalCell<(Option<T>, P)>,
    _cfg: PhantomData<fn(C)>,
}

when a slot is created, we'll create a new pooled object as well. in theory we could allow customizing the function that constructs the new pooled object, but that's another thing to thread through the whole API, so i think it's okay to just require the pooled objects to implement Default.

when we "remove" a slot, we would want to call the pooled object's clear method as well as empty the Option; this would happen around here:

let item = self.item.with_mut(|item| unsafe { (*item).take() });

then, we need to thread access to the pooled data through Page, Shard, up to the public API. this is probably not hard, just annoying. then, we write docs for the APIs, and add tests, and we're done!
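as a rough illustration of that removal path, here's a drastically simplified, single-threaded stand-in (std's UnsafeCell replacing loom's CausalCell, the lifecycle and free-list state omitted, and the names hypothetical):

```rust
use std::cell::UnsafeCell;

pub trait Clear {
    fn clear(&mut self);
}

impl<T> Clear for Vec<T> {
    fn clear(&mut self) {
        Vec::clear(self)
    }
}

/// Simplified stand-in for the real `Slot`: `UnsafeCell` replaces loom's
/// `CausalCell`, and all lifecycle/free-list state is omitted.
struct Slot<T, P> {
    item: UnsafeCell<(Option<T>, P)>,
}

impl<T, P: Clear> Slot<T, P> {
    /// On removal, take the stored item and clear the pooled data in
    /// place, so its allocation is reused by the next insertion.
    fn remove(&self) -> Option<T> {
        // Safety: the real implementation guards this with the slot's
        // lifecycle state; this sketch assumes exclusive access.
        unsafe {
            let (item, pooled) = &mut *self.item.get();
            pooled.clear();
            item.take()
        }
    }
}

fn main() {
    let slot = Slot {
        item: UnsafeCell::new((Some("span"), vec![1u8, 2, 3])),
    };
    assert_eq!(slot.remove(), Some("span"));
    assert_eq!(slot.remove(), None); // item gone, pooled data cleared
}
```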

hawkw commented

in re: my previous comment, it just occurs to me that there is a much simpler solution to providing both the Slab API as it exists currently and adding a pooling API. rather than having a concept of "per slot pooled data" and "non-pooled slot data", we could just implement the Clear trait for Option<T>, like this:

impl<T> Clear for Option<T> {
    fn clear(&mut self) {
        let _ = self.take();
    }
}

then, we can more or less leave the existing internal implementations the same, if we add T: Clear bounds on Shard, Page, and Slot, and we can implement separate Slab<T> and Pool<T> API types like:

pub struct Slab<T, C: cfg::Config = DefaultConfig> {
    shards: Box<[Shard<Option<T>, C>]>,
    _cfg: PhantomData<C>,
}

// ...

pub struct Pool<T: Clear + Default, C: cfg::Config = DefaultConfig> {
    shards: Box<[Shard<T, C>]>,
    _cfg: PhantomData<C>,
}

for the most part, they could expose similar APIs, but I think the Slab::take method

sharded-slab/src/lib.rs

Lines 410 to 420 in 12306dc

pub fn take(&self, idx: usize) -> Option<T> {
    let tid = C::unpack_tid(idx);
    test_println!("rm {:?}", tid);
    let shard = &self.shards[tid.as_usize()];
    if tid.is_current() {
        shard.take_local(idx)
    } else {
        shard.take_remote(idx)
    }
}

only makes sense for a Slab; the Pool can't really implement this if we want to be able to guarantee that we'll retain allocations in place. The internal Shard/Page/Slot types can expose their take methods in separate impl blocks where the T type parameter is Option<T>.
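the "separate impl blocks" idea can be sketched like this (a toy, non-concurrent model; the Shard here is hypothetical and drops the config parameter):

```rust
pub trait Clear {
    fn clear(&mut self);
}

// Implementing `Clear` for `Option<T>` lets the existing `Slab` reuse
// the pooled internals: "clearing" a slab slot just drops its value.
impl<T> Clear for Option<T> {
    fn clear(&mut self) {
        let _ = self.take();
    }
}

struct Shard<T> {
    slots: Vec<T>,
}

// Methods shared by `Slab` and `Pool`, bound on `T: Clear`.
impl<T: Clear> Shard<T> {
    fn clear_slot(&mut self, idx: usize) {
        self.slots[idx].clear();
    }
}

// `take` lives in a separate impl block that only exists when the
// storage is `Option<T>`, i.e. for `Slab`, never for `Pool`.
impl<T> Shard<Option<T>> {
    fn take(&mut self, idx: usize) -> Option<T> {
        self.slots[idx].take()
    }
}

fn main() {
    let mut shard = Shard { slots: vec![Some(1), Some(2)] };
    assert_eq!(shard.take(0), Some(1));
    shard.clear_slot(1);
    assert_eq!(shard.take(1), None);
}
```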

hawkw commented

@bIgBV since you were interested in working on this, i've left some comments explaining what i have in mind — hopefully, that's enough to get you started! please let me know if you have any more questions!

bIgBV commented

Okay, from reading your notes, I gather that the object pool is separate from the slab and works in conjunction with it. It's primarily for heap-allocated objects, to help ensure that their allocations are reused. This means that when you want to instantiate a heap-allocated object, you need to do so through the slab, so that it can keep track of the allocations, correct?

The question now becomes how this works with the existing APIs. Slab::insert currently takes ownership of T and Slab::take returns ownership back, so the caller is responsible for managing the heap-allocated object.

From what I can see, object pool APIs have you checking out an object, mutating and using it, and then checking the object back in. But this use-case is completely different from how the slab works, where we insert a value to be shared across multiple threads. I think we could provide an Entry-like API where the caller first checks out an entry and modifies it in place. Now this entry's memory is managed by the pool, and it's available across threads as it's backed by the Slab.

EDIT: I wrote this up before your additional comments outlining a design and implementation. It looks like the design answers the questions I had.

hawkw commented

Okay, from reading your notes, I gather that the object pool is separate from the slab and works in conjunction with it. It's primarily for heap-allocated objects, to help ensure that their allocations are reused. This means that when you want to instantiate a heap-allocated object, you need to do so through the slab, so that it can keep track of the allocations, correct?

yup, that's right. i was initially wondering about providing both a storage type that's moved in and out and a pooled allocation in the same Slab type, but i think that's too much effort — users can always implement it with something like this:

struct MyPooledThing {
    heap_data: Vec<u8>, // just using this as an example of a pooled heap allocation...
    small_struct: Option<SomeSmallStruct>,
}

impl Clear for MyPooledThing {
    fn clear(&mut self) {
        self.heap_data.clear();
        self.small_struct = None;
    }
}

The question now becomes how this works with the existing APIs. Slab::insert currently takes ownership of T and Slab::take returns ownership back, so the caller is responsible for managing the heap-allocated object.

yeah, that's right — as i mentioned in #2 (comment), i don't think take makes sense for a pool. it could be nice to have an insert API where you get mutable access to the slot before it's made available for other threads to reference, but if we want to punt on that, i think it would also be fine to have a create method or something that just returns an index, and doesn't move anything in; you would then lock the pooled data to modify it.

a way to access the slot mutably would be nicer, though; the issue is just that we would need to make sure any mutation is done before it's made visible to other threads through Pool::get...we could probably take a FnOnce(&mut T) and run it before giving back the slot's index, or have another RAII guard type...
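a toy, single-threaded sketch of the two creation APIs under discussion: a create that just returns an index, and a create_with variant that runs a FnOnce(&mut T) before the index is handed out (all names here are hypothetical; the real pool would be sharded and concurrent):

```rust
/// Hypothetical pooled-data trait from the earlier comments.
pub trait Clear {
    fn clear(&mut self);
}

impl Clear for String {
    fn clear(&mut self) {
        String::clear(self)
    }
}

/// Toy single-threaded model of the proposed `Pool` API.
struct Pool<T> {
    slots: Vec<T>,
}

impl<T: Clear + Default> Pool<T> {
    fn new() -> Self {
        Pool { slots: Vec::new() }
    }

    /// Reserve a slot and return its index; the caller then locks and
    /// mutates the pooled data through `get`.
    fn create(&mut self) -> usize {
        self.slots.push(T::default());
        self.slots.len() - 1
    }

    /// Variant that runs a closure on the slot *before* the index is
    /// returned, so all mutation happens before any other thread could
    /// observe the slot through `get`.
    fn create_with(&mut self, f: impl FnOnce(&mut T)) -> usize {
        let idx = self.create();
        f(&mut self.slots[idx]);
        idx
    }

    fn get(&self, idx: usize) -> Option<&T> {
        self.slots.get(idx)
    }
}

fn main() {
    let mut pool = Pool::<String>::new();
    let idx = pool.create_with(|s| s.push_str("hello"));
    assert_eq!(pool.get(idx).map(String::as_str), Some("hello"));
}
```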

bIgBV commented

a way to access the slot mutably would be nicer, though; the issue is just that we would need to make sure any mutation is done before it's made visible to other threads through Pool::get...we could probably take a FnOnce(&mut T) and run it before giving back the slot's index, or have another RAII guard type...

I think taking a FnOnce(&mut T) and running it in the block where a slot's contents are being re-written should theoretically work. But I think that would be additional work on top of the basic pooling APIs.

bIgBV commented

I've been thinking about a path forward for this. One idea I have is to add a new Slot::get_used_block method which returns a previously allocated and used slot without writing to it. It just sets up all the required state for a slot which has references to it and returns the slot.

We could possibly extract the following snippet into its own private method.

let lifecycle = self.lifecycle.load(Ordering::Acquire);
let gen = LifecycleGen::from_packed(lifecycle).0;
let refs = RefCount::<C>::from_packed(lifecycle);

This method will only be exposed when T: Clear + Default so only the Pool type will have access to it.

We can now expose a new method on Shard, Shard::get_allocated, which calls Slot::get_used_block if the current page is allocated; if it isn't, we return None and the calling function will insert a new T::default, which will follow the Pool::create code path.

One concern is that we might have to change Shard::allocate to allow allocation with an empty T in the slot, instead of just an empty slot.