snazzy-d/sdc

Implement a thread cache for GC allocations.

deadalnix opened this issue · 2 comments

A very common way to speed up a GC, or an allocator in general, is to add a thread cache, so that most allocations avoid complex operations.

When a piece of memory is freed by a thread, it goes into the thread-local cache. As far as most of the allocator is concerned, the memory was never freed. If the cache grows past a certain size, several allocations are freed in bulk back to the arena.

When allocating, the allocator first looks in the cache and, if a slot is available, returns it immediately. If no slot is available, many slots are allocated from the arena in a single bulk operation.
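To make the two paths concrete, here is a minimal sketch in C for a single size class. It is not SDC's or jemalloc's code: `arena_alloc_batch`, `arena_free_batch`, `cache_bin`, and the constants are hypothetical names chosen for illustration, and the "arena" is just `malloc`/`free` with counters so the bulk behaviour is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative constants, not values from any real allocator. */
enum { CACHE_CAPACITY = 8, REFILL_COUNT = 4, SLOT_SIZE = 64 };

/* Hypothetical stand-in for the shared arena. The counters record how
 * often the (expensive) arena path is taken. */
static size_t arena_refills = 0;
static size_t arena_flushes = 0;

/* Allocate up to n slots in one arena operation; returns how many it got. */
static size_t arena_alloc_batch(void **out, size_t n) {
    size_t i;
    arena_refills++;
    for (i = 0; i < n; i++) {
        out[i] = malloc(SLOT_SIZE);
        if (out[i] == NULL) break;
    }
    return i;
}

/* Return n slots to the arena in one operation. */
static void arena_free_batch(void **ptrs, size_t n) {
    arena_flushes++;
    for (size_t i = 0; i < n; i++) free(ptrs[i]);
}

/* Per-thread stack of free slots for one size class. */
typedef struct {
    void *slots[CACHE_CAPACITY];
    size_t count;
} cache_bin;

static _Thread_local cache_bin tcache;

/* Fast path: pop from the cache. Slow path: refill in bulk, then pop. */
static void *cached_alloc(void) {
    if (tcache.count == 0) {
        tcache.count = arena_alloc_batch(tcache.slots, REFILL_COUNT);
        if (tcache.count == 0) return NULL; /* arena is out of memory */
    }
    return tcache.slots[--tcache.count];
}

/* Fast path: push into the cache. Slow path: when the cache is full,
 * flush half of it to the arena in bulk, then push. */
static void cached_free(void *p) {
    assert(p != NULL);
    if (tcache.count == CACHE_CAPACITY) {
        size_t n = CACHE_CAPACITY / 2;
        arena_free_batch(&tcache.slots[tcache.count - n], n);
        tcache.count -= n;
    }
    tcache.slots[tcache.count++] = p;
}
```

With these constants, twelve consecutive calls to `cached_alloc` touch the arena three times instead of twelve, and the arena only sees a free once the ninth slot is handed back to a full cache. Flushing half the cache rather than all of it is a design choice made here so that a thread alternating between alloc and free near the limit does not hit the slow path on every call.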

The size of the cache can be adjusted automatically based on the thread's workload.
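The issue does not say how the size should be managed, so the following is only one possible heuristic, sketched in C under assumptions of my own: the cache records how many refills it needed and the lowest fill level it reached during an interval, and a periodic call grows or shrinks the limit accordingly. `cache_stats`, `cache_adjust_limit`, and the bounds are hypothetical.

```c
#include <stddef.h>

/* Illustrative bounds on the number of cached slots. */
enum { MIN_LIMIT = 2, MAX_LIMIT = 64 };

typedef struct {
    size_t limit;     /* current maximum number of cached slots */
    size_t count;     /* slots currently in the cache */
    size_t low_water; /* lowest value of count since the last adjustment */
    size_t refills;   /* bulk refills since the last adjustment */
} cache_stats;

/* Called periodically (for example every N allocations).
 * - The cache ran dry at least once: double the limit, up to MAX_LIMIT.
 * - No refill and some slots were never used: halve it, down to MIN_LIMIT.
 * Then start a new measurement interval. Returns the new limit. */
static size_t cache_adjust_limit(cache_stats *c) {
    if (c->refills > 0) {
        if (c->limit < MAX_LIMIT) {
            c->limit *= 2;
            if (c->limit > MAX_LIMIT) c->limit = MAX_LIMIT;
        }
    } else if (c->low_water > 0 && c->limit > MIN_LIMIT) {
        c->limit /= 2;
        if (c->limit < MIN_LIMIT) c->limit = MIN_LIMIT;
    }
    c->refills = 0;
    c->low_water = c->count;
    return c->limit;
}
```

The allocation path would be responsible for incrementing `refills` and lowering `low_water` as it runs. A busy thread then converges on a larger cache, while an idle thread gives memory back to the arena.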

Both tcmalloc and jemalloc implement such a mechanism (in fact, the tc in tcmalloc stands for thread cache). Some useful references from jemalloc:
Allocation from the cache: https://github.com/jemalloc/jemalloc/blob/5832ef658975d5f2da2bdfddf55712d9fa343e30/include/jemalloc/internal/cache_bin.h#L337-L381
Refill the cache: https://github.com/jemalloc/jemalloc/blob/5832ef658975d5f2da2bdfddf55712d9fa343e30/src/arena.c#L929-L1055
Deallocation from the cache: https://github.com/jemalloc/jemalloc/blob/5832ef658975d5f2da2bdfddf55712d9fa343e30/include/jemalloc/internal/cache_bin.h#L448-L467
Flush the cache: https://github.com/jemalloc/jemalloc/blob/5832ef658975d5f2da2bdfddf55712d9fa343e30/src/tcache.c#L308-L503

Do we want this to cover all size classes of alloc?

Just the small ones.