- The memory pool is divided into three layers: `thread_cache`, `central_cache`, and `page_cache`.
- `thread_cache` is private to each thread. It maintains an array of free lists (`_freelists[104]`), where each list holds free memory blocks of one fixed size (see the sketch below).
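  A minimal sketch of the per-size free-list idea, assuming the common trick of storing the "next" pointer inside the freed block itself; the class and member names are illustrative, not the project's actual API:

  ```cpp
  #include <cassert>
  #include <cstddef>

  // One free list per size class. Freed blocks link to each other through
  // their own first sizeof(void*) bytes, so no extra node allocation is needed.
  class FreeList {
  public:
      void push(void* obj) {
          *reinterpret_cast<void**>(obj) = _head;  // write "next" into the block
          _head = obj;
      }
      void* pop() {
          assert(_head != nullptr);
          void* obj = _head;
          _head = *reinterpret_cast<void**>(obj);  // follow the stored "next"
          return obj;
      }
      bool empty() const { return _head == nullptr; }

  private:
      void* _head = nullptr;  // head of the singly linked free list
  };
  ```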
- `central_cache` is shared by all threads (a single global instance). It maintains an array of doubly linked circular lists (`_blocklists[104]`), where each list manages several blocks. A `struct block` manages a large chunk of memory in page units; its `_freelist` member holds the addresses of the small memory blocks the chunk has been split into.

  ```cpp
  struct block {
      size_t page_id = 0;         // index of the first page of this block
      size_t n = 0;               // page number (how many pages the block spans)
      size_t sub_size = 0;        // split size (size of each small block)
      void* _freelist = nullptr;  // free list of the split small memory blocks
      size_t use_count = 0;       // number of small blocks currently handed out
      bool is_used = false;       // whether the block is in use by central_cache
      block* prev = nullptr;
      block* next = nullptr;
  };
  ```
  The `central_cache` fetches small memory blocks from the corresponding block's `_freelist` and distributes them according to the size requested by `thread_cache`. The implementation involves some other tricks, such as batch allocation and a slow-start algorithm for growing the batch size (sketched below).
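  The batching could look roughly like the following slow-start sketch; the growth rule, bounds, and helper name are assumptions for illustration, not the project's actual code:

  ```cpp
  #include <algorithm>
  #include <cstddef>

  // Hypothetical helper: how many objects of size `obj_size` the thread_cache
  // fetches from central_cache in one refill. `max_batch` is a per-size-class
  // counter kept by the thread_cache; it starts small and grows on every refill,
  // so frequently used size classes gradually receive larger batches.
  size_t next_batch_size(size_t& max_batch, size_t obj_size) {
      // Upper bound: many small objects per batch, only a few large ones.
      size_t upper = std::clamp<size_t>(512 * 1024 / obj_size, 2, 512);
      size_t batch = std::min(max_batch, upper);
      if (max_batch < upper) {
          ++max_batch;  // slow start: raise the allowance for the next refill
      }
      return batch;
  }
  ```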
- `page_cache` is shared by all threads (a single global instance). It maintains an array of doubly linked circular lists (`_blocklists[128]`), where each block in `_blocklists[k]` manages an unsplit span of `k+1` pages. The `page_cache` is the only layer that invokes `brk` or `mmap`. When the `central_cache` runs out of memory, it requests a block of several pages from the `page_cache`. This part of the `page_cache` involves tricks similar to the buddy-system algorithm: splitting larger spans on allocation and merging adjacent spans on release (the splitting part is sketched below).
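  As an illustration of the splitting idea, here is a simplified sketch; `std::list<Span>` stands in for the project's intrusive circular block lists, and all names are ours, not the project's:

  ```cpp
  #include <cstddef>
  #include <list>

  constexpr size_t PAGES = 128;

  struct Span {
      size_t page_id = 0;  // index of the first page
      size_t n = 0;        // number of pages in this span
  };

  struct PageCacheSketch {
      std::list<Span> lists[PAGES];  // lists[k] holds unsplit spans of k+1 pages

      // Fetch a span of exactly k pages (1 <= k <= PAGES), splitting a larger
      // span if necessary. Returns a span with n == 0 on a cache miss, where
      // the real code would mmap a fresh big span (e.g. 128 pages) and retry.
      Span fetch(size_t k) {
          if (!lists[k - 1].empty()) {                // exact fit
              Span s = lists[k - 1].front();
              lists[k - 1].pop_front();
              return s;
          }
          for (size_t j = k + 1; j <= PAGES; ++j) {   // split a larger span
              if (!lists[j - 1].empty()) {
                  Span big = lists[j - 1].front();
                  lists[j - 1].pop_front();
                  Span out{big.page_id, k};           // front k pages go out
                  Span rest{big.page_id + k, big.n - k};
                  lists[rest.n - 1].push_back(rest);  // remainder is re-cached
                  return out;
              }
          }
          return Span{};                              // nothing cached
      }
  };
  ```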
- Other tricks: e.g., for the mapping from `page_id` to `block`, the original project initially used `unordered_map` + `mutex`. In this implementation we first replaced it with `unordered_map` + `shared_mutex` to allow concurrent reads. Later on, the original project switched to a lock-free `radix_tree`, but its depth was too small (i.e., unsuitable for a 64-bit OS). In this project we implemented the `radix_tree` for a 64-bit OS, similar to the multi-level page table mechanism (a minimal sketch follows).
  https://blog.csdn.net/m0_62782700/article/details/135443352?spm=1001.2014.3001.5502
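  A minimal sketch of the three-level radix-tree idea, analogous to a multi-level page table. The bit split and the `BITS` parameter below are illustrative assumptions (e.g. 64-bit addresses with 4 KB pages leave at most 52 significant page-id bits), and the memory-ordering details a truly lock-free reader path needs are omitted:

  ```cpp
  #include <cstddef>

  struct block;  // the span descriptor shown above

  template <int BITS = 52>  // assumes page_id < 2^BITS
  class RadixTree3 {
      static constexpr int INNER_BITS = (BITS + 2) / 3;          // top two levels
      static constexpr int LEAF_BITS  = BITS - 2 * INNER_BITS;   // bottom level
      static constexpr size_t INNER_LEN = size_t(1) << INNER_BITS;
      static constexpr size_t LEAF_LEN  = size_t(1) << LEAF_BITS;

      struct Leaf { block* values[LEAF_LEN] = {}; };
      struct Node { void* child[INNER_LEN] = {}; };  // points to Node or Leaf

      Node* _root = new Node();

  public:
      // Lookup only follows already-published pointers; with careful publication
      // of new nodes this path can run without a reader lock.
      block* get(size_t page_id) const {
          size_t i1 = page_id >> (INNER_BITS + LEAF_BITS);
          size_t i2 = (page_id >> LEAF_BITS) & (INNER_LEN - 1);
          size_t i3 = page_id & (LEAF_LEN - 1);
          Node* n1 = static_cast<Node*>(_root->child[i1]);
          if (n1 == nullptr) return nullptr;
          Leaf* leaf = static_cast<Leaf*>(n1->child[i2]);
          if (leaf == nullptr) return nullptr;
          return leaf->values[i3];
      }

      // Writers are assumed to be serialized externally (e.g. by the page_cache
      // lock); they allocate missing interior nodes on demand, then store the mapping.
      void set(size_t page_id, block* b) {
          size_t i1 = page_id >> (INNER_BITS + LEAF_BITS);
          size_t i2 = (page_id >> LEAF_BITS) & (INNER_LEN - 1);
          size_t i3 = page_id & (LEAF_LEN - 1);
          if (_root->child[i1] == nullptr) _root->child[i1] = new Node();
          Node* n1 = static_cast<Node*>(_root->child[i1]);
          if (n1->child[i2] == nullptr) n1->child[i2] = new Leaf();
          static_cast<Leaf*>(n1->child[i2])->values[i3] = b;
      }
  };
  ```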
A special case: when frequently allocating fixed-size memory and each single allocation is very small (such as 17 B), native malloc/free is faster than this implementation. I guess this is due to the difference between brk and mmap (which we use).