Linked list of free heap memory blocks gets corrupted on reallocations
jcmonteiro opened this issue ยท 7 comments
Issue template
- Hardware description: Olimex STM32-E407
- RTOS: FreeRTOS
- Installation type: micro_ros_setup
- Version or commit hash: galactic | 7c24435
Steps to reproduce the issue
Reliably reproducing the issue might not be that easy as it depends on how the memory is allocated and reallocated and the fragmentation happening in the reallocation steps.
I can reliably trigger the issue in my setup by creating a lifecycle node. Depending on the rest of the code, the issue will happen at different parts, but always when reallocating memory while initializing the state machine.
Expected behavior
- The state machine is properly created.
- The linked list of free blocks, defined in custom_memory_manager:39-43, holds valid free blocks of coherent sizes that point to the next valid free block.
Actual behavior
- The state machine is not created correctly and there is "garbage" populating different variables.
- The linked list of free blocks is corrupted.
Additional information
I read through the implementation of the memory management functions, and it appears to me that there is a bug in the reallocation function - pvPortRealloc. In particular, the aforementioned line should hold the count to the amount of allocated memory for the object, and not the total free block size. Therefore, it should be changed to
size_t count = (pxLink->xBlockSize & ~xBlockAllocatedBit) - xHeapStructSize;
Suggested changes
I believe the state machine could be refactored to avoid all the reallocation, to begin with. We know the number of transitions and states beforehand, so we could already allocate all the necessary memory space for the maps. Naturally, this would be a PR to ros2/rcl
By changing the line that defines size_t count
I have managed to fix the issue on my side. I will make a PR with the suggested changes. Please, advise if I have missed something here and my assumption about the count
is off.
An extra bit of information here, I have defined the following additional function at custom_memory_allocation.c
that helped me narrow down the problem. It is used to print the free blocks of memory by following the linked list.
static void prvDisplayHeapBlocks(void);
static void prvDisplayHeapBlocks(void)
{
BlockLink_t *pxIterator;
pxIterator = &xStart;
pxIterator = pxIterator->pxNextFreeBlock;
while ( pxIterator != NULL )
{
printf("[%6d] @ [%p]\n", pxIterator->xBlockSize, (void*)pxIterator);
pxIterator = pxIterator->pxNextFreeBlock;
}
printf("--------\n");
}
While debugging, I have added this function in three different places inside pvPortRealloc
to pinpoint which call was corrupting the memory.
void *pvPortRealloc( void *pv, size_t xWantedSize )
{
vTaskSuspendAll();
prvDisplayHeapBlocks(); // <-------- printing before allocating a new block
void * newmem = pvPortMalloc(xWantedSize);
prvDisplayHeapBlocks(); // <-------- printing after allocating a new block
uint8_t *puc = ( uint8_t * ) pv;
BlockLink_t *pxLink;
puc -= xHeapStructSize;
pxLink = ( void * ) puc;
char *in_src = (char*)pv;
char *in_dest = (char*)newmem;
size_t count = pxLink->xBlockSize & ~xBlockAllocatedBit;
while(count--)
*in_dest++ = *in_src++;
prvDisplayHeapBlocks(); // <-------- printing after copying
vPortFree(pv);
( void ) xTaskResumeAll();
return newmem;
}
After many calls to reallocate memory, I eventually see something like this
calling pvPortRealloc() with xWantedSize = 96
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 104] @ [0x2000dac8 == ucHeap + 48036]
[ 5848] @ [0x2000df18 == ucHeap + 49140]
[ 0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5848] @ [0x2000df18 == ucHeap + 49140]
[ 0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5920] @ [0x2000ded0 == ucHeap + 49068]
[ 0] @ [0x2000f5f0 == ucHeap + 54988]
calling pvPortRealloc() with xWantedSize = 96
--------
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5920] @ [0x2000ded0 == ucHeap + 49068]
[ 0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5816] @ [0x2000df38 == ucHeap + 49172]
[ 0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[ 16] @ [0x2000d7b0 == ucHeap + 47244]
[ 24] @ [0x2000d7f0 == ucHeap + 47308]
[ 104] @ [0x2000dac8 == ucHeap + 48036]
[-2147483528] @ [0x2000df38 == ucHeap + 49172]
In this example, I have reserved 55000 bytes of heap memory. You can see in the last printed line that the linked list is corrupted. The program continue executing, but the outcome is non-deterministic. By making that change that I have suggested, the data integrity is preserved.
Good catch! Let me handle the PRs, as we have this custom allocators on multiples repositories.
@Acuadros95, thank you for taking a look at this. Additionally, it is worth mentioning that, according to the documentation at ros2/rcutils/include/rcutils/allocator.h:59-66, the implementation of realloc should behave as realloc. Therefore, two things are missing in this implementation.
- Check for an unsuccessful allocation. In this case, the pointer should not be freed and a
nullptr
should be returned. - According to the documentation on
realloc
-The content of the memory block is preserved up to the lesser of the new and old sizes
. Therefore,count
should be the minimum of(pxLink->xBlockSize & ~xBlockAllocatedBit) - xHeapStructSize
andxWantedSize
.
Here we go: #99
Please test it out and give feedback.
Of course, @Acuadros95. I will let you know.
@jcmonteiro the PR is merged!
Closing, please reopen if you encounter any other issues.