[BUG] Premature changing of pxCurrentTCB BEFORE actual context switch in Win32/64 Port
Vitaly1983 opened this issue · 1 comments
pxCurrentTCB is currently changed BEFORE newly selected user FreeRTOS thread (task) is resumed. This leads to sporadic errors with working of system objects, e.g., mutexes.
Target
Win32/64 FreeRTOS simulator port. FreeRTOS Kernel V10.5.1, 'port.c' source file
Host
Any
To Reproduce
This will lead to sporadic errors, in program on multicore CPUs. The more threads will be in program, and more often they will
be switched, the higher probability of errors occuring. Errors will be related to system objects (e.g., mutexes), which access pxCurrentTCB pointer.
The problem is that inside prvProcessSimulatedInterrupts()->vTaskSwitchContext()->taskSELECT_HIGHEST_PRIORITY_TASK()
->listGET_OWNER_OF_NEXT_ENTRY() there is a line:
( pxTCB ) = ( pxConstList )->pxIndex->pvOwner;
there pxTCB == pxCurrentTCB
BUT, the thread which was assigned to the previously selected pxCurrentTCB, is NOT suspended during execution of vTaskSwitchContext(). If inside listGET_OWNER_OF_NEXT_ENTRY() a new thread will be selected for execution, then the previously executed will be suspended only a few lines below:
vTaskSwitchContext();
/* If the task selected to enter the running state is not the task
that is already in the running state. */
if( pvOldCurrentTCB != pxCurrentTCB )
{
/* Suspend the old thread. In the cases where the (simulated)
interrupt is asynchronous (tick event swapping a task out rather
than a task blocking or yielding) it doesn't matter if the
'suspend' operation doesn't take effect immediately - if it
doesn't it would just be like the interrupt occurring slightly
later. In cases where the yield was caused by a task blocking
or yielding then the task will block on a yield event after the
yield operation in case the 'suspend' operation doesn't take
effect immediately. */
pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB );
-->>> SuspendThread( pxThreadState->pvThread );
Since there is a time gap between re-assigning pxCurrentTCB pointer and the actual suspension of previously active thread, if that thread checks for pxCurrentTCB variable in this same moment, the returned value will be incorrect.
I added the following quick fix to prvProcessSimulatedInterrupts() function:
if (pvOldCurrentTCB)
{
pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB ); // <-- 1
if (pxThreadState && pxThreadState->pvThread)
SuspendThread( pxThreadState->pvThread ); // <-- 2
}
/* Select the next task to run. */
vTaskSwitchContext();
/* If the task selected to enter the running state is not the task
that is already in the running state. */
if( pvOldCurrentTCB != pxCurrentTCB )
{
/* Suspend the old thread. In the cases where the (simulated)
interrupt is asynchronous (tick event swapping a task out rather
than a task blocking or yielding) it doesn't matter if the
'suspend' operation doesn't take effect immediately - if it
doesn't it would just be like the interrupt occurring slightly
later. In cases where the yield was caused by a task blocking
or yielding then the task will block on a yield event after the
yield operation in case the 'suspend' operation doesn't take
effect immediately. */
pxThreadState = ( ThreadState_t *) *( ( size_t * ) pvOldCurrentTCB );
// SuspendThread( pxThreadState->pvThread );
/* Ensure the thread is actually suspended by performing a
synchronous operation that can only complete when the thread is
actually suspended. The below code asks for dummy register
data. Experimentation shows that these two lines don't appear
to do anything now, but according to
https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743
they do - so as they do not harm (slight run-time hit). */
xContext.ContextFlags = CONTEXT_INTEGER;
( void ) GetThreadContext( pxThreadState->pvThread, &xContext );
/* Obtain the state of the task now selected to enter the
Running state. */
pxThreadState = ( ThreadState_t * ) ( *( size_t *) pxCurrentTCB );
/* pxThreadState->pvThread can be NULL if the task deleted
itself - but a deleted task should never be resumed here. */
configASSERT( pxThreadState->pvThread != NULL );
ResumeThread( pxThreadState->pvThread );
}
else
{
if (pvOldCurrentTCB && pxThreadState && pxThreadState->pvThread)
ResumeThread( pxThreadState->pvThread ); // <-- 2
}
Some checks for non-NULL pointers may be redundant, but the idea should be clear. It may be even more clean not to use Resume/Suspend API in case if pxCurrentTCB was not changed. Maybe, by making vTaskSwitchContext() not to update pxCurrentTCB variable inside, but to return pointer to newly selected thread to run.
After this change, strange sporadic errors that occured before in my program, seems to disappear.
Would you please try with the following snippet and see that it addresses the issue you mentioned:
if( ulSwitchRequired != pdFALSE )
{
/* Suspend the old thread. */
pxThreadState = ( ThreadState_t * ) *( ( size_t * ) pxCurrentTCB );
SuspendThread( pxThreadState->pvThread );
/* Ensure the thread is actually suspended by performing a
* synchronous operation that can only complete when the thread
* is actually suspended. The below code asks for dummy register
* data. Experimentation shows that these two lines don't appear
* to do anything now, but according to
* https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743
* they do - so as they do not harm (slight run-time hit). */
xContext.ContextFlags = CONTEXT_INTEGER;
( void ) GetThreadContext( pxThreadState->pvThread, &xContext );
/* Select the next task to run. */
vTaskSwitchContext();
/* Obtain the state of the task now selected to enter the
* Running state. */
pxThreadState = ( ThreadState_t * ) ( *( size_t * ) pxCurrentTCB );
/* pxThreadState->pvThread can be NULL if the task deleted
* itself - but a deleted task should never be resumed here. */
configASSERT( pxThreadState->pvThread != NULL );
ResumeThread( pxThreadState->pvThread );
}