RichieSams/FiberTaskingLib

Segmentation fault in test for Calculate Triangle Number on Mac OS X (Sierra)

amesgames opened this issue · 13 comments

I can prevent (or hide) the segmentation fault by moving the code that deletes the task array in TriangleNumberMainTask() so that the delete occurs after the WaitForCounter() call.

This does not make sense to me as it seems we should be able to delete the array immediately after calling AddTasks().

This occurs with commit [dd05035] using make (not Xcode). I will test on Windows 10 momentarily.

EDIT: This does not occur for me using VC2015 (amd_x64) in Release and Debug builds. Just an OS X issue?

Thanks for the info. We're aware of issues for really short running tasks, but never had any issues for TriangleNumber test.

What are the specs of your computer? Compiler version?

On short running tasks we're getting stack corruption. More than likely, the slight differences in scheduling on OSX are enough to trigger a corner-case.

Model Name: MacBook Pro
Model Identifier: MacBookPro11,5
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number of Processors: 1
Total Number of Cores: 4

For the XCode project, I used the defaults for "cmake -G Xcode". Xcode Version 8.2 (8C38) with Apple LLVM 8.0 compiler. I have barely ever used Xcode, so am still learning my way around the interface; however, I got it to crash with the debugger attached. It crashed on the switch statement on tls.OldFiberDestination in CleanUpOldFiber(), called via FiberStart(). The crash is EXC_BAD_ACCESS on the memory where tls.OldFiberDestination resides.

The crash occurs before the main test thread ever enters TriangleNumberMainTask(). So, my theory of pushing the delete around seems completely incorrect. Ok, now this is weird: I moved the delete of the tasks array back to where it was and it runs completely fine in the debugger, but not out of the debugger. So, probably a race.

Let me know how else I can help here. Would a crash dump be useful? I am really stoked that I found this project. I was literally about to build a similar thing after watching the Naughty Dog GDC talk two days ago.

Can you re-test with the updated master? Thanks!

ftl-test still segfaults with an Xcode build and with a standard make build on Mac OS X about 1 in 5 runs. A little less often, but now the make build is doing it as well. It is possible I just never ran it often enough using make before.

I can successfully reproduce the error from within Xcode, on Sierra.
However, make build with default Xcode compilers, I can't reproduce.

That said, I have found the problem, GetCurrentThreadIndex():

GetCurrentThreadIndex() fail

std::size_t TaskScheduler::GetCurrentThreadIndex() {
    #if defined(FTL_WIN32_THREADS)
        DWORD threadId = GetCurrentThreadId();
        for (std::size_t i = 0; i < m_numThreads; ++i) {
            if (m_threads[i].Id == threadId) {
                return i;
            }
        }
    #elif defined(FTL_POSIX_THREADS)
        pthread_t currentThread = pthread_self();
        for (std::size_t i = 0; i < m_numThreads; ++i) {
            if (pthread_equal(currentThread, m_threads[i])) {
                return i;
            }
        }
    #endif

    return FTL_INVALID_INDEX;
}

The function fails to find a matching thread in m_threads, so it returns FTL_INVALID_INDEX.
This causes the dereference from m_tls to fetch garbage, and hence, a Segmentation fault.

That all said, I have no idea why pthread_equal is failing:

pthread_equal fail

Tested on XCode 7.3 on El Capitan, and I'm unable to re-create the issue. I'm wondering it it's an issue with Apple LLVM 8.1....

After a lot of trials, I was able to get it to trigger with gcc-6 in OS Sierra.

Adrians-iMac:build_make adrian$ make test
Running tests...
Test project /Users/adrian/Documents/FiberTaskingLib/build_make
    Start 1: FiberAbstraction.SingleFiberSwitch
1/4 Test #1: FiberAbstraction.SingleFiberSwitch ...   Passed    0.01 sec
    Start 2: FiberAbstraction.NestedFiberSwitch
2/4 Test #2: FiberAbstraction.NestedFiberSwitch ...   Passed    0.01 sec
    Start 3: FunctionalTests.ProducerConsumer
3/4 Test #3: FunctionalTests.ProducerConsumer .....   Passed    0.06 sec
    Start 4: FunctionalTests.CalcTriangleNum
4/4 Test #4: FunctionalTests.CalcTriangleNum ......***Exception: SegFault  0.01 sec

75% tests passed, 1 tests failed out of 4

Total Test time (real) =   0.09 sec

The following tests FAILED:
	  4 - FunctionalTests.CalcTriangleNum (SEGFAULT)
Errors while running CTest
make: *** [test] Error 8

The failure rate for gcc-6 is really really low. I've only triggered it 2 times in about 80 runs. With Apple LLVM 8.1 in a make build, I haven't yet triggered it. But with Xcode, it triggers about 1 in 4 runs.

Thanks for the cool library. Unfortuantely I can reproduce this error frequently on LLVM 8.1, also with make build.

I plotted the values of m_threads[i] after creation of each thread and plotted all m_threads[i] if GetCurrentThreadIndex is going to fail.
The result is:

Thread created: 0x7fffc684c3c0
Thread created: 0x700006881000
Thread created: 0x700006904000
GetCurrentThreadFailed!
Current Thread: 0x700006987000
Thread0 in list: 0x7fffc684c3c0
Thread1 in list: 0x700006881000
Thread2 in list: 0x700006904000
Thread3 in list: 0x0
Thread created: 0x700006987000

NB: As plotting is only triggered after it is already clear that GetCurrentThreadIndex is about to fail, the plotting should not be an influence.

This means pthread_create starts up the thread before m_threads[i] is written. This sounds strang and wrong to me.
What do you think? Do you agree?

Oh!! Good catch!

Worker threads are created using this bit of code:

if (!CreateThread(524288, ThreadStart, threadArgs, i, &m_threads[i])) {
    printf("Error: Failed to create all the worker threads");
    return;
}

In Win32-land, we create the thread suspended, so we can set the affinity and fill in m_threads[i]

inline bool CreateThread(uint stackSize, ThreadStartRoutine startRoutine, void *arg, size_t coreAffinity, ThreadType *returnThread) {
	HANDLE handle = reinterpret_cast<HANDLE>(_beginthreadex(nullptr, stackSize, startRoutine, arg, CREATE_SUSPENDED, nullptr));

    if (handle == nullptr) {
        return false;
    }

    DWORD_PTR mask = 1ull << coreAffinity;
    SetThreadAffinityMask(handle, mask);
	
    returnThread->Handle = handle;
    returnThread->Id = GetThreadId(handle);
    ResumeThread(handle);

    return true;
}

However, pthreads don't have the option to create suspended:

inline bool CreateThread(uint stackSize, ThreadStartRoutine startRoutine, void *arg, size_t coreAffinity, ThreadType *returnThread) {
    pthread_attr_t threadAttr;
    pthread_attr_init(&threadAttr);

    // Set stack size
    pthread_attr_setstacksize(&threadAttr, stackSize);

    // TODO: OSX and MinGW Thread Affinity
    #if defined(FTL_OS_LINUX)
        // Set core affinity
        cpu_set_t cpuSet;
        CPU_ZERO(&cpuSet);
        CPU_SET(coreAffinity, &cpuSet);
        pthread_attr_setaffinity_np(&threadAttr, sizeof(cpu_set_t), &cpuSet);
    #endif

    int success = pthread_create(returnThread, NULL, startRoutine, arg);

    // Cleanup
    pthread_attr_destroy(&threadAttr);

    return success == 0;
}

returnThread is passed directly into pthread_create, BUT there's no guarantee that pthread_create is atomic. And I'm 95% sure it isn't.

So what's happening is that we're creating a thread, and it's starting before m_threads[i] is assigned. In most cases, we don't see any issues, because the startup of the thread is slower than the assignment. (Probably up to the thread scheduler gods).

To solve this, we can add some synchronization to thread start, so we make sure TaskScheduler::Run() has fully initialized everything before the threads try to do any work.

That commit should fix the issue. However, please test this with your code, and let me know if you have any issues

Hey, works great. Thx.

Awesome! Thanks for testing things. :)