Pithikos/C-Thread-Pool

Dead locks

JensMunkHansen opened this issue · 1 comments

Hi Guys

Great idea to make a simple ANSI-C threadpool. The interface is great, but I have encountered numerous problems using it.

In your example:

After adding the work

for (i=0; i<20; i++){
    thpool_add_work(thpool, (void*)task1, NULL);
    thpool_add_work(thpool, (void*)task2, NULL);
}

if you add either of the following, you get a deadlock

thpool_pause(thpool);
thpool_resume(thpool);

or

thpool_pause(thpool);
printf("nthreads alive: %d\n", thpool_num_threads_working(thpool));
sleep(1);
thpool_resume(thpool);

Accessing the variable num_threads_working is causing the error. Volatile does not imply atomic updates.
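
To illustrate the point about volatile: it only keeps the compiler from caching the value, while an increment is still a separate read, modify and write. A minimal sketch of a counter guarded by a mutex follows. The names work_counter, counter_add, counter_get and counter_demo are made up for this example and are not taken from thpool.c.

```c
#include <pthread.h>

/* Hypothetical stand-in for the pool's num_threads_working counter. */
typedef struct {
    pthread_mutex_t lock;
    int num_threads_working;
} work_counter;

/* Change the counter under the lock so the read-modify-write is not torn. */
void counter_add(work_counter *c, int delta) {
    pthread_mutex_lock(&c->lock);
    c->num_threads_working += delta;
    pthread_mutex_unlock(&c->lock);
}

/* Readers take the same lock, so they never see a half-finished update. */
int counter_get(work_counter *c) {
    pthread_mutex_lock(&c->lock);
    int n = c->num_threads_working;
    pthread_mutex_unlock(&c->lock);
    return n;
}

/* Each thread marks itself working and idle many times, then stays working. */
void *counter_bump(void *arg) {
    work_counter *c = arg;
    for (int i = 0; i < 100000; i++) {
        counter_add(c, 1);
        counter_add(c, -1);
    }
    counter_add(c, 1);
    return NULL;
}

/* Runs four threads and returns the final counter value. */
int counter_demo(void) {
    work_counter c;
    pthread_t t[4];
    pthread_mutex_init(&c.lock, NULL);
    c.num_threads_working = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, counter_bump, &c);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    int result = counter_get(&c);
    pthread_mutex_destroy(&c.lock);
    return result;
}
```

With the lock in place counter_demo() returns 4 on every run. With a plain volatile int and no lock, updates from different threads can be lost.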

Another issue is the handling of the condition variables and mutexes. Neither may be reinitialized once it has been initialized, yet the code allows this to happen in several ways. As long as only a single queue is supported, I recommend using PTHREAD_COND_INITIALIZER and PTHREAD_MUTEX_INITIALIZER.
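
A rough sketch of what static initialization could look like for a single queue. The names queue_lock, queue_has_jobs, queue_post and queue_take are placeholders chosen for this example, not identifiers from thpool.c.

```c
#include <pthread.h>

/* Hypothetical guards for a single job queue. They are initialized once,
   at load time, so no code path can initialize them a second time. */
static pthread_mutex_t queue_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_has_jobs = PTHREAD_COND_INITIALIZER;
static int queued_jobs = 0;

/* Producer side: record one more job and wake one waiting worker. */
void queue_post(void) {
    pthread_mutex_lock(&queue_lock);
    queued_jobs++;
    pthread_cond_signal(&queue_has_jobs);
    pthread_mutex_unlock(&queue_lock);
}

/* Consumer side: block until a job exists, take it, and return how many
   jobs are left. The while loop guards against spurious wakeups. */
int queue_take(void) {
    pthread_mutex_lock(&queue_lock);
    while (queued_jobs == 0)
        pthread_cond_wait(&queue_has_jobs, &queue_lock);
    int remaining = --queued_jobs;
    pthread_mutex_unlock(&queue_lock);
    return remaining;
}
```

The limitation is the one already mentioned: the statically initialized objects are shared, so this only fits while there is exactly one queue.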

I haven't been able to come up with a good solution supporting pause/resume.

Regards
Jens Munk

Hi, I found that the first problem is caused by threads_on_hold being assigned in more than one place. thpool_pause sends a signal to every thread, and then the main thread calls thpool_resume, which sets threads_on_hold to 0. A child thread may be scheduled only after the main thread has done this, so its signal handler thread_hold sets threads_on_hold back to 1 and the child thread is never woken up.

(gdb) bt
#0 0x00007fc839ae69d0 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffca9839000, remaining=remaining@entry=0x7ffca9839000) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00007fc839ae68aa in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2 0x000055c352562289 in thpool_destroy (thpool_p=0x55c352e43670) at thpool.c:211
#3 0x000055c352561f19 in main () at example.c:47

(gdb) i thread
Id Target Id Frame
1 Thread 0x7fc83a214740 (LWP 4981) "a.out" 0x00007fc839ae69d0 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffca9839000, remaining=remaining@entry=0x7ffca9839000)
at ../sysdeps/unix/sysv/linux/nanosleep.c:28
2 Thread 0x7fc839a01700 (LWP 4982) "thread-pool-0" 0x00007fc839ae69d0 in __GI___nanosleep (requested_time=requested_time@entry=0x7fc839a005a0,
remaining=remaining@entry=0x7fc839a005a0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
...(Other threads)

(gdb) p threads_on_hold
$1 = 1

The deadlock no longer occurs after moving threads_on_hold = 1 from thread_hold to the beginning of thpool_pause.
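
The change can be sketched roughly as follows. This is a simplified stand-alone version, not the actual patch: pause_threads, resume_threads, idle_worker and pause_resume_demo are hypothetical names, the flag is declared as volatile sig_atomic_t by my own choice, and the handler polls with a short nanosleep instead of sleep(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Same role as threads_on_hold in thpool.c. */
static volatile sig_atomic_t threads_on_hold = 0;
static volatile sig_atomic_t keep_running = 1;

/* SIGUSR1 handler: it only polls the flag and no longer sets it, so a
   late-running handler cannot undo a resume that already happened. */
void thread_hold(int sig_id) {
    (void)sig_id;
    struct timespec nap = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    while (threads_on_hold)
        nanosleep(&nap, NULL);
}

/* Simplified pause: the flag is set before any signal is sent. */
void pause_threads(pthread_t *threads, int n) {
    threads_on_hold = 1;
    for (int i = 0; i < n; i++)
        pthread_kill(threads[i], SIGUSR1);
}

/* Simplified resume: the only place that clears the flag. */
void resume_threads(void) {
    threads_on_hold = 0;
}

/* Stand-in worker that idles until told to stop. */
void *idle_worker(void *arg) {
    (void)arg;
    struct timespec nap = { 0, 1000 * 1000 };        /* 1 ms */
    while (keep_running)
        nanosleep(&nap, NULL);
    return NULL;
}

/* Pause and resume back to back, then join all workers. Returns the final
   flag value, or -1 if the handler could not be installed. */
int pause_resume_demo(void) {
    struct sigaction act;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    act.sa_handler = thread_hold;
    if (sigaction(SIGUSR1, &act, NULL) != 0)
        return -1;

    pthread_t threads[4];
    keep_running = 1;
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, idle_worker, NULL);

    pause_threads(threads, 4);
    resume_threads();

    keep_running = 0;
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return (int)threads_on_hold;
}
```

A handler that runs after resume_threads now sees the flag at 0 and returns immediately, so the back-to-back pause and resume from the report can no longer leave a worker stuck.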

As for the second problem, I tried it with the modified code and found that it does not always occur. Also, I don't understand the call stack: why does thpool_pause appear to call printf, which should only be called after thpool_pause returns?

(gdb) bt
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f53f59da4d6 in _IO_vfprintf_internal (s=0x7f53f5d6b760 <_IO_2_1_stdout_>, format=0x55e5f5aa6c47 "nthreads alive: %d\n", ap=0x7ffd38907450) at vfprintf.c:1325
#2 0x00007f53f6194000 in ?? ()
#3 0x0000000000000012 in ?? ()
#4 0x00007f53f5d7f2c4 in __pthread_kill (threadid=, signo=6983) at ../sysdeps/unix/sysv/linux/pthread_kill.c:53
#5 0x000055e5f5aa638d in thpool_pause (thpool_p=0x55e5f7726670) at thpool.c:230
#6 0x000055e5f5aa5f14 in main () at example.c:42

(gdb) p threads_on_hold
$1 = 1

After I removed all puts and printf calls, the deadlock no longer occurs. Could the problem be related to printf and puts?
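
A possible explanation, not verified against this trace: printf and puts are not async-signal-safe, and stdio takes a lock on stdout. If SIGUSR1 interrupts a worker in the middle of printf, that worker waits inside thread_hold while still holding the lock, and the printf in main then blocks on it. One way to avoid this class of problem is to pause without signals, by having workers check a flag between jobs. The sketch below assumes such a design. pause_gate and all gate_* names are hypothetical and not part of thpool.c.

```c
#include <pthread.h>

/* Hypothetical pause gate built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;
    int             paused;
} pause_gate;

void gate_init(pause_gate *g) {
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->resumed, NULL);
    g->paused = 0;
}

void gate_pause(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    g->paused = 1;
    pthread_mutex_unlock(&g->lock);
}

void gate_resume(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    g->paused = 0;
    pthread_cond_broadcast(&g->resumed);   /* wake every held worker */
    pthread_mutex_unlock(&g->lock);
}

/* Workers call this between jobs, in normal thread context. No job is
   interrupted midway, so no stdio lock is held while a worker waits. */
void gate_wait_if_paused(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    while (g->paused)
        pthread_cond_wait(&g->resumed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

typedef struct {
    pause_gate *gate;
    int jobs_done;
} gate_worker_arg;

/* Stand-in worker: 1000 trivial jobs with a gate check before each one. */
void *gate_worker(void *p) {
    gate_worker_arg *a = p;
    for (int i = 0; i < 1000; i++) {
        gate_wait_if_paused(a->gate);
        a->jobs_done++;
    }
    return NULL;
}

/* Pause and resume back to back, then return the number of finished jobs. */
int gate_demo(void) {
    pause_gate g;
    gate_init(&g);
    gate_worker_arg a = { &g, 0 };
    pthread_t t;
    pthread_create(&t, NULL, gate_worker, &a);
    gate_pause(&g);
    gate_resume(&g);
    pthread_join(t, NULL);
    pthread_cond_destroy(&g.resumed);
    pthread_mutex_destroy(&g.lock);
    return a.jobs_done;
}
```

The trade-off is that a pause only takes effect between jobs: a job that is already running finishes first.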