ocaml-pr6764, ocaml-pr6776

This code is a follow up of the issue described here http://caml.inria.fr/mantis/view.php?id=6764. The program segfaults with OCaml 4.02.1 and the latest version of trunk. This issue has been reported here http://caml.inria.fr/mantis/view.php?id=6776

Summary

Description

I am looking at embedding the OCaml runtime in a shared library. This shared library might be loaded and unloaded several time by an application that makes heavy use of threads.

The code hosted here (which was originally written by Mark Shinwell during the discussion of the aforementioned related issue) illustrates what happens when the shared object is loaded and unloaded several time in a row.

Debugging the issue with this example, it appear that the "tick" thread might still be alive after the dlclose function returns. The tick thread should have been cleaned up by the caml_thread_cleanup function, which is registered in the at_exit calls. However, the implementation of st_thread_kill in otherlibs/systhreads/st_posix.c is

static void st_thread_kill(st_thread_id thr)
{
  pthread_cancel(thr);
}

From http://man7.org/linux/man-pages/man3/pthread_cancel.3.html [^] the thread cancellation happens asynchronously, and "the return status of pthread_cancel() merely informs the caller whether the cancellation request was successfully queued."

A suitable fix would be to use pthread_join after the pthread_cancel. This would still not have the same behavior as the win32 implementation (which uses TerminateThread and thus cannot execute any user-mode code).

Another fix would be to replace the st_thread_kill in caml_thread_cleanup with something a bit more agressive about termination (a new st_thread_real_kill?)

Running the example

$ make
$ make run