garrigue/lablgtk

On MacOS, lablgtk2 runs OCaml code inside a blocking section, causing a crash in caml_memprof_handle_postponed_exn

Closed this issue · 1 comments

btj commented

Since recently, I am seeing my VeriFast application (using lablgtk2) crash sometimes on MacOS in caml_memprof_handle_postponed_exn. See verifast/verifast#252 .

The chain of events is as follows:

  • In ml_poll
    static gint ml_poll (GPollFD *ufds, guint nfsd, gint timeout)
    , lablgtk calls caml_enter_blocking_section and then calls the Gdk Quartz default poll_func https://gitlab.gnome.org/GNOME/gtk/-/blob/2.24.33/gdk/quartz/gdkeventloop-quartz.c#L709
  • It calls AppKit's nextEventMatchingMask with the NSDefaultRunLoopMode.
  • Apparently, in this mode, the run loop dispatches paint events. lablgtk (correctly, I think) does not expect this. The paint event is dispatched into OCaml code, which is not allowed inside a blocking section.

So, while I believe lablgtk is not "to blame" for this bug, any help or advice would of course be appreciated.

One (hacky) solution would be to add code to marshal to check if we are in a blocking section and, if so, temporarily exit it. Would that make sense?

I am curious as to why I am seeing these crashes only now. One recent change is that I recently switched from OCaml 4.6 to OCaml 4.13. Perhaps 4.13 is more sensitive to calling into OCaml while inside a blocking section?

btj commented

I just realized that this bug could cause not just crashes, but data races as well, in multithreaded programs. (VeriFast is not multithreaded but there may be multithreaded lablgtk2 apps out there.)