garrigue/lablgtk

lablgtk fails on multicore due to use of naked pointers

Here is a simplified version of lablgtk's gpointer.ml file, showing the problem:

module Gpointer = struct
  let raw_null = snd (Obj.magic Nativeint.zero)
end

let () =
  Gc.full_major ()

$ ocaml test.ml
fish: “ocaml test.ml” terminated by signal SIGSEGV (Address boundary error)

I think this is the cause of ocaml-multicore/ocaml-multicore#609. There is some more information about naked pointers at https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805.

This is a known problem in lablgtk, and I plan to address it.
The null pointer case is just an instance, and it is relatively easy to solve.
The main problem is with the generated translation tables, which are static C data, and where it is a bit difficult to add headers.
Of course, I would welcome a compact patch :-)
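For the null pointer case, one possible direction is to keep the address inside a boxed nativeint rather than exposing the raw word as an OCaml value. The sketch below only illustrates that idea: the module `Boxed_pointer` and its functions are hypothetical names, not lablgtk's API and not necessarily what the eventual patch does.

```ocaml
(* Hypothetical module: names are illustrative, not part of lablgtk. *)
module Boxed_pointer : sig
  type t
  val null : t
  val of_address : nativeint -> t
  val to_address : t -> nativeint
  val is_null : t -> bool
end = struct
  (* A nativeint is a boxed custom block with a proper header, so the GC
     never tries to follow the raw address stored inside it. *)
  type t = nativeint
  let null = Nativeint.zero
  let of_address a = a
  let to_address p = p
  let is_null p = Nativeint.equal p Nativeint.zero
end

let () =
  let p = Boxed_pointer.null in
  (* Unlike the raw_null example above, a major collection is harmless here. *)
  Gc.full_major ();
  assert (Boxed_pointer.is_null p);
  assert (not (Boxed_pointer.is_null (Boxed_pointer.of_address 0x1000n)))
```

The cost is one extra indirection per pointer, which is presumably acceptable for a binding layer.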

@kit-ty-kate raised this issue again, so I proposed we coordinate a possible effort here.

OK, I need to do something about that.
IIRC, most pointers are already properly wrapped, but the translation tables generated by varcc do not contain the required headers. Adding them is a bit painful, as the header size differs from the size of the other contents...
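To illustrate what "required headers" refers to: every ordinary OCaml block carries a header word recording its tag and size, and as far as I understand the runtime expects to find one on anything it treats as a block. The following is only an illustration using the standard `Obj` module on regular heap values; the helper `describe` is a made-up name and the code does not touch lablgtk's tables.

```ocaml
(* Illustration only: inspect the header information of ordinary values. *)
let describe name (v : Obj.t) =
  if Obj.is_block v then
    Printf.printf "%s: block, tag=%d, size=%d words\n"
      name (Obj.tag v) (Obj.size v)
  else
    Printf.printf "%s: immediate, no header\n" name

let () =
  describe "pair" (Obj.repr (1, 2));      (* regular block, tag 0 *)
  describe "int" (Obj.repr 42);           (* immediate value *)
  describe "nativeint" (Obj.repr 0n)      (* custom block *)
```

Static C data that is handed to OCaml as a value has no such header unless one is emitted explicitly in front of it.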

See #144 and #145 for fixes for lablgtk3 and lablgtk2 respectively.

We tried testing #145 (i.e. the lablgtk2 version) on multicore/5.00, but it doesn't seem to work, and we don't know why. If somebody could have a look at it, that would be nice.

I tried the lablgtk3 version on 4.12+domains. I used this code:

let () = print_endline @@ GMain.init ()

But it fails for me:

$ opam pin add lablgtk3 "git+https://github.com/garrigue/lablgtk.git#0ae631f3a0dd153c2d8e05e9ee3cc906c8503bb1"
$ ocamlfind ocamlopt -thread -package lablgtk3 -linkpkg -o test.exe test.ml
$ ./test.exe 
Fatal error: exception Failure("Obj.truncate not supported")

I also tried building it with dune, with the same result.
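For context, `Obj.truncate` shrinks a block in place, and the error above shows that this runtime rejects it. I have not checked where lablgtk calls it, so the following is just a generic sketch of the usual replacement: allocate a fresh, shorter value instead of truncating. The helper `shrink` is a hypothetical name.

```ocaml
(* Illustrative helper, not taken from lablgtk: return a fresh array holding
   the first [n] elements instead of shrinking [a] in place. *)
let shrink (a : 'a array) (n : int) : 'a array =
  if n < 0 || n > Array.length a then invalid_arg "shrink";
  Array.sub a 0 n

let () =
  let a = [| 1; 2; 3; 4 |] in
  let b = shrink a 2 in
  assert (b = [| 1; 2 |]);
  (* The original array is left untouched. *)
  assert (Array.length a = 4)
```

This copies the kept prefix, so it trades a little allocation for not relying on in-place block resizing.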

Can you try with the lablgtk2 version?
The call to Obj.truncate is removed there.
It is easier for us to test.

I have cherry-picked the changes to the lablgtk3 version in #144.
Please test, I would like to release.

Thanks - some of the examples now work for me, e.g. dune exec -- ./examples/entry.exe works. But others don't, e.g.:

$ dune exec -- ./examples/hello.exe
fish: “dune exec -- ./examples/hello.e…” terminated by signal SIGSEGV (Address boundary error)

(rr) bt
#0  caml_darken (v=0, ignored=0x0, state=0x0) at major_gc.c:761
#1  0x000055eb1018cc60 in caml_darken (state=state@entry=0x0, v=<optimized out>, ignored=ignored@entry=0x0) at major_gc.c:759
#2  0x000055eb1018fed0 in write_barrier (new_val=94468103054592, old_val=<optimized out>, field=0, obj=obj@entry=140443383598920)
    at memory.c:140
#3  caml_initialize (fp=fp@entry=0x7fbb85fd8f48, val=val@entry=94468103054592) at memory.c:212
#4  0x000055eb10174fe9 in Val_GObject_new (p=0x55eb11b9a500) at ml_gobject.c:62
#5  0x000055eb101ae26f in <signal handler called> ()
#6  0x000055eb100c654b in camlGobject__unsafe_create_362 () at src/gobject.ml:208

Thanks for the feedback.
Then I think I will merge the PR. Even if it doesn't work properly on multicore yet, it at least becomes possible to debug.