ninenines/esdl2

seg fault after bullet sample completes

hdiedrich opened this issue · 26 comments

erlc -o examples/bullet_engine examples/bullet_engine/*.erl
cd examples/bullet_engine && ./start.sh
Erlang/OTP 17 [erts-6.0] [source-07b8f44] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Eshell V6.0  (abort with ^G)
1> ./start.sh: line 2:  9736 Segmentation fault: 11  erl +stbt db -pa ../../ebin -eval "bullet_engine:run()."
make[1]: *** [bullet] Error 139
make: *** [all] Error 2

Yeah then it's definitely to do with GC order. I guess Linux doesn't care about it but OSX does.

I've been looking into this. On Linux I get the renderer destroyed before the window, so no problem. Can you add a printf in both dtor functions of c_src/sdl_renderer.c and c_src/sdl_window.c and tell me in what order they are called? If this is the actual issue then I have a fix in mind. Otherwise, heh, I'll need an OSX to test.

I believe this one should be fixed in the newest commit.

Unfortunately, the issue persists on OSX 10.9.

essen commented

I have an OSX around in a VM now, I will try when I can access it.

essen commented

Well I fail to even compile it. I will get back to you.

essen commented

I can reproduce. I will get you something during the day.

essen commented

My problem might be because SMP was not enabled (trying to add a second core to the VM, but OSX doesn't seem to like that...). Do you have SMP enabled?

essen commented

OK the SMP issue I could fix by adding -smp enable. And now I finally could observe the issue. I'll work on a fix when possible.

Yes, SMP was enabled when the issue occurred... Thanks for your support. I would like to ask you some questions about your thoughts around your project, but I'm not sure if this is the best place to discuss it...
On 23/7/2015 1:27 p.m., "Loïc Hoguin" notifications@github.com wrote:

OK the SMP issue I could fix by adding -smp enable. And now I finally
could observe the issue. I'll work on a fix when possible.



essen commented

Feel free to open a new ticket, tickets are fine for discussions. :-)

The issue persists with the esdl2 master branch on OSX... Segfault when the demo ends:
./start.sh: line 2: 11970 Segmentation fault: 11 erl -smp enable +stbt db -pa ../../ebin -eval "bullet_engine:run()."
make: *** [bullet_engine] Error 139

essen commented

Yep I confirm this both on Windows and OSX. Fine on Linux.

I'm just trying to get some insight into this issue... but setting up even a basic debugging environment for Erlang + NIFs on OSX/Windows is giving me more than a headache.
Any tips on how to debug this issue?

I made some progress on the OSX side; it seems a bit easier than Windows to set up a debug-enabled emulator, following the instructions in:
http://www.erlang.org/doc/installation_guide/INSTALL.html
The problem is that launching, for example, the "hello_sdl" demo just crashes when trying to render the texture at the beginning, generating the following stack:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_platform.dylib 0x00007fff93f2fd06 _platform_bzero$VARIANT$Merom + 22
1 libsystem_c.dylib 0x00007fff89f91a6b __memset_chk + 22
2 beam.debug.smp 0x0000000015e71fb3 debug_free + 147 (erl_alloc.c:4032)
3 beam.debug.smp 0x0000000015e66001 erts_free + 97 (erl_alloc.h:254)
4 beam.debug.smp 0x0000000016058a7c enif_free + 28 (erl_nif.c:245)
5 esdl2.so 0x0000000016627812 thread_render_copy + 82
6 esdl2.so 0x0000000016623723 nif_thread_handle + 99
7 esdl2.so 0x0000000016623816 nif_main_thread + 70
8 beam.debug.smp 0x000000001608d807 erts_sys_main_thread + 359 (sys.c:3311)
9 beam.debug.smp 0x0000000015e9e7fa erl_start + 14538 (erl_init.c:2166)
10 beam.debug.smp 0x0000000015e06402 main + 34 (erl_main.c:30)
11 libdyld.dylib 0x00007fffa02625ad start + 1

Launching it with the standard emulator, it crashes at the end, as always... but this seems to be associated with a missing texture?... This is only my assumption from looking at the stack, because I can't trace anything :(

Thread 15 Crashed:: 2_scheduler
0 libGL.dylib 0x00007fff9d8134bd glDeleteTextures + 18
1 libSDL2-2.0.0.dylib 0x0000000015d4298a GL_DestroyTexture + 54
2 libSDL2-2.0.0.dylib 0x0000000015d39578 SDL_DestroyTexture_REAL + 164
3 esdl2.so 0x00000000141aa55c dtor_Texture + 28
4 beam.smp 0x0000000013cdb392 nif_resource_dtor + 98

Your last attempt to solve the GC order between Renderer and Window, using enif_keep_resource, enif_release_resource and the dependency macros for resources you integrated, looks good... Could it be the same problem, but this time with the GC order between Renderer and Texture?... I'm only thinking out loud, shooting at anything that moves...
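
To make the idea of a resource dependency concrete, here is a minimal plain-C model of it. It does not use the erl_nif API: `keep()` and `release()` are hypothetical helpers that only mimic the roles enif_keep_resource and enif_release_resource play, and the `resource` struct is invented for illustration. The point it shows is that a child holding a reference on its parent forces the child's destructor to run first, whatever order the owner drops things in.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Plain-C model of a reference-counted resource. This is NOT the erl_nif
 * API; keep() and release() only imitate what enif_keep_resource and
 * enif_release_resource are used for in a NIF. */
typedef struct resource {
    const char *name;
    int refs;
    struct resource *parent;   /* resource this one depends on, or NULL */
} resource;

/* Records destruction order so it can be inspected afterwards. */
static const char *destroyed[8];
static int destroyed_count = 0;

static void keep(resource *r) { r->refs++; }

static void release(resource *r) {
    assert(r->refs > 0);
    if (--r->refs > 0) return;          /* still referenced, keep it alive */
    resource *parent = r->parent;
    if (destroyed_count < 8) destroyed[destroyed_count++] = r->name;
    fprintf(stderr, "dtor %s\n", r->name);
    free(r);
    if (parent) release(parent);        /* child is gone, drop its hold */
}

static resource *create(const char *name, resource *parent) {
    resource *r = malloc(sizeof *r);
    if (!r) abort();
    r->name = name;
    r->refs = 1;                        /* reference held by the creator */
    r->parent = parent;
    if (parent) keep(parent);           /* child keeps its parent alive */
    return r;
}
```

Creating window, renderer and texture as a chain and then releasing the window and renderer first destroys nothing until the texture is released; after that the model tears down texture, renderer and window in that order.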

essen commented

This would be my guess, yes. Especially considering the second stack trace you just gave.

I don't really know how to get proper debug info other than going through the docs, looks like you are already more equipped than me. Last time I believe it was half guess half printf debugging.

After some more tests, I'm more confused than before... I'm no longer sure that it's related to a GC order issue; it is probably more of a timing/synchronization issue.
I was able to "debug" it on OSX using fprintf(stderr,XXX)... but the GC order seems to be OK, because dtor_Renderer isn't called before dtor_Texture and/or the crash. We also have to add that with a debug beam emulator the crash occurs at the start of the demo, inside the rendering loop...
On Windows, I can't manage printf output redirection properly, but I was able to compile and debug the DLL inside Visual Studio 2015 by attaching the debugger to the running process... and to make the issue more complex, here the debugger hangs forever when dtor_Window is invoked, so I'm not able to reproduce the segmentation fault in debug. The funny thing is that the GC order and the dtor_## functions seem to work properly on Windows, invoking dtor_surface, dtor_texture and dtor_renderer without issues when closing.
Obviously my limited C skills don't help with this.
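
For reference, the fprintf(stderr, ...) tracing described above can be sketched roughly as follows. The `TRACE_DTOR` macro and the `dtor_calls` counter are hypothetical names, and the destructor bodies are simplified stand-ins (the real ones take an `ErlNifEnv*` as well and call the SDL destroy functions). stderr is used because it is unbuffered by default, so the last line should still appear if the process segfaults right after it.

```c
#include <stdio.h>

/* Number of destructor calls seen so far; gives each log line an order. */
static int dtor_calls = 0;

/* Log the call order, the destructor name and the object address.
 * stderr is unbuffered by default, so the line is written immediately. */
#define TRACE_DTOR(ptr) \
    fprintf(stderr, "[%d] %s(%p)\n", ++dtor_calls, __func__, (void *)(ptr))

/* Hypothetical stand-ins: the real destructors also take an ErlNifEnv*
 * and call the SDL_Destroy* functions, which are omitted here. */
static void dtor_Texture(void *obj)  { TRACE_DTOR(obj); }
static void dtor_Renderer(void *obj) { TRACE_DTOR(obj); }
static void dtor_Window(void *obj)   { TRACE_DTOR(obj); }
```

Placing the trace line before the destroy call rather than after it means the log also shows which destructor was entered last when the crash happens inside it.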

essen commented

A few tests you can do:

Try creating a window and do nothing else (no loop, no call, just the process exiting).

Do the same with different window options.

Do the same with a renderer.

Do the same with texture.

Add a loop.

Etc.

On Windows, just creating a simple window makes it hang indefinitely when trying to invoke dtor_Window on exit... It doesn't matter whether the DLL is cross-compiled with Msys2 or generated with VS2015.

essen commented

Do you have a stacktrace of the crash on Windows? Is it any different from OSX?

Unfortunately I'm not able to reproduce the segfault anymore... In the beginning, the usual behaviour was that launching it (the hello_sdl demo) after a clean compilation produced this same freeze when trying to close the window; subsequent runs produced the segfault.

Excuse me, my fault, I know what is happening with the freeze... I replaced SDL2.dll with the dev one in order to debug the issue; this has been causing the freeze. With the release version of SDL2.dll it produces the segfault again on Windows.

I was too quick... the freeze persists when exiting... It seems that the segfault, when it happens, could be in dtor_texture... but now, most of the time this doesn't happen and it only freezes calling dtor_window... Sorry, no stacktrace; the Erlang VM and the window just sit there forever until you kill the window. As pointed out, the behaviour is the same if you create only a window and poll SDL events in a loop.

OK, I have one last theory about what could be happening here... The SDL_Destroy## functions are not thread-safe, and SDL multi-threading support varies between OS implementations, so as a rule of thumb: SDL is not thread-safe, full stop. It seems they are executed outside the main thread in the esdl2 NIF, which would explain why different dtor_## functions show different behaviors on different OSes (on OSX dtor_Texture segfaults, while on Windows dtor_window freezes). It would also explain the random behavior on Windows, alternating freezes with segfaults.
How does that sound?... I'll need to learn a bit about C macros and NIFs before being able to test it. I'd better sleep a bit.
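
One way to picture a fix for this theory is to have destructors hand the destroy call over to the main thread instead of running it themselves. The following is a generic sketch under that assumption, not esdl2's actual mechanism: `main_thread_init`, `enqueue_destroy`, `destroy_on_main`, `drain_on_main` and `fake_destroy` are all hypothetical names, and `fake_destroy` merely stands in for something like SDL_DestroyTexture.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64

typedef void (*destroy_fn)(void *);
typedef struct { destroy_fn fn; void *obj; } job;

static job queue[QUEUE_CAP];
static size_t queue_len = 0;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t main_thread;

/* Call once from the thread that owns the SDL objects. */
static void main_thread_init(void) { main_thread = pthread_self(); }

/* Defer a destroy call; safe from any thread.
 * Returns 0 if queued, -1 if the queue is full. */
static int enqueue_destroy(destroy_fn fn, void *obj) {
    int rc = -1;
    pthread_mutex_lock(&queue_lock);
    if (queue_len < QUEUE_CAP) {
        queue[queue_len].fn = fn;
        queue[queue_len].obj = obj;
        queue_len++;
        rc = 0;
    }
    pthread_mutex_unlock(&queue_lock);
    return rc;
}

/* Run fn right away when already on the main thread (returns 1),
 * otherwise defer it (returns 0, or -1 if the queue is full). */
static int destroy_on_main(destroy_fn fn, void *obj) {
    if (pthread_equal(pthread_self(), main_thread)) {
        fn(obj);
        return 1;
    }
    return enqueue_destroy(fn, obj);
}

/* Called by the main thread from its loop; returns the number of jobs run.
 * Jobs are copied out first so fn never runs while the lock is held. */
static size_t drain_on_main(void) {
    job local[QUEUE_CAP];
    size_t n, i;
    pthread_mutex_lock(&queue_lock);
    n = queue_len;
    for (i = 0; i < n; i++) local[i] = queue[i];
    queue_len = 0;
    pthread_mutex_unlock(&queue_lock);
    for (i = 0; i < n; i++) local[i].fn(local[i].obj);
    return n;
}

/* Stand-in for an SDL destroy function: marks the object as destroyed. */
static void fake_destroy(void *obj) { *(int *)obj = 1; }
```

In a real NIF destructor the queued pointer would presumably need to be the SDL handle itself rather than the resource, since the resource memory is not expected to outlive the destructor call.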

essen commented

Makes sense.

essen commented

That's why we have a thread in the first place.

I see the same crash, but also it will crash if I click away and the window loses focus.