JakobOvrum/LuaD

unittests crash on exit with dmd 2.067 and dmd 2.068

John-Colvin opened this issue · 7 comments

From 2.067 onwards, structs on the GC heap (with the exception of those in AAs until 2.068.0) will have their destructors called before being freed/collected. I suspect that this change is the cause of these segfaults.

Here's a backtrace:

(gdb) bt full
#0  0x00000000005505f9 in luaH_getnum ()
No symbol table info available.
#1  0x000000000054759d in lua_rawgeti ()
No symbol table info available.
#2  0x0000000000554e72 in luaL_unref ()
No symbol table info available.
#3  0x00000000004d40a0 in luad.base.LuaObject.~this() (this=...) at luad/base.d:92
No locals.
#4  0x0000000000582113 in rt.lifetime.finalize_array2(void*, ulong) ()
No symbol table info available.
#5  0x000000000057b51d in rt_finalizeFromGC ()
No symbol table info available.
#6  0x0000000000577d9b in gc.gc.Gcx.sweep() ()
No symbol table info available.
#7  0x00000000005784a2 in gc.gc.Gcx.fullcollect(bool) ()
No symbol table info available.
#8  0x0000000000581aa1 in gc_term ()
No symbol table info available.
#9  0x000000000057ac76 in rt_term ()
No symbol table info available.
#10 0x0000000000565e25 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function*).runAll() ()
No symbol table info available.
#11 0x0000000000565db6 in rt.dmain2._d_run_main(int, char**, extern(C) int(char[][]) function*).tryExec(scope void() delegate) ()
No symbol table info available.
#12 0x0000000000565d36 in _d_run_main ()
No symbol table info available.
#13 0x00000000004d3ed0 in main ()
No symbol table info available.
#14 0x000076d04c03c610 in __libc_start_main () from /usr/lib/libc.so.6
No symbol table info available.
#15 0x00000000004d3329 in _start ()
No symbol table info available.

A cursory look at the backtrace suggests that there's a GC-allocated array containing LuaObject instances somewhere (the other high-level types are composed with LuaObject). The array gets finalized after a GC-allocated (probably straight-up new'd) LuaState instance, presumably leaving LuaObject's destructor to call luaL_unref on a lua_State that has already been closed.

I've thought about this issue before but it wasn't critical until recently. Maybe a solution would be to change LuaObject to use a custom reference counting mechanism that supports the necessary logic to ensure the underlying lua_State* is only closed when both the LuaState owner and all LuaObject references are gone.
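To make the reference counting idea concrete, here is a minimal sketch in plain C rather than D, so it builds without LuaD or Lua. It is only a model under stated assumptions: MockLuaState stands in for lua_State, mock_close stands in for lua_close, and SharedState, ObjRef and every function name are hypothetical, not LuaD's API. The owner holds one count and each object reference holds one more, so the state is closed only when the last of them is released.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for lua_State so the sketch builds without Lua (assumption). */
typedef struct MockLuaState { int unused; } MockLuaState;

static int g_close_calls = 0;  /* demo instrumentation only */

/* Stand-in for lua_close(). */
static void mock_close(MockLuaState *L) {
    g_close_calls++;
    free(L);
}

/* Hypothetical control block shared by the owner and every object reference. */
typedef struct SharedState {
    MockLuaState *L;
    int refs;
} SharedState;

/* Creates the state; the returned handle is the owner's single reference. */
SharedState *shared_state_new(void) {
    SharedState *s = malloc(sizeof *s);
    MockLuaState *L = malloc(sizeof *L);
    if (!s || !L) abort();
    s->L = L;
    s->refs = 1;
    return s;
}

/* Drops one reference; the state is closed only when the count reaches zero. */
void shared_state_release(SharedState *s) {
    assert(s->refs > 0);
    if (--s->refs == 0) {
        mock_close(s->L);
        free(s);
    }
}

/* Models LuaObject: a registry reference plus a counted link to the state. */
typedef struct ObjRef {
    SharedState *state;
    int ref;
} ObjRef;

/* Takes one additional count on the state for the new object reference. */
ObjRef obj_ref_new(SharedState *s, int ref) {
    ObjRef o = { s, ref };
    s->refs++;
    return o;
}

/* Models the LuaObject destructor; safe to call more than once. */
void obj_ref_destroy(ObjRef *o) {
    if (!o->state) return;
    /* The state is still open here, so a real version could call
       luaL_unref(o->state->L, LUA_REGISTRYINDEX, o->ref) at this point. */
    shared_state_release(o->state);
    o->state = NULL;
}

/* Number of times the mock state has been closed so far. */
int close_calls(void) { return g_close_calls; }
```

With this shape, releasing the owner first (the order seen in the backtrace) no longer closes the state underneath the remaining references; the close happens when the last ObjRef is destroyed. The count here is not thread-safe, and whether it can be updated safely from GC finalizers in D is a separate question this sketch does not address.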

I assume a LuaObject is guaranteed to be associated with only one LuaState, yes?

Just off the top of my head: if LuaState can know about all its LuaObjects then it can scan through them all in its destructor and manually call their destructors where necessary (being careful not to use the GC inside a destructor that is itself called by the GC). It will be relatively slow, but it might not matter that much seeing as LuaState destruction should be pretty rare and outside any hot paths.
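A rough model of that scan, again in plain C so it stands alone. All names (TrackedState, TrackedObject and the functions) are hypothetical. The sketch uses an intrusive linked list so that closing the state allocates nothing, and it assumes the list nodes are still valid when the state is closed, which in D would mean keeping that bookkeeping out of GC-managed memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct TrackedObject TrackedObject;

/* Hypothetical state that tracks its live objects in an intrusive list. */
typedef struct TrackedState {
    TrackedObject *head;
    int open;
    int unref_calls;   /* demo instrumentation only */
} TrackedState;

struct TrackedObject {
    TrackedState *state;        /* NULL once released */
    TrackedObject *prev, *next;
    int ref;
};

void state_init(TrackedState *s) {
    s->head = NULL;
    s->open = 1;
    s->unref_calls = 0;
}

/* Registers an object with its state; no allocation involved. */
void object_attach(TrackedState *s, TrackedObject *o, int ref) {
    assert(s->open);
    o->state = s;
    o->ref = ref;
    o->prev = NULL;
    o->next = s->head;
    if (s->head) s->head->prev = o;
    s->head = o;
}

/* Models the object's destructor: release the reference, then unlink.
   A second call is a no-op, so a later finalizer run is harmless. */
void object_release(TrackedObject *o) {
    TrackedState *s = o->state;
    if (!s) return;
    s->unref_calls++;  /* a real version would call luaL_unref here */
    if (o->prev) o->prev->next = o->next; else s->head = o->next;
    if (o->next) o->next->prev = o->prev;
    o->state = NULL;
    o->prev = o->next = NULL;
}

/* Models the state's destructor: release every tracked object first,
   and only then close the state. */
void state_close(TrackedState *s) {
    while (s->head) object_release(s->head);
    s->open = 0;  /* a real version would call lua_close here */
}
```

The cost is one pass over the live objects at close time, which matches the "relatively slow but rare" trade-off described above.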

This may be the same issue, or it might not be.

But everything works for me except writing a D module for use in Lua. That segfaults. I tried it on two different machines, both running 64-bit Arch Linux.

There's also another question about scope for building a related project to work with LuaJIT, which is used by Facebook's iTorch and can be used interactively from the Jupyter/IPython notebook. I am not sure, but it seems to me the sensible approach with LuaJIT is to make use of its FFI interface: you can just specify C headers as a string. It can deal with C structs and C 'arrays' natively and allows you to create metatables for C functions and data structures.

So starting from base zero, what I am inclined to do there is use Adam Ruppe's dtoh to generate the C headers. But one would still need to wrap D methods and arrays (turning a D array into a ptr/length pair and adding some sugar) on both the D and Lua side.
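For illustration, the C-side view of such a ptr/length pair might look like the sketch below. The struct and function names are hypothetical, and the field order is an assumption made for the example; these declarations are the kind of thing that could be handed to LuaJIT's ffi.cdef as a string, with the D side filling the struct from an array's .ptr and .length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C view of a D double[]: an explicit length/pointer pair. */
typedef struct DoubleSlice {
    size_t length;
    double *ptr;
} DoubleSlice;

/* Example of a function the D side could export with C linkage. */
double slice_sum(DoubleSlice s) {
    assert(s.ptr != NULL || s.length == 0);
    double total = 0.0;
    for (size_t i = 0; i < s.length; i++)
        total += s.ptr[i];
    return total;
}

/* Bounds-checked element access: returns 0 on success, -1 if out of range. */
int slice_get(DoubleSlice s, size_t i, double *out) {
    if (i >= s.length) return -1;
    *out = s.ptr[i];
    return 0;
}
```

The bounds-checked accessor is the sort of sugar that would have to be added on one side or the other, since a bare pointer carries no length.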

I guess you don't have time or interest to work on this together, but I thought I would mention my idea here out of courtesy. I may post something on the forum when I have time.

Laeeth.

But everything works for me except writing a D module for use in Lua. That segfaults. I tried it on two different machines, both running 64-bit Arch Linux.

Assuming you are building a shared library, are you initializing the D runtime? A segfault report isn't very useful without at least a stack trace.

There's also another question about scope for building a related project to work with LuaJIT, which is used by Facebook's iTorch and can be used interactively from the Jupyter/IPython notebook. I am not sure, but it seems to me the sensible approach with LuaJIT is to make use of its FFI interface: you can just specify C headers as a string. It can deal with C structs and C 'arrays' natively and allows you to create metatables for C functions and data structures.

You can already use LuaJIT simply by linking with it. Completing the higher level interface built on the standard Lua C API is a priority over optimizing for LuaJIT's non-standard interfaces.

Apropos this issue, I have been working on a patch which turned out to be fairly involved but it shouldn't be a problem in theory. I just need to plow through various bugs.

Hi - yes, I have tried initializing the D runtime, both via Runtime.initialize and the C version. I'll try to get a stack trace for you in the coming days.

The FFI interface for LuaJIT may be different from the main Lua interface, but it's hard to see what can be less standard about the interface itself. (One obviously needs to wrap it on the Lua side with C metatable entries, and maybe that's what you mean.) I wasn't suggesting you do that, but I might find it necessary to do for my work, and just wanted to let you know in case it was helpful in some way. It will take some time before I can start on it though, and someone else might be helping me.

So I am using iTorch, which has LuaJIT built in to Torch. Do I need to change that build at all, or will it be enough just to change the compilation for LuaD to link with libluajit (or whatever it's called) rather than liblua?

Thanks.

Laeeth.

So I am using iTorch, which has LuaJIT built in to Torch. Do I need to change that build at all, or will it be enough just to change the compilation for LuaD to link with libluajit (or whatever it's called) rather than liblua?

Yes, that will be enough. LuaJIT is designed to work as a drop-in replacement.

Note that the Lua version supported by LuaD is 5.1.x.

stack trace:


JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'module'

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff747aac4 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
(gdb) backtrace
#0 0x00007ffff747aac4 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#1 0x00007ffff64a5aba in core.sync.mutex.Mutex.lock_nothrow() () from /usr/lib/libphobos2.so.0.68
#2 0x00007ffff64a8dcd in gc.gc.GCMutex.lock() () from /usr/lib/libphobos2.so.0.68
#3 0x00007ffff64a95ab in gc.gc.GC.malloc(ulong, uint, ulong*, const(TypeInfo)) () from /usr/lib/libphobos2.so.0.68
#4 0x00007ffff64afc21 in gc_malloc () from /usr/lib/libphobos2.so.0.68
#5 0x00007ffff64c1a13 in d_newclass () from /usr/lib/libphobos2.so.0.68
#6 0x00007ffff6a4b6dc in luad.lmodule.openDModule!(luad.table.LuaTable(luad.state.LuaState) function*).openDModule(luad.c.lua.lua_State*, luad.table.LuaTable(luad.state.LuaState) function*) (L=0x40000378, initFunc=0x7ffff6a49b50 <dmodule.initModule(luad.state.LuaState)>) at ../../luad/lmodule.d:14
#7 0x00007ffff6a4a42c in luaopen_dmodule (L=0x40000378) at dmodule.d-mixin-34:34
#8 0x000000000047512a in lj_BC_FUNCC ()
#9 0x0000000000462c8a in lj_cf_package_require ()
#10 0x000000000047512a in lj_BC_FUNCC ()
#11 0x0000000000462c8a in lj_cf_package_require ()
#12 0x000000000047512a in lj_BC_FUNCC ()
#13 0x000000000046377d in lua_pcall ()
#14 0x000000000040585c in dotty ()
#15 0x0000000000406440 in pmain ()
#16 0x000000000047512a in lj_BC_FUNCC ()
#17 0x00000000004637f7 in lua_cpcall ()
#18 0x0000000000404434 in main ()