hughperkins/distro-cl

internal error in __sub: no metatable

limadm opened this issue · 0 comments

Hello!

I was trying to run pix2pix with torch-cl, but I found a subtle bug when using nngraph.
nngraph helps build complex neural network graphs. It overrides the nn.Module.__unm metamethod to convert nn operations into graph nodes, and uses the nn.Module.__sub/graph.Node.__sub metamethods as syntactic sugar to link the graph nodes. For example, we can define this dummy two-layer graph:

input1, input2 = -nn.Identity(), -nn.Identity()    -- creates identity input nodes
output = {input1, input2} - nn.JoinTable(2)        -- joins the two inputs in a single tensor

So we end up with this:

>-- input1 --\
              >-- output -->
>-- input2 --/
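
For readers unfamiliar with this operator sugar, the mechanism can be sketched in plain Lua, without nn or nngraph. The Node table and the newNode helper below are hypothetical stand-ins, not nngraph's actual implementation. The sketch only shows that stock Lua falls back to the right operand's metatable when the left operand is a bare table:

```lua
-- Hypothetical stand-in for a graph node; NOT nngraph's real class.
local Node = {}
Node.__index = Node

local function newNode(name, parents)
  return setmetatable({ name = name, parents = parents or {} }, Node)
end

-- Unary minus: turn the operand into an input node with no parents.
Node.__unm = function(self)
  return newNode(self.name, {})
end

-- Binary minus: link the left operand to the right-hand node.
-- The left operand may be a single node or a bare table of nodes.
Node.__sub = function(prev, nxt)
  local parents = getmetatable(prev) == Node and { prev } or prev
  return newNode(nxt.name, parents)
end

local input1, input2 = -newNode("input1"), -newNode("input2")

-- The left operand has no metatable, so Lua uses Node.__sub
-- from the right operand instead.
local output = { input1, input2 } - newNode("output")

print(output.name, #output.parents)
```

Here the subtraction succeeds because Lua looks for a __sub handler on the first operand and, finding none, tries the second.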

It works fine with torch, but torch-cl gives:

~/torch-cl/install/bin/luajit: ./models.lua:69: internal error in __sub: no metatable
stack traceback:
[C]: in function '__sub'
./models.lua:69: in function 'defineG_unet'
train.lua:110: in function 'defineG'
train.lua:146: in main chunk
[C]: in function 'dofile'
...i/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010e607ce0

I found this error message in pkg/torch/lib/luaT/luaT.c, where the metamethods are defined with MT_DECLARE_OPERATOR.
torch uses two macros, one for unary metamethods (MT_DECLARE_OPERATOR) and one for binary metamethods (MT_DECLARE_BIN_OPERATOR), so a binary metamethod checks the metatables of both operands.
In the above example, even though {input1, input2} has no metatable, MT_DECLARE_BIN_OPERATOR finds one in nn.JoinTable(2) and makes the call.

torch-cl's luaT.c uses only the unary version, so {x, x} - nn.Module() checks only the metatable of the first operand (the bare table {x, x}) and fails with "internal error in __sub: no metatable".
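
The difference between the two lookups can be sketched in plain Lua. The functions unaryStyleSub and binaryStyleSub are hypothetical emulations of the behaviour described above, not the macros from luaT.c. The Module table and its __sub__ handler key are likewise assumptions made for illustration:

```lua
-- Hypothetical class whose metatable carries the actual handler.
local Module = {}
Module.__sub__ = function(prev, nxt)
  return "linked"
end

-- Unary-style lookup: only the first operand's metatable is consulted.
local function unaryStyleSub(a, b)
  local mt = getmetatable(a)
  if not mt then
    error("internal error in __sub: no metatable", 0)
  end
  return mt.__sub__(a, b)
end

-- Binary-style lookup: fall back to the second operand's metatable
-- when the first operand has none.
local function binaryStyleSub(a, b)
  local mt = getmetatable(a) or getmetatable(b)
  if not mt then
    error("internal error in __sub: no metatable", 0)
  end
  return mt.__sub__(a, b)
end

local module = setmetatable({}, Module)
local inputs = { {}, {} }  -- bare table, no metatable

local okUnary, errUnary = pcall(unaryStyleSub, inputs, module)
local okBinary, resBinary = pcall(binaryStyleSub, inputs, module)

print(okUnary, errUnary)    -- fails: the bare table has no metatable
print(okBinary, resBinary)  -- succeeds: handler found via the module
```

With a bare table on the left, the unary-style lookup raises the same message as in the traceback, while the binary-style lookup finds the handler through the second operand.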

I think this could be solved by copying the torch approach: use separate macros for unary and binary metamethods. I can make this change if needed.

Thanks!