torch/trepl

THArgCheck doesn't print error message from interpreter

nicholas-leonard opened this issue · 12 comments

To reproduce bug:

$> th
a = torch.randn(3,4,1)
b = torch.randn(2,4,1)
i = torch.randperm(1):long()
a:indexCopy(1, i, b)

Run the same thing in a Lua script and you get a nice error message.

Makes it impossible to debug using the th shell.

that looks quite bad!

it prints ok = false, but it never enters the traceback handler (only in trepl; in plain luajit it does go inside traceback):

require 'torch'  -- needed when running this as a standalone luajit script

function traceback(message)
   print('inside traceback')
   return message
end

a = torch.randn(3,4,1)
b = torch.randn(2,4,1)
i = torch.randperm(1):long()

-- deliberately invalid: i holds 1 index but b has 2 slices along dim 1,
-- so THArgCheck raises a C-level error; wrap it the way trepl does
line = 'a:indexCopy(1, i, b)'
func, perr = loadstring('local f = function() return '..line..' end local res = {f()} print(unpack(res))')

ok, err = xpcall(func, traceback)
print(ok)   -- false
print(err)  -- but 'inside traceback' is never printed

I looked at debugging this further, but I don't get why it doesn't go into the traceback; it clearly returns ok=false.
I hope it isn't some scary bug w.r.t. coroutine resuming and xpcall, combined with going into C frames.

@clementfarabet any clues/thoughts?

On the first bug, it's just a question of visual feedback: the last line doesn't have a "\n", so it doesn't get executed. If you hit return after, it'll execute it and you'll get the error. The same is true in lua / luajit, except that they prepend a ">" before each line, so you clearly see that the last line didn't get executed.

On the second one, are you using readline? (I'm not, and I don't see the problem, so it has to be a side-effect of the readline wrapper.)

The line parser is quite complex and clearly not mature. There's quite a bit of work required to get it to a stable place.
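
For reference, the way plain lua/luajit decides whether an entered chunk is complete is the '<eof>' trick: keep buffering lines while compilation fails with an error near '<eof>'. A minimal sketch of that loop (an illustration of the standard lua.c approach, not trepl's actual parser):

local function incomplete(err)
   -- an unfinished chunk fails to compile with an error near '<eof>'
   return err ~= nil and err:find("'<eof>'", 1, true) ~= nil
end

local buffer = ''
io.write('> ')
for line in io.lines() do
   buffer = buffer .. line .. '\n'
   local f, err = loadstring(buffer, '=stdin')
   if f then
      print(xpcall(f, debug.traceback))
      buffer = ''
   elseif not incomplete(err) then
      print(err)    -- real syntax error: report it and reset
      buffer = ''
   end
   -- otherwise the chunk is incomplete: keep buffering
   io.write(buffer == '' and '> ' or '>> ')
end

With per-line feedback like this ('> ' vs '>> '), it's obvious when the last line is still sitting in the buffer.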

On the first bug, it's just a question of visual feedback: the last line doesn't have a "\n", so it doesn't get executed. If you hit return after, it'll execute it and you'll get the error.

No, you don't get the error on enter. It prints out whatever is on the stack (without ever going into the traceback), which is the bug:

th> a:indexCopy(1, i, b) [enter]
(1,.,.) =
 -1.2264
  1.1366
 -1.0009
 -0.4218

(2,.,.) =
 -0.4207
 -0.1404
  1.2361
  0.5570
[torch.DoubleTensor of dimension 2x4x1]

Yes, I'm using readline. Hmm, maybe it isn't the same with linenoise; is that what you're using?

Ok so both issues are in the readline wrapper then.

I'm not using this code anymore, I've dropped luajit so I don't have access to these ffi bindings anymore.

OK, a little more digging around:
(1) It applies to all error messages raised at the C level. The traceback is never entered for xpcall in this case.
(2) It happens only across coroutine resumes. For comparison, this behavior does not happen with fbtrepl, which is almost exactly trepl but does not use coroutines (it uses editline, with the same ffi bindings).
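
The boundary rule in (2) can be shown without any of the trepl machinery. A minimal pure-Lua sketch (error('boom') standing in for the C-level THArgCheck error): an error raised inside a coroutine is intercepted by coroutine.resume itself, so a message handler installed by an xpcall outside the coroutine never runs:

local function handler(msg)
   print('inside handler')       -- never printed
   return msg
end

local co = coroutine.create(function()
   error('boom')                 -- stands in for a C-level luaL_error
end)

local ok, err = xpcall(function()
   -- resume acts as its own protected call: it swallows the error and
   -- returns false + message instead of propagating it up to xpcall
   print(coroutine.resume(co))   -- false, '...: boom'
end, handler)
print(ok, err)                   -- true, nil

Whether this is exactly what trepl hits is a different question, but it shows why message handlers and coroutine boundaries don't mix cleanly.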

I'll try to get rid of coroutines altogether and revamp the readline bindings. This is indeed a pretty serious bug if you can't tell whether an error occurred or not.
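
For what it's worth, dropping coroutines doesn't need much: a plain blocking readline loop over the ffi works. A minimal sketch (assuming libreadline is installed; readline/add_history are the standard GNU readline C entry points, and the eval step is simplified compared to trepl's):

local ffi = require 'ffi'
ffi.cdef[[
char *readline(const char *prompt);
void add_history(const char *line);
void free(void *ptr);
]]
local RL = ffi.load('readline')

while true do
   local cstr = RL.readline('th> ')
   if cstr == nil then break end    -- NULL on EOF (ctrl+d)
   local line = ffi.string(cstr)
   ffi.C.free(cstr)
   if #line > 0 then
      RL.add_history(line)
      -- try the expression form first, then fall back to a statement
      local f = loadstring('return ' .. line) or loadstring(line)
      if f then
         print(xpcall(f, debug.traceback))   -- no coroutine in the way
      else
         print('syntax error in: ' .. line)
      end
   end
end

With xpcall running on the same stack as the evaluated chunk, the traceback handler fires as expected (which matches fbtrepl not showing the bug).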

Not surprised to see the culprit is coroutines. I built an iterator using coroutines only to find it hell to debug: it returned bogus error messages (tensors and scalars). That it has to do with the C error messages makes sense.
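
A toy version of that pain, for anyone curious (not the actual iterator): with a coroutine.wrap-based iterator, an error raised inside the coroutine pops out at the for loop, and by then the coroutine's own stack is gone, so any traceback you take shows the caller rather than the real crash site:

local function iter(n)
   return coroutine.wrap(function()
      for i = 1, n do
         if i == 3 then error('boom at i=3') end
         coroutine.yield(i)
      end
   end)
end

for i in iter(5) do print(i) end
-- prints 1, 2, then errors at the for-loop line above; a traceback
-- taken here shows the caller's stack, not the coroutine's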

Should all be fixed in this new PR.

fixed via #12

thanks guys. Works great.