torch/trepl

THArgCheck doesn't print error message from interpreter

nicholas-leonard opened this issue · 12 comments

To reproduce bug:

$> th
a = torch.randn(3,4,1)
b = torch.randn(2,4,1)
i = torch.randperm(1):long()
a:indexCopy(1, i, b)

Run the same thing in a Lua script and you get a nice error message.

Makes it impossible to debug using the th shell.

that looks quite bad!

it prints ok = false, but it never enters the traceback handler (only in trepl; in plain luajit it does go inside traceback):

require 'torch'  -- needed when running this as a standalone luajit script

function traceback(message)
   print('inside traceback')
   return message
end

a = torch.randn(3,4,1)
b = torch.randn(2,4,1)
i = torch.randperm(1):long()

-- deliberately invalid: i holds 1 index but b has 2 slices along dim 1,
-- so THArgCheck raises a C-level error; wrap it the way trepl does
line = 'a:indexCopy(1, i, b)'
func, perr = loadstring('local f = function() return '..line..' end local res = {f()} print(unpack(res))')

ok, err = xpcall(func, traceback)
print(ok)   -- false
print(err)  -- but 'inside traceback' is never printed

I looked at debugging this further, but I don't get why it doesn't go into the traceback; it clearly returns ok=false.
I hope it isn't some scary bug w.r.t. coroutine resuming and xpcall, combined with going into C frames.

@clementfarabet any clues/thoughts?

On the first bug, it's just a question of visual feedback: the last line doesn't have a "\n", so it doesn't get executed. If you hit return after, it'll execute it and you'll get the error. The same is true in lua / luajit, except that they prepend a ">" before each line, so you clearly see that the last line didn't get executed.

On the second one, are you using readline? (I'm not, and I don't see the problem, so it has to be a side-effect of the readline wrapper.)

The line parser is quite complex and clearly not mature. There's quite a bit of work required to get it to a stable place.
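
For reference, the way plain lua/luajit decides whether an entered chunk is complete is the '<eof>' trick: keep buffering lines while compilation fails with an error near '<eof>'. A minimal sketch of that loop (an illustration of the standard lua.c approach, not trepl's actual parser):

local function incomplete(err)
   -- an unfinished chunk fails to compile with an error near '<eof>'
   return err ~= nil and err:find("'<eof>'", 1, true) ~= nil
end

local buffer = ''
io.write('> ')
for line in io.lines() do
   buffer = buffer .. line .. '\n'
   local f, err = loadstring(buffer, '=stdin')
   if f then
      print(xpcall(f, debug.traceback))
      buffer = ''
   elseif not incomplete(err) then
      print(err)    -- real syntax error: report it and reset
      buffer = ''
   end
   -- otherwise the chunk is incomplete: keep buffering
   io.write(buffer == '' and '> ' or '>> ')
end

With per-line feedback like this ('> ' vs '>> '), it's obvious when the last line is still sitting in the buffer.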

On the first bug, it's just a question of visual feedback: the last line doesn't have a "\n", so it doesn't get executed. If you hit return after, it'll execute it and you'll get the error.

No, you don't get the error on enter. It prints out whatever is on the stack (without ever going into the traceback), which is the bug:

th> a:indexCopy(1, i, b) [enter]
(1,.,.) =
 -1.2264
  1.1366
 -1.0009
 -0.4218

(2,.,.) =
 -0.4207
 -0.1404
  1.2361
  0.5570
[torch.DoubleTensor of dimension 2x4x1]

Yes, I'm using readline. Hmm, maybe it isn't the same with linenoise; is that what you're using?

Ok so both issues are in the readline wrapper then.

I'm not using this code anymore, I've dropped luajit so I don't have access to these ffi bindings anymore.

OK, a little more digging around:
(1) It applies to all error messages raised at the C level. The traceback is never entered for xpcall in this case.
(2) It happens only across coroutine resumes. For comparison, this behavior does not happen with fbtrepl, which is almost exactly trepl but does not use coroutines (it uses editline, with the same ffi bindings).
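
The boundary rule in (2) can be shown without any of the trepl machinery. A minimal pure-Lua sketch (error('boom') standing in for the C-level THArgCheck error): an error raised inside a coroutine is intercepted by coroutine.resume itself, so a message handler installed by an xpcall outside the coroutine never runs:

local function handler(msg)
   print('inside handler')       -- never printed
   return msg
end

local co = coroutine.create(function()
   error('boom')                 -- stands in for a C-level luaL_error
end)

local ok, err = xpcall(function()
   -- resume acts as its own protected call: it swallows the error and
   -- returns false + message instead of propagating it up to xpcall
   print(coroutine.resume(co))   -- false, '...: boom'
end, handler)
print(ok, err)                   -- true, nil

Whether this is exactly what trepl hits is a different question, but it shows why message handlers and coroutine boundaries don't mix cleanly.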

I'll try to get rid of coroutines altogether and revamp the readline bindings. This is indeed a pretty serious bug if you can't tell whether an error occurred or not.
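
For what it's worth, dropping coroutines doesn't need much: a plain blocking readline loop over the ffi works. A minimal sketch (assuming libreadline is installed; readline/add_history are the standard GNU readline C entry points, and the eval step is simplified compared to trepl's):

local ffi = require 'ffi'
ffi.cdef[[
char *readline(const char *prompt);
void add_history(const char *line);
void free(void *ptr);
]]
local RL = ffi.load('readline')

while true do
   local cstr = RL.readline('th> ')
   if cstr == nil then break end    -- NULL on EOF (ctrl+d)
   local line = ffi.string(cstr)
   ffi.C.free(cstr)
   if #line > 0 then
      RL.add_history(line)
      -- try the expression form first, then fall back to a statement
      local f = loadstring('return ' .. line) or loadstring(line)
      if f then
         print(xpcall(f, debug.traceback))   -- no coroutine in the way
      else
         print('syntax error in: ' .. line)
      end
   end
end

With xpcall running on the same stack as the evaluated chunk, the traceback handler fires as expected (which matches fbtrepl not showing the bug).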

Not surprised to see the culprit is coroutines. I built an iterator using coroutines only to find it hell to debug: it returned bogus error messages (tensors and scalars). That it has to do with the C error messages makes sense.
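
A toy version of that pain, for anyone curious (not the actual iterator): with a coroutine.wrap-based iterator, an error raised inside the coroutine pops out at the for loop, and by then the coroutine's own stack is gone, so any traceback you take shows the caller rather than the real crash site:

local function iter(n)
   return coroutine.wrap(function()
      for i = 1, n do
         if i == 3 then error('boom at i=3') end
         coroutine.yield(i)
      end
   end)
end

for i in iter(5) do print(i) end
-- prints 1, 2, then errors at the for-loop line above; a traceback
-- taken here shows the caller's stack, not the coroutine's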

Should all be fixed in this new PR.

fixed via #12

thanks guys. Works great.