abhshkdz/neural-vqa

LookupTable.lua:75: bad argument #3 to 'index' (Tensor | LongTensor expected, got torch.CudaLongTensor)

badripatro opened this issue · 3 comments

Error details:


$ th train.lua -gpuid 0


using CUDA on GPU 0...
Loading data files...
Loading train fc7 features from data/train_fc7.t7
Loading val fc7 features from data/val_fc7.t7
Parameters: 6813673
Batches: 1076
Max iterations: 53800
/home/cse/torch/install/bin/luajit:
/home/cse/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/cse/torch/install/share/lua/5.1/nn/LookupTable.lua:75: bad argument
#3 to 'index' (Tensor | LongTensor expected, got torch.CudaLongTensor)

stack traceback:
[C]: in function 'index'
/home/cse/torch/install/share/lua/5.1/nn/LookupTable.lua:75: in
function
[C]: in function 'xpcall'
/home/cse/torch/install/share/lua/5.1/nn/Container.lua:63: in function
'rethrowErrors'
/home/cse/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function
'forward'
train.lua:375: in function 'opfunc'
/home/cse/torch/install/share/lua/5.1/optim/adam.lua:37: in function
'adam'
train.lua:476: in main chunk
[C]: in function 'dofile'
.../cse/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in
main chunk
[C]: at 0x00405970

WARNING: If you see a stack trace below, it doesn't point to the place
where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/cse/torch/install/share/lua/5.1/nn/Container.lua:67: in function
'rethrowErrors'
/home/cse/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function
'forward'
train.lua:375: in function 'opfunc'
/home/cse/torch/install/share/lua/5.1/optim/adam.lua:37: in function
'adam'
train.lua:476: in main chunk
[C]: in function 'dofile'
.../cse/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in
main chunk
[C]: at 0x00405970
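
For context (my reading of the trace, added for clarity): argument #3 of 'index' is the tensor of indices, and on this nn/cutorch combination that call inside nn.LookupTable does not accept torch.CudaLongTensor indices. A minimal sketch that isolates the failure, using nothing from the repo:

-- Minimal repro sketch (not from train.lua): forward CudaLongTensor indices
-- through a GPU LookupTable. On an nn/cutorch combination whose 'index' does
-- not yet accept CudaLongTensor this raises the same "bad argument #3" error;
-- on an up-to-date install it prints the output size.
require 'cunn'  -- pulls in torch, nn and cutorch

local lookup  = nn.LookupTable(100, 16):cuda()
local indices = torch.CudaLongTensor(3):fill(1)  -- the index type named in the error
local out = lookup:forward(indices)              -- fails at LookupTable.lua's index call on old builds
print(out:size())                                -- 3 x 16 when the call succeeds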


References: I searched for this problem and found the following links:


https://groups.google.com/forum/#!topic/torch7/6igs4XdoSW4
jcjohnson/neural-style#210
jcjohnson/torch-rnn#56
https://groups.google.com/forum/#!topic/torch7/uEeBWwkyDPU


Attempted solution: After going through those links, I re-installed the following packages:


luarocks install nn
luarocks install cunn
luarocks install cunn 1.0-0
luarocks install torch
luarocks install cutorch
luarocks install cutorch 1.0-0


But I still get the same error. Please let me know how I can solve this issue.
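
A quick runtime check (a sketch added for clarity, using the variable names from the train.lua excerpt quoted further down) can confirm the mismatch right before the failing forward:

-- Diagnostic sketch (not in the repo): placed just before the LookupTable
-- forward in feval, this prints the tensor types involved; per the error,
-- the indices arrive as torch.CudaLongTensor.
print('q_batch type:    ' .. torch.type(q_batch))
print('ltw weight type: ' .. torch.type(protos.ltw.weight))
print('i_batch type:    ' .. torch.type(i_batch))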

Did you transfer the model to GPU (model = model:cuda())?
What are lines 375 and 476 in your train.lua?

  1. Yes, the model is transferred to the GPU (see also the workaround sketch after the code in point 3):

-- pass the model over to the GPU
if opt.gpuid >= 0 then
        protos.ltw = protos.ltw:cuda()
        protos.lti = protos.lti:cuda()
        protos.lstm = protos.lstm:cuda()
        protos.sm = protos.sm:cuda()
        protos.criterion = protos.criterion:cuda()
     end

  2. Line 375:
    imf = protos.lti:forward(i_batch)

    line 476:
    _, local_loss = optim.adam(feval, params, optim_state)


3. This is the relevant part of the code; lines 375 and 476 are marked in the comments below:


-- closure to run a forward and backward pass and return loss and gradient parameters
feval = function(x)
-- get latest parameters
if x ~= params then
params:copy(x)
end
grad_params:zero()
-- load question batch, answer batch and image features batch
q_batch, a_batch, i_batch = loader:next_batch()

-- slightly hackish; 1st index of `nn.LookupTable` is reserved for zeros
q_batch = q_batch + 1

-- forward the question features through ltw
qf = protos.ltw:forward(q_batch)

-- forward the image features through lti (this is line 375)
imf = protos.lti:forward(i_batch)

-- convert to CudaTensor if using gpu
if opt.gpuid >= 0 then
    imf = imf:cuda()
end

------------ forward pass ------------

-- set initial loss
loss = 0

-- set the state at 0th time step of LSTM
rnn_state = {[0] = init_state_global}

-- LSTM forward pass for question features
for t = 1, loader.q_max_length do
    lst = lstm_clones[t]:forward{qf:select(2,t), unpack(rnn_state[t-1])}
    -- at every time step, set the rnn state (h_t, c_t) to be passed as input in next time step
    rnn_state[t] = {}
    for i = 1, #init_state do table.insert(rnn_state[t], lst[i]) end
end

-- after completing the unrolled LSTM forward pass with question features, forward the image features
lst = lstm_clones[loader.q_max_length + 1]:forward{imf, unpack(rnn_state[loader.q_max_length])}

-- forward the hidden state at the last time step to get softmax over answers
prediction = protos.sm:forward(lst[#lst])

-- calculate loss
loss = protos.criterion:forward(prediction, a_batch)

------------ backward pass ------------

-- backprop through loss and softmax
dloss = protos.criterion:backward(prediction, a_batch)
doutput_t = protos.sm:backward(lst[#lst], dloss)

-- set internal state of LSTM (starting from last time step)
drnn_state = {[loader.q_max_length + 1] = utils.clone_list(init_state, true)}
drnn_state[loader.q_max_length + 1][opt.num_layers * 2] = doutput_t

-- backprop for last time step (image features)
dlst = lstm_clones[loader.q_max_length + 1]:backward({imf, unpack(rnn_state[loader.q_max_length])}, drnn_state[loader.q_max_length + 1])

-- backprop into image linear layer
protos.lti:backward(i_batch, dlst[1])

-- set LSTM state
drnn_state[loader.q_max_length] = {}
for i,v in pairs(dlst) do
    if i > 1 then
        drnn_state[loader.q_max_length][i-1] = v
    end
end

dqf = torch.Tensor(qf:size()):zero()
if opt.gpuid >= 0 then
    dqf = dqf:cuda()
end

-- backprop into the LSTM for rest of the time steps
for t = loader.q_max_length, 1, -1 do
    dlst = lstm_clones[t]:backward({qf:select(2, t), unpack(rnn_state[t-1])}, drnn_state[t])
    dqf:select(2, t):copy(dlst[1])
    drnn_state[t-1] = {}
    for i,v in pairs(dlst) do
        if i > 1 then
            drnn_state[t-1][i-1] = v
        end
    end
end

-- zero gradient buffers of lookup table, backprop into it and update parameters
protos.ltw:zeroGradParameters()
protos.ltw:backward(q_batch, dqf)
protos.ltw:updateParameters(opt.learning_rate)

-- clip gradient element-wise
grad_params:clamp(-5, 5)

return loss, grad_params

end

-- optim state with ADAM parameters
local optim_state = {learningRate = opt.learning_rate, alpha = opt.alpha, beta = opt.beta, epsilon = opt.epsilon}

-- train / val loop!
losses = {}
iterations = opt.max_epochs * loader.batch_data.train.nbatches
print('Max iterations: ' .. iterations)
lloss = 0

for i = 1, iterations do
-- This is line 476
_, local_loss = optim.adam(feval, params, optim_state)

losses[#losses + 1] = local_loss[1]

lloss = lloss + local_loss[1]
local epoch = i / loader.batch_data.train.nbatches

if i%10 == 0 then
    print('epoch ' .. epoch .. ' loss ' .. lloss / 10)
    lloss = 0
end

-- Decay learning rate occasionally
if i % loader.batch_data.train.nbatches == 0 and opt.learning_rate_decay < 1 then
    if epoch >= opt.learning_rate_decay_after then
        local decay_factor = opt.learning_rate_decay
        optim_state.learningRate = optim_state.learningRate * decay_factor -- decay it
        print('decayed learning rate by a factor ' .. decay_factor .. ' to ' .. optim_state.learningRate)
    end
end

-- Calculate validation accuracy and save model snapshot
if i % opt.save_every == 0 or i == iterations then
    print('Checkpointing. Calculating validation accuracy..')
    local val_acc = feval_val()
    local savefile = string.format('%s/%s_epoch%.2f_%.4f.t7', opt.checkpoint_dir, opt.savefile, epoch, val_acc)
    print('Saving checkpoint to ' .. savefile)
    local checkpoint = {}
    checkpoint.opt = opt
    checkpoint.protos = protos
    checkpoint.vocab_size = loader.q_vocab_size
    torch.save(savefile, checkpoint)
end

if i%10 == 0 then
    collectgarbage()
end

end
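
For completeness, a possible workaround sketch (added for clarity; it is not the fix suggested below, and it assumes protos.ltw is the word nn.LookupTable from point 1): keep the lookup table on the CPU so that 'index' never sees CudaLongTensor indices, and move only its output to the GPU.

-- Workaround sketch (not the repo's code): leave the word LookupTable on the
-- CPU and cast its output, instead of converting the whole module with :cuda().
if opt.gpuid >= 0 then
    -- protos.ltw is intentionally NOT converted here
    protos.lti = protos.lti:cuda()
    protos.lstm = protos.lstm:cuda()
    protos.sm = protos.sm:cuda()
    protos.criterion = protos.criterion:cuda()
end

-- inside feval, feed plain LongTensor indices and move the result to the GPU:
-- qf = protos.ltw:forward(q_batch:long())
-- if opt.gpuid >= 0 then qf = qf:cuda() end

The gradient buffer dqf and the backward call into protos.ltw would then also have to stay on the CPU, so updating the packages as suggested below is the cleaner route.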

@badripatro Install in this order:

luarocks install torch
luarocks install nn
luarocks install cunn
luarocks install cutorch
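
Note that, unlike the version-pinned 1.0-0 installs tried earlier, these plain luarocks commands pull the latest available rockspecs. After reinstalling, a quick sanity check (same idea as the minimal sketch above) should run without the 'index' error:

-- Post-reinstall sanity check sketch: with up-to-date torch/nn/cutorch/cunn,
-- a GPU LookupTable forward with CudaLongTensor indices should succeed.
require 'cunn'

local lookup  = nn.LookupTable(100, 16):cuda()
local indices = torch.CudaLongTensor(3):fill(1)
print(lookup:forward(indices):size())   -- expect 3 x 16 instead of the error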