Error when fine-tuning with 2 GPUs
ankitag9 opened this issue · 2 comments
Hi,
I have a pretrained model and I am trying to fine-tune it using 2 GPUs. I am getting the following error:
/home/strange/torch/install/bin/luajit: /home/strange/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/strange/torch/install/share/lua/5.1/nn/Linear.lua:67: Assertion `THCTensor_(checkGPU)(state, 4, r_, t, vec1, vec2)' failed. at /home/strange/torch/extra/cutorch/lib/THC/generic/THCTensorMathBlas.cu:138
stack traceback:
[C]: in function 'addr'
/home/strange/torch/install/share/lua/5.1/nn/Linear.lua:67: in function </home/strange/torch/install/share/lua/5.1/nn/Linear.lua:53>
[C]: in function 'xpcall'
/home/strange/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/strange/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:499: in function 'train_batch'
train.lua:752: in function 'train'
train.lua:1080: in function 'main'
train.lua:1083: in main chunk
[C]: in function 'dofile'
...ange/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/strange/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/strange/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
train.lua:499: in function 'train_batch'
train.lua:752: in function 'train'
train.lua:1080: in function 'main'
train.lua:1083: in main chunk
[C]: in function 'dofile'
...ange/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
What can be done to remove this error?
Try converting the model to CPU first, using convert_to_cpu.lua.
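The `checkGPU` assertion fires when the tensors passed to a CUDA operation do not all live on the same GPU. Storing the checkpoint as CPU tensors and moving it to a GPU only after loading is one way to avoid that. The contents of convert_to_cpu.lua are not shown in this thread, so the following is only a rough sketch of the general idea, not that script. The file names are hypothetical, and it assumes the checkpoint is a plain nn module saved with torch.save.

```lua
-- Sketch only: file names are hypothetical, and the checkpoint is assumed
-- to be a plain nn module (not a table wrapping it).
require 'torch'
require 'nn'
require 'cutorch'
require 'cunn'  -- must be loaded so that CUDA tensors can be deserialized

-- Step 1 (one-off conversion): load the GPU checkpoint and cast every
-- parameter and buffer to a CPU float tensor, then save it again.
local model = torch.load('pretrained_gpu.t7')
model:float()
torch.save('pretrained_cpu.t7', model)

-- Step 2 (in the training script): reload the CPU checkpoint, select the
-- device explicitly, and only then move the model onto it.
cutorch.setDevice(1)
local net = torch.load('pretrained_cpu.t7')
net:cuda()
```

If the checkpoint is a table that wraps the model together with other state, the same casts would need to be applied to the model field inside it.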
It works. Thanks!