Help with using autograd in training with wrapped NN modules
synchro-- opened this issue · 4 comments
Let's say I have a whole network built with nn, called 'model', that I wrapped like this:
modelFunction, params = autograd.functionalize(model)
neuralNet = function(params, input, target) ... return myCustomLoss end
df = autograd(neuralNet)
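For concreteness, a minimal sketch of this setup might look as follows. The layer sizes and the hand-written MSE-style loss are placeholders, not taken from the post; the relevant part is that neuralNet returns the loss first and the network output second, which is what the closure further down unpacks:
local nn = require 'nn'
local autograd = require 'autograd'

local model = nn.Sequential()
model:add(nn.Linear(100, 50))   -- hypothetical layer sizes
model:add(nn.Tanh())

local modelFunction, params = autograd.functionalize(model)

local neuralNet = function(params, input, target)
   local output = modelFunction(params, input)
   -- myCustomLoss stands in for whatever loss you need; here a simple
   -- mean-squared-error written with torch operations autograd can trace
   local myCustomLoss = torch.mean(torch.pow(output - target, 2))
   return myCustomLoss, output
end

local df = autograd(neuralNet)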
Now I want to train my model. Since my usual training procedure (with the mini-batch closure) is already written and ready, I would like to keep most of it and only exploit the very easy way autograd gives me the gradients.
So let's compare the two methods, so that you can tell me whether this is actually possible.
The usual:
local feval = function(x)
   if x ~= parameters then
      parameters:copy(x)
   end
   -- reset gradients
   gradParameters:zero()
   -- f is the average of all criterions
   local f = 0
   -- evaluate function for complete mini batch
   for i = 1, #inputs do
      -- estimate f
      local output = model:forward(inputs[i])
      local err = criterion:forward(output, targets[i])
      f = f + err
      -- estimate df/dW
      local df_do = criterion:backward(output, targets[i])
      model:backward(inputs[i], df_do)
   end
   -- normalize gradients and f(X)
   gradParameters:div(#inputs)
   f = f / #inputs
   -- return f and df/dX
   return f, gradParameters
end
So, using autograd while making the smallest changes possible, it would become:
-- create closure to evaluate f(X) and df/dX
local feval = function(x)
   -- get new parameters
   if x ~= parameters then
      parameters:copy(x)
   end
   -- reset gradients
   gradParameters:zero()
   -- f is the average of all criterions
   local f = 0
   -- evaluate function for complete mini batch
   for i = 1, #inputs do
      -- estimate f
      local df_do, err, output = df(params, inputs[i], targets[i])
      f = f + err
      model:backward(inputs[i], df_do)
   end
   -- normalize gradients and f(X)
   gradParameters:div(#inputs)
   f = f / #inputs
   -- return f and df/dX
   return f, gradParameters
end
And then I would go on using the optim module in the classical way.
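By "the classical way" I mean something like the following loop, where optimState and nBatches are just placeholders, and parameters/gradParameters come from model:getParameters() as usual:
local optim = require 'optim'
local optimState = {learningRate = 1e-2}   -- placeholder hyper-parameters

for t = 1, nBatches do
   -- inputs/targets for the current mini-batch are assumed to be prepared here
   optim.sgd(feval, parameters, optimState)
end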
Is this not possible/not suggested?
@synchro-- were you successful in doing this? I am mixing optim with wrapped nn modules and getting the following error:
/Graph.lua:40: bad argument #2 to 'fn' (expecting number or torch.DoubleTensor or torch.DoubleStorage at /tmp/luarocks_torch-scm-1-9261/torch7/generic/Tensor.c:1125)
I can confirm that it is possible to mix optim with wrapped nn modules. The errors you hit might be due to features that autograd does not support.
@biggerlambda Thanks. Do you have any example of that?
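For reference, one way such a combination could be wired up is sketched below. This is an assumption-laden sketch, not code from this thread: it assumes df returns the table of per-parameter gradients followed by the extra return values of neuralNet (loss and output), and that this table lines up, tensor for tensor, with the gradient tensors returned by model:parameters(); the learning rate is a placeholder:
local optim = require 'optim'

-- flat views for optim, plus the per-tensor gradient views they are built from
local parameters, gradParameters = model:getParameters()
local _, gradTensors = model:parameters()   -- each gradTensors[k] is a view into gradParameters

local feval = function(x)
   if x ~= parameters then
      parameters:copy(x)
   end
   gradParameters:zero()
   local f = 0
   for i = 1, #inputs do
      -- grads is assumed to mirror the structure of params (a list of tensors)
      local grads, err, output = df(params, inputs[i], targets[i])
      f = f + err
      for k = 1, #gradTensors do
         gradTensors[k]:add(grads[k])
      end
   end
   -- normalize and hand the flat gradient vector to optim
   gradParameters:div(#inputs)
   return f / #inputs, gradParameters
end

optim.sgd(feval, parameters, {learningRate = 1e-2})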