ratishsp/data2text-plan-py

After loss.backward(), why is torch.autograd.backward(inputs, grads, retain_graph=retain_graph) needed?

Closed this issue · 1 comment

Hi Ratish, I have a question here. After _compute_loss() and loss.div(normalization).backward(), why does the shards function also call torch.autograd.backward(inputs, grads, retain_graph=retain_graph)? Isn't calling backward() on the loss the usual way to backpropagate? Why run backward on the inputs again?

Hi @LarryLee-BD, this repo is based on a fork of OpenNMT-py. This thread, OpenNMT/OpenNMT-py#387, gives the details of the two backward() operations.
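
For context, the idea discussed in that thread can be sketched in a few lines (a minimal standalone illustration, not the repo's actual shards() code; the encoder/generator names and shapes below are made up). The loss is computed on a detached copy of the generator input, so each per-shard backward() stops at the detached tensor; the second call, torch.autograd.backward(inputs, grads), then pushes the gradients accumulated on that copy back through the rest of the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the real model pieces (not the repo's code).
encoder = nn.Linear(5, 8)     # "everything before the generator"
generator = nn.Linear(8, 3)   # vocabulary projection

x = torch.randn(6, 5)
targets = torch.randint(0, 3, (6,))
normalization = 6.0

hidden = encoder(x)           # non-leaf output, part of the main graph

# Work on a detached copy, so the per-shard backward() stops here
# instead of traversing the whole encoder graph for every shard.
hidden_d = hidden.detach()
hidden_d.requires_grad_(True)

# First backward(s): loss.div(normalization).backward() per shard.
# Gradients accumulate only in hidden_d.grad (and the generator params).
for shard, tgt in zip(hidden_d.split(2), targets.split(2)):
    loss = F.cross_entropy(generator(shard), tgt, reduction='sum')
    loss.div(normalization).backward()

# Second backward: torch.autograd.backward(inputs, grads) pushes the
# gradient accumulated on the detached copy back through the encoder.
torch.autograd.backward([hidden], [hidden_d.grad])

print(encoder.weight.grad.shape)  # the encoder received gradients
```

Sharding the loss this way bounds the memory used by the generator's softmax, while the single second backward still delivers the correct gradients to the encoder/decoder parameters.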