tonylitianyu/Preference-Planning-Deep-IRL

What is the meaning of the factor of 1000 in the final return of the loss function?

Closed this issue · 2 comments

import torch

def maxentirl_loss(learner, expert, reward_func, device):
    # Convert learner and expert state samples to tensors on the target device
    learner_torch = torch.FloatTensor(learner).to(device)
    expert_torch = torch.FloatTensor(expert).to(device)
    # Predicted rewards for learner and expert states, flattened to 1-D
    learner_r = reward_func.r(learner_torch).view(-1)
    expert_r = reward_func.r(expert_torch).view(-1)
    # MaxEnt IRL objective: mean learner reward minus mean expert reward, scaled by 1000
    return 1000 * (learner_r.mean() - expert_r.mean())

I was testing whether scaling up the loss would speed up convergence. It did not change much, and I forgot to remove the factor. Sorry about the confusion.
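
For what it's worth, a constant factor on the loss only rescales the gradients: with plain SGD it behaves like a larger learning rate, and adaptive optimizers such as Adam largely cancel it out, which would be consistent with the scaling not changing much. Below is a minimal, hypothetical sketch of that effect; the tiny linear reward_net and the random state batches are placeholders, not the repo's actual model.

import torch
import torch.nn as nn

reward_net = nn.Linear(4, 1)           # stand-in for reward_func.r
learner_states = torch.randn(8, 4)     # dummy learner state batch
expert_states = torch.randn(8, 4)      # dummy expert state batch

loss = reward_net(learner_states).mean() - reward_net(expert_states).mean()
scaled_loss = 1000 * loss

params = list(reward_net.parameters())
grads = torch.autograd.grad(loss, params, retain_graph=True)
scaled_grads = torch.autograd.grad(scaled_loss, params)

# Every gradient of the scaled loss is exactly 1000x the unscaled one,
# so the factor does not change the optimum, only the effective step size.
for g, sg in zip(grads, scaled_grads):
    print(torch.allclose(sg, 1000 * g))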

Thank you very much for your reply; this project has been very helpful to me.