train your reward model issue
wac81 opened this issue · 1 comment
wac81 commented
I can't train the reward model with a batch:
seq, prompt_mask, labels = next(train_loader)
loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
accelerator.backward(loss / GRADIENT_ACCUMULATE_EVERY)
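For reference, here is a minimal sketch of what the batched inputs are assumed to look like, extrapolated from the library README's single-sample example (the model hyperparameters and the 20000-token / 5-bin sizes are illustrative values taken from that example, not from my data):

```python
import torch
from palm_rlhf_pytorch import PaLM, RewardModel

palm = PaLM(num_tokens = 20000, dim = 512, depth = 12)
reward_model = RewardModel(palm, num_binned_output = 5)  # e.g. ratings 1-5

batch_size = 8
seq = torch.randint(0, 20000, (batch_size, 1024))    # token ids, one row per sample
prompt_mask = torch.zeros(batch_size, 1024).bool()   # True where a token belongs to the prompt
labels = torch.randint(0, 5, (batch_size,))          # one class index per sample

loss = reward_model(seq, prompt_mask = prompt_mask, labels = labels)
loss.backward()
```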
I set this up with a batch, but I got an error. Checking the source code, I found this:
if self.binned_output:
    return F.mse_loss(pred, labels)
return F.cross_entropy(pred, labels)
cross_entropy does not seem to support a batched training set. I changed it to mse_loss, but I still get an error.
How do I compute the loss over a batched training set, e.g. with the batch size set to 8?
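For context, a standalone sketch of the shapes `F.cross_entropy` expects here: `pred` as `(batch, num_classes)` logits and `labels` as `(batch,)` long-dtype class indices; it then averages over the batch (the values below are illustrative):

```python
import torch
import torch.nn.functional as F

batch_size, num_classes = 8, 5                          # e.g. 5 reward bins

pred = torch.randn(batch_size, num_classes)             # raw logits, one row per sample
labels = torch.randint(0, num_classes, (batch_size,))   # class index per sample, dtype long

loss = F.cross_entropy(pred, labels)                    # scalar: mean loss over the batch
```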
wac81 commented
> reward model doesn't need training.
Are you serious?
Then how do you explain the README example?