How to get the real training loss over the whole batch sequence (reduction_mode=mean)?
Opened this issue · 2 comments
littttttlebird commented
In train.py, the loss returned from the main process is the loss of a single sequence block, not the loss over the whole sequence.
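To make the question concrete, here is a minimal sketch of what "whole sequence loss" with mean reduction would look like if each block reports its own mean loss. It assumes the number of tokens in each block is known; the function name `whole_sequence_mean_loss` and the example values are hypothetical and not taken from train.py.

```python
def whole_sequence_mean_loss(block_mean_losses, block_token_counts):
    """Combine per-block mean losses into one mean over all tokens.

    Both arguments are hypothetical inputs: one mean loss and one token
    count per sequence block.
    """
    if len(block_mean_losses) != len(block_token_counts):
        raise ValueError("need exactly one token count per block loss")
    total_tokens = sum(block_token_counts)
    if total_tokens == 0:
        raise ValueError("no tokens to average over")
    # Undo each block's mean (mean * count = summed loss), add the sums,
    # then divide by the total token count.
    total_loss = sum(
        loss * count
        for loss, count in zip(block_mean_losses, block_token_counts)
    )
    return total_loss / total_tokens


# Made-up example: three blocks of unequal length.
block_losses = [2.0, 1.0, 4.0]
block_tokens = [4, 4, 2]
print(whole_sequence_mean_loss(block_losses, block_tokens))  # 2.0
```

With unequal block lengths, the plain average of the block losses (about 2.33 here) differs from the token-weighted value (2.0), so averaging the per-block means is only exact when every block has the same number of tokens. If the blocks live on different processes, the summed loss and the token count would presumably need to be reduced across ranks first, for example with `torch.distributed.all_reduce`.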
littttttlebird commented
Thank you very much.