Will the token number become larger when the threshold is fixed (hard training step)?

Question

DreamsofGg opened this issue 3 years ago · 0 comments

It seems that the model tends to increase the number of tokens when the threshold is fixed (hard training step), because the L1 loss cannot be taken into account in that step. How can this problem be solved?
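
A minimal sketch of one possible reading of this concern, not the repository's code. It assumes tokens are kept by comparing a per-token score against a threshold, and that the L1 term penalizes the resulting keep mask. The names `soft_mask`, `hard_mask`, `l1_penalty`, the sigmoid relaxation, and the `temperature` value are all hypothetical, chosen only for illustration. Under these assumptions, the L1 penalty has a non-zero gradient with respect to the scores when the mask is soft, but a zero gradient when the mask is a hard comparison against a fixed threshold, so nothing pushes the token count down in the hard step.

```python
import math


def soft_mask(scores, threshold, temperature=0.1):
    # Hypothetical soft relaxation: sigmoid of the distance to the threshold.
    return [1.0 / (1.0 + math.exp(-(s - threshold) / temperature)) for s in scores]


def hard_mask(scores, threshold):
    # Hypothetical hard step: a token is kept iff its score exceeds the fixed threshold.
    return [1.0 if s > threshold else 0.0 for s in scores]


def l1_penalty(mask):
    # L1 norm of the keep mask, i.e. the (soft) number of kept tokens.
    return sum(abs(m) for m in mask)


def finite_diff_grad(mask_fn, scores, threshold, eps=1e-4):
    # Central finite-difference gradient of the L1 penalty w.r.t. each score.
    grads = []
    for i in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[i] += eps
        down[i] -= eps
        diff = l1_penalty(mask_fn(up, threshold)) - l1_penalty(mask_fn(down, threshold))
        grads.append(diff / (2 * eps))
    return grads


scores = [0.2, 0.45, 0.55, 0.9]  # toy per-token scores (made up)
threshold = 0.5                  # fixed threshold (made up)

soft_grads = finite_diff_grad(soft_mask, scores, threshold)
hard_grads = finite_diff_grad(hard_mask, scores, threshold)

print("soft mask gradients:", soft_grads)  # all positive: L1 can push scores down
print("hard mask gradients:", hard_grads)  # all zero: L1 gives no signal
```

If this reading matches the actual training code, the hard step would need some other signal (for example a straight-through style estimator or a separate token-count constraint) for the L1 term to have any effect; whether that applies here is exactly the open question.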