Will the token number become larger when the threshold is fixed (hard training step)?
DreamsofGg opened this issue · 0 comments
DreamsofGg commented
It seems that the model tends to make the token number larger when the threshold is fixed (hard training step), because it cannot take the L1 loss into account. How can this problem be solved?