fix: incredibly low loss after 4 epochs
Closed this issue · 4 comments
Hypotheses:
- Padding tokens are used for generating loss
- Extremely homogeneous sequences – they are fairly homogeneous, though not entirely. This seems like a relatively likely explanation.
@KennethEnevoldsen, any thoughts here? :-)
> Padding tokens are used for generating loss
You can check, but I believe the loss ignores the padding token id (-1, I believe). At least it shouldn't be the case.
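For reference, this is how an ignore index works in a cross-entropy loss: positions whose label equals the ignore id contribute nothing to the mean. A minimal pure-Python sketch (note PyTorch's `CrossEntropyLoss` defaults to `ignore_index=-100`; whether this project uses -1 instead is an assumption from the thread):

```python
import math

IGNORE_INDEX = -100  # PyTorch's default; the project may use -1 instead

def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label != ignore_index.

    logits: one list of raw scores (length vocab_size) per position.
    labels: one target id per position; padded positions carry ignore_index.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[target]
        count += 1
    return total / count

# Two real positions followed by one padded position
logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.4], [9.0, 9.0, 9.0]]
labels = [0, 1, IGNORE_INDEX]
loss_with_pad = masked_cross_entropy(logits, labels)
loss_without_pad = masked_cross_entropy(logits[:2], labels[:2])
```

If padding is masked correctly, dropping the padded position leaves the loss unchanged; if padded positions leaked into the loss, they could artificially deflate it, which is exactly the symptom in this issue.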
> Extremely homogeneous sequences – they are fairly homogeneous, though not entirely. Seems like a relatively likely explanation.
Definitely a more likely explanation.
> You can check, but I believe the loss ignores the padding token id (-1, I believe). At least it shouldn't be the case.
We've tried looking at the `test_train` test. When running it and setting a breakpoint within the `PretrainerBEHRT.forward()` scope, we get:
- The `input["is_padding"]` tensor has values `[0, 0, 0, 1, 1]`
We then tried looking at the logits here:
But none of them are `-1`. Does this imply that padding is not ignored when calculating the logits?
I'm way out of my depth here, so it might make sense for you to be hands-on?
You should look in `labels` (in `self.loss(logits, labels)`).
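To make the suggestion concrete: the masking should be visible in `labels`, not in the logits. A hypothetical check along the lines of the debugging session above (the `is_padding` values are taken from the thread; the ignore id `-1`, the target values, and the variable names are assumptions, not the actual code):

```python
# Padding shows up as a masked target in the labels, while the logits at
# those positions stay ordinary scores. -1 and the target ids are illustrative.
is_padding = [0, 0, 0, 1, 1]   # from input["is_padding"] in the thread
labels = [17, 4, 23, -1, -1]   # hypothetical targets; -1 marks padding

padded = [i for i, p in enumerate(is_padding) if p]
assert all(labels[i] == -1 for i in padded)  # loss will skip these positions
```

So seeing no `-1` among the logits is expected; the thing to verify is that every padded position carries the ignore id in `labels`.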
It seems perfectly fine here:
Might be worth setting it as a class attribute to make sure that one is not changed without the other.
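One way to realize the class-attribute suggestion, as a sketch (the class body and method names are illustrative; the real `PretrainerBEHRT` is presumably an `nn.Module` with more going on):

```python
class PretrainerBEHRT:  # simplified stand-in for the real module
    PAD_LABEL = -1  # single source of truth for the ignored label id

    def mask_labels(self, labels, is_padding):
        """Replace targets at padded positions with the shared ignore id."""
        return [self.PAD_LABEL if pad else t
                for t, pad in zip(labels, is_padding)]
```

With the id held in one attribute, the label-masking code and whatever `ignore_index` the loss is configured with can't silently drift apart.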
Ah, great! I'll change it to an attribute. I assume it's the redundancy making the pretraining very easy, then.
Lasse is using the GPU for a while, but after that I'll run a finetuning 👍