Aarhus-Psychiatry-Research/psycop-common

fix: incredibly low loss after 4 epochs


Padding tokens are being used when computing the loss

You can check, but I believe the loss ignores the padding token ID (-1, I think). At the very least, it shouldn't be the case.
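For reference, a minimal standalone sketch of how ignore_index behaves (the shapes and values below are made up, not taken from the repo):

```python
import torch
from torch import nn

# Toy example: one sequence of 5 positions over a vocabulary of 10 tokens.
logits = torch.randn(5, 10)               # raw scores per position
labels = torch.tensor([3, 7, 2, -1, -1])  # last two positions are padding

loss_fn = nn.CrossEntropyLoss(ignore_index=-1)
loss = loss_fn(logits, labels)

# Same value when computed only over the non-padded positions,
# i.e. the padded positions contribute nothing to the loss.
manual = nn.CrossEntropyLoss()(logits[:3], labels[:3])
assert torch.isclose(loss, manual)
```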

Extremely homogeneous sequences – they are pretty homogeneous, but not entirely. Seems like a relatively likely explanation.

Def. a more likely explanation.
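If it would help to quantify that, here is a rough sketch (the function and argument names are hypothetical, not from the repo) that measures how concentrated the token distribution is across the non-padding positions:

```python
from collections import Counter

import torch


def top_token_share(token_ids: torch.Tensor, is_padding: torch.Tensor) -> float:
    """Fraction of non-padding tokens taken up by the single most common token.

    A value close to 1.0 means the sequences are close to trivially
    predictable, which on its own would drive the MLM loss down quickly.
    """
    real_tokens = token_ids[~is_padding.bool()].tolist()
    counts = Counter(real_tokens)
    return counts.most_common(1)[0][1] / len(real_tokens)
```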

> You can check, but I believe the loss ignores the padding token ID (-1, I think). At the very least, it shouldn't be the case.

We've tried looking at the test_train test. When running it and setting a breakpoint inside the PretrainerBEHRT.forward() scope, we see:

  • The input["is_padding"] tensor has values [0,0,0,1,1]

We then tried looking at the logits here:

logits = self.mlm_head(encoded_patients)

But none of them are -1. Does this imply that padding is not ignored when calculating the logits?

I'm way out of my depth here, so it might make sense for you to be hands-on?

You should look at labels (in self.loss(logits, labels)).
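The logits are just raw scores over the vocabulary for every position (padded or not), so you would never see -1 in them; it is the labels that have to carry the ignore value. Here is a sketch of a check you could run at that same breakpoint (the function name and the default ignore index are assumptions, and "labels" means whatever tensor ends up as the second argument to self.loss):

```python
import torch


def check_padding_is_ignored(
    labels: torch.Tensor, is_padding: torch.Tensor, ignore_index: int = -1
) -> None:
    """Verify that padded positions can never contribute to the loss."""
    padding_mask = is_padding.bool()
    # Every padded position must carry the ignore value...
    assert (labels[padding_mask] == ignore_index).all(), "padding leaks into the loss"
    # ...and at least some (masked, non-padded) positions must carry real token ids.
    assert (labels[~padding_mask] != ignore_index).any(), "nothing contributes to the loss"


# At the breakpoint, something like: check_padding_is_ignored(labels, input["is_padding"])
```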

It seems perfectly fine here:

masked_labels[~mask] = -1 # -1 will be ignored in loss function

self.loss = nn.CrossEntropyLoss(ignore_index=-1)

Might be worth setting the ignore index as a class attribute, to make sure that one is not changed without the other.
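Something along these lines, purely as an illustration of the pattern (the class below is a sketch, not the real PretrainerBEHRT):

```python
import torch
from torch import nn


class PretrainerSketch(nn.Module):
    # Single source of truth for the label value the loss skips, so the
    # masking code and the loss can't silently drift apart.
    ignore_index: int = -1

    def __init__(self, vocab_size: int, hidden_size: int) -> None:
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)
        self.loss = nn.CrossEntropyLoss(ignore_index=self.ignore_index)

    def mask_labels(self, labels: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        masked_labels = labels.clone()
        masked_labels[~mask] = self.ignore_index  # ignored by self.loss
        return masked_labels
```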

Ah, great! I'll change it to an attribute. I assume it's the redundancy in the sequences that's making the pretraining very easy, then.

Lasse is using the GPU for a while, but after that I'll run a finetuning 👍