Mistake in the focal loss implementation?
seva100 opened this issue · 0 comments
Hi, thank you very much for releasing this codebase -- it has been very useful to my project.
I'm wondering if there is a mistake in the Focal Loss implementation. The code in loss/focal.py first calculates the CrossEntropy loss, averages it over all samples, and only then applies the modulating factor, loss = (1 - p) ** self.gamma * logp. If I understand the original Focal Loss paper correctly, it proposes to calculate the CrossEntropy loss per sample, apply the modulating factor to each sample, and only then average the result over all samples.
I wonder: is the order changed on purpose in this repository? It seems to me that averaging first loses the idea of Focal Loss, which is to down-weight each easy example individually; the small check below shows the two orders give different values.
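For concreteness, here is a minimal sketch (made-up logits and targets, and assuming p is recovered as exp(-CE), as in the snippet above) comparing the two orders of operations:

```python
import torch
import torch.nn as nn

gamma = 2.0
logits = torch.tensor([[2.0, 0.1], [0.2, 1.5], [3.0, -1.0]])  # hypothetical batch
targets = torch.tensor([0, 1, 1])

# Order as I read loss/focal.py: average the CE first, then modulate once.
ce_mean = nn.CrossEntropyLoss(reduction='mean')(logits, targets)
repo_style = (1 - torch.exp(-ce_mean)) ** gamma * ce_mean

# Order from the paper: modulate each sample's CE, then average.
ce_each = nn.CrossEntropyLoss(reduction='none')(logits, targets)
paper_style = ((1 - torch.exp(-ce_each)) ** gamma * ce_each).mean()

print(repo_style.item(), paper_style.item())  # the two values differ
```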
If it's actually a mistake, a simple fix will suffice: changing line 13 (commit 6352092) to self.ce = nn.CrossEntropyLoss(reduction='none').
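With that change, the forward pass could look roughly like this (a sketch only; class and argument names are my assumptions, not the actual code in loss/focal.py):

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    # Hypothetical corrected version; names and defaults are assumptions.
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma
        # Proposed fix: keep per-sample losses instead of averaging here.
        self.ce = nn.CrossEntropyLoss(reduction='none')

    def forward(self, input, target):
        logp = self.ce(input, target)            # per-sample CE
        p = torch.exp(-logp)                     # probability of the true class
        loss = (1 - p) ** self.gamma * logp      # modulate each sample
        return loss.mean()                       # average only at the end
```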
Other implementations also seem to apply the modulating factor before averaging, e.g. see 1 and 2.