crosszamirski/WS-DINO

Unstable training loss


Hi,
I tried to reproduce the model, but my training loss is very unstable:

[Screenshot: training loss curve]
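For reference, this is roughly how I extracted the curve above (assuming a DINO-style `log.txt` with one JSON dict per epoch and a `train_loss` key; adjust the path and key if WS-DINO_BBBC021.py logs differently):

```python
import json
import matplotlib.pyplot as plt

# Assumes a DINO-style log file: one JSON dict per line with a
# "train_loss" entry. Adjust the path/key if the script logs differently.
with open("log.txt") as f:
    losses = [json.loads(line)["train_loss"] for line in f if line.strip()]

plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("train loss")
plt.savefig("loss_curve.png")
```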

For preprocessing:

I downloaded the dataset from the official BBBC021 website, then used the two CellProfiler pipelines provided to produce the training data, processing the annotated compounds and the DMSO controls together. (Should the annotated and DMSO data be preprocessed separately?) I also ran the sanity check sketched below on the pipeline output.
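To rule out an obvious scaling mismatch between the two groups, I compared basic intensity statistics of the exported images along these lines (the directories and file pattern are placeholders for my own output, not anything from the repo):

```python
import numpy as np
from pathlib import Path
from skimage import io

# Placeholder directories for the images written by the CellProfiler
# pipelines; adjust to wherever your pipeline output lives.
groups = {"annotated": Path("output/annotated"), "dmso": Path("output/dmso")}

for name, folder in groups.items():
    # Sample a few images per group and compare intensity statistics
    # to spot obvious illumination/scaling differences between groups.
    files = sorted(folder.glob("*.tif"))[:50]
    stats = np.array([(io.imread(f).mean(), io.imread(f).std()) for f in files])
    print(f"{name}: mean={stats[:, 0].mean():.3f}, std={stats[:, 1].mean():.3f}")
```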

I trained the model with the same parameters on 2x Tesla V100 GPUs using `python -m torch.distributed.launch --nproc_per_node=2 WS-DINO_BBBC021.py`.
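Before launching, I also verified that both GPUs are visible to PyTorch (a quick check, nothing repo-specific):

```python
import torch

# Quick sanity check that both V100s are visible before launching the
# distributed job with --nproc_per_node=2.
print(torch.cuda.device_count())  # expect 2
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```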

Could anyone point out what I did wrong?

Many thanks