Loss is NaN in PLATIPUS
ju-leon opened this issue · 2 comments
Hi,
when trying to run Platipus.py with the provided defaults:
python3 main.py --datasource SineLine --ml-algorithm platipus --num-models 4 --first-order --network-architecture FcNet --no-batchnorm --num-ways 1 --k-shot 5 --inner-lr 0.001 --meta-lr 0.001 --num-epochs 100 --resume-epoch 0 --train
the following error occurs:
Platipus.py, line 195: ValueError: Loss is NaN.
Do you have any idea what might cause this? It doesn't seem to be a config issue, since both the defaults and altered parameters produce the same error.
Thanks!
Leon
Hi Ju-leon,
Training PLATIPUS is not as straightforward as the other meta-learning methods in the repository, especially for regression: the loss at the beginning of training can be very large, leading to exploding gradients. That is why the loss becomes NaN.
As noted at the top of the PLATIPUS script, we need to properly train MAML first, then use the trained MAML model to initialize PLATIPUS's mean parameters.
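The warm-start idea can be sketched as follows. This is a hypothetical, framework-free illustration — the parameter names and toy weight values are made up, not taken from the repository; the point is that the PLATIPUS mean is copied from trained MAML weights while the log-variances start small:

```python
# Hypothetical sketch: warm-start PLATIPUS from a trained MAML model.
# The weight values below stand in for a real MAML checkpoint.
maml_weights = {"layer1.w": [0.3, -1.2], "layer1.b": [0.1]}

platipus = {
    # mean parameters: copied directly from the trained MAML weights
    "mu": {name: list(vals) for name, vals in maml_weights.items()},
    # log standard deviations: initialized small so early sampled
    # models stay close to the MAML solution
    "log_sigma": {name: [-4.0] * len(vals) for name, vals in maml_weights.items()},
}
```

Because the sampled parameters start near a solution that already fits the tasks, the initial loss (and hence the gradients) stays moderate.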
Another option is to apply gradient clipping in both the task-adaptation step (the inner loop) and the meta-parameter update (the outer loop).
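As a rough illustration of why clipping helps (plain Python rather than the repository's PyTorch code; the task values and learning rate are made up for the demonstration), an unclipped gradient step diverges to NaN on a badly scaled regression task, while the clipped version stays finite. In the actual training loops, `torch.nn.utils.clip_grad_norm_` on the inner- and outer-loop gradients serves the same purpose:

```python
import math

def grad(w, x, y):
    # d/dw of the squared error (w*x - y)^2
    return 2.0 * x * (w * x - y)

def clip(g, max_norm):
    # limit the gradient's magnitude to max_norm, keeping its sign
    return g if abs(g) <= max_norm else math.copysign(max_norm, g)

def train(w, lr, steps, clip_norm=None):
    x, y = 10.0, 100.0  # badly scaled task: large inputs blow up gradients
    for _ in range(steps):
        g = grad(w, x, y)
        if clip_norm is not None:
            g = clip(g, clip_norm)
        w -= lr * g
    return w

# Here 2 * lr * x**2 = 4, so each unclipped step multiplies the error
# in w by |1 - 4| = 3: plain gradient descent diverges to NaN.
w_unclipped = train(0.0, 0.02, 2000)
w_clipped = train(0.0, 0.02, 2000, clip_norm=10.0)
print(math.isfinite(w_unclipped), math.isfinite(w_clipped))  # False True
```

The clipped run caps each step size at `lr * clip_norm`, so the iterates stay bounded near the optimum instead of overflowing.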
Thanks for the quick reply.
Gradient clipping seems to work pretty well.