Loss is NaN in PLATIPUS
ju-leon opened this issue · 2 comments
Hi,
when trying to run Platipus.py with the provided defaults:
python3 main.py --datasource SineLine --ml-algorithm platipus --num-models 4 --first-order --network-architecture FcNet --no-batchnorm --num-ways 1 --k-shot 5 --inner-lr 0.001 --meta-lr 0.001 --num-epochs 100 --resume-epoch 0 --train
the following error occurs:
Platipus.py, line 195: ValueError: Loss is NaN.
Do you have any idea what might cause this? It doesn't seem to be a config issue, since both the defaults and altered parameters produce the same error.
Thanks!
Leon
Hi Ju-leon,
Training PLATIPUS is not as straightforward as the other meta-learning methods in the repository, especially for regression: the loss at the beginning of training can be very large, leading to exploding gradients. That is why the loss becomes NaN.
As noted at the top of the PLATIPUS script, we need to properly train MAML first, then use the trained MAML model to initialize PLATIPUS's mean parameters.
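The warm-start idea can be sketched as follows. This is a hypothetical, framework-free illustration — the parameter names and toy weight values are made up, not taken from the repository; the point is that the PLATIPUS mean is copied from trained MAML weights while the log-variances start small:

```python
# Hypothetical sketch: warm-start PLATIPUS from a trained MAML model.
# The weight values below stand in for a real MAML checkpoint.
maml_weights = {"layer1.w": [0.3, -1.2], "layer1.b": [0.1]}

platipus = {
    # mean parameters: copied directly from the trained MAML weights
    "mu": {name: list(vals) for name, vals in maml_weights.items()},
    # log standard deviations: initialized small so early sampled
    # models stay close to the MAML solution
    "log_sigma": {name: [-4.0] * len(vals) for name, vals in maml_weights.items()},
}
```

Because the sampled parameters start near a solution that already fits the tasks, the initial loss (and hence the gradients) stays moderate.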
Another option is to apply gradient clipping in both the task-adaptation step (the inner loop) and the meta-parameter update (the outer loop).
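As a rough illustration of why clipping helps (plain Python rather than the repository's PyTorch code; the task values and learning rate are made up for the demonstration), an unclipped gradient step diverges to NaN on a badly scaled regression task, while the clipped version stays finite. In the actual training loops, `torch.nn.utils.clip_grad_norm_` on the inner- and outer-loop gradients serves the same purpose:

```python
import math

def grad(w, x, y):
    # d/dw of the squared error (w*x - y)^2
    return 2.0 * x * (w * x - y)

def clip(g, max_norm):
    # limit the gradient's magnitude to max_norm, keeping its sign
    return g if abs(g) <= max_norm else math.copysign(max_norm, g)

def train(w, lr, steps, clip_norm=None):
    x, y = 10.0, 100.0  # badly scaled task: large inputs blow up gradients
    for _ in range(steps):
        g = grad(w, x, y)
        if clip_norm is not None:
            g = clip(g, clip_norm)
        w -= lr * g
    return w

# Here 2 * lr * x**2 = 4, so each unclipped step multiplies the error
# in w by |1 - 4| = 3: plain gradient descent diverges to NaN.
w_unclipped = train(0.0, 0.02, 2000)
w_clipped = train(0.0, 0.02, 2000, clip_norm=10.0)
print(math.isfinite(w_unclipped), math.isfinite(w_clipped))  # False True
```

The clipped run caps each step size at `lr * clip_norm`, so the iterates stay bounded near the optimum instead of overflowing.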
Thanks for the quick reply.
Gradient clipping seems to work pretty well.