tristandeleu/pytorch-maml-rl

Fails to converge on bandit tasks


Using k=5, n=100, MAML fails to learn: the average training and validation returns hover around 50 throughout all 500 outer-loop steps. Are there any discrepancies between this repo's code/config and the paper's experiments?

For reference, the following command

python train.py --config configs/maml/bandit/bandit-k5-n100.yaml --output-folder maml-bandit-k5-n100 --seed 1 --num-workers 10

produces the following average training/validation returns for the first and last 5 iterations, respectively:

0 49.1 51.600002
1 45.5 47.75
2 49.449997 50.350002
3 49.65 52.2
4 50.4 52.7
...
495 46.6 50.0
496 50.150005 50.200005
497 53.100002 55.15
498 49.0 50.450005
499 44.5 47.600002
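
As a rough sanity check on what a return of about 50 means, here is a minimal sketch. It assumes the task is a k-armed Bernoulli bandit whose arm success probabilities are drawn uniformly from [0, 1]; that is an assumption about the environment, not something checked against this repo's code. `simulate_returns` is a hypothetical helper written for illustration and is not part of the repo.

```python
import numpy as np


def simulate_returns(k=5, n=100, num_tasks=20000, seed=0):
    """Estimate mean returns of a uniform-random policy and a best-arm oracle.

    ASSUMPTION: each task has k Bernoulli arms with success probabilities
    sampled i.i.d. from Uniform(0, 1), and an episode consists of n pulls.
    """
    rng = np.random.default_rng(seed)

    # One row of arm success probabilities per sampled task.
    means = rng.uniform(0.0, 1.0, size=(num_tasks, k))

    # Uniform-random policy: pick an arm independently at every step.
    arms = rng.integers(0, k, size=(num_tasks, n))
    pulled_means = np.take_along_axis(means, arms, axis=1)
    random_rewards = rng.random((num_tasks, n)) < pulled_means
    random_return = random_rewards.sum(axis=1).mean()

    # Oracle policy: always pull the arm with the highest success probability.
    best_means = means.max(axis=1)
    oracle_rewards = rng.random((num_tasks, n)) < best_means[:, None]
    oracle_return = oracle_rewards.sum(axis=1).mean()

    return float(random_return), float(oracle_return)


if __name__ == "__main__":
    random_return, oracle_return = simulate_returns()
    print(f"random policy: {random_return:.1f}")
    print(f"oracle policy: {oracle_return:.1f}")
```

Under that assumption, a uniform-random policy has an expected return of n * 0.5 = 50, while always pulling the best arm gives n * k/(k+1) = 100 * 5/6 ≈ 83.3. Returns that stay near 50 would then be consistent with a policy that does no better than chance.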

The code changed significantly between the version we used in the paper and the current one: the paper was written with a very early version of the code, which probably got lost in the many refactorings we did even before open-sourcing it. Unfortunately, I haven't run the bandit experiments on the new code since then. I added the config files a few months ago on request, but I haven't tried them myself, so I don't know whether the results still hold with this version (I thought they would).