Fails to converge on bandit tasks

Question

Opened this issue 4 years ago · 1 comment

Using k=5, n=100, MAML fails to learn: the average training and validation returns hover around 50 throughout all 500 outer-loop steps. Are there any discrepancies between this repo's code/config and the paper's experiments?

For reference, the following command

python train.py --config configs/maml/bandit/bandit-k5-n100.yaml --output-folder maml-bandit-k5-n100 --seed 1 --num-workers 10

produces the following average training and validation returns for the first and last 5 iterations, respectively (columns: iteration, training, validation):

0 49.1 51.600002
1 45.5 47.75
2 49.449997 50.350002
3 49.65 52.2
4 50.4 52.7
...
495 46.6 50.0
496 50.150005 50.200005
497 53.100002 55.15
498 49.0 50.450005
499 44.5 47.600002
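
For context on why a return near 50 suggests no learning, here is a rough sketch of a chance-level baseline. It rests on assumptions that are not confirmed by the repo: that each task is a Bernoulli bandit whose k arm probabilities are drawn from Uniform(0, 1), and that n=100 is the number of pulls per episode. The function names are hypothetical and not part of this repo. Under those assumptions, a uniformly random policy has an expected return of 100 * 0.5 = 50.

```python
import random


def random_policy_return(k=5, n=100, rng=None):
    """Return of a uniformly random policy on one sampled Bernoulli bandit."""
    rng = rng or random.Random()
    # Assumption: each arm's success probability is drawn from Uniform(0, 1).
    means = [rng.random() for _ in range(k)]
    total = 0
    for _ in range(n):
        arm = rng.randrange(k)  # pick an arm uniformly at random
        total += 1 if rng.random() < means[arm] else 0  # Bernoulli reward
    return total


def average_return(num_tasks=2000, k=5, n=100, seed=0):
    """Average the random-policy return over many sampled tasks."""
    rng = random.Random(seed)
    returns = [random_policy_return(k, n, rng) for _ in range(num_tasks)]
    return sum(returns) / num_tasks


if __name__ == "__main__":
    # Should land close to n * 0.5 if the assumptions above hold.
    print(average_return())
```

If the repo's bandit environment matches these assumptions, the numbers above are indistinguishable from a random policy, both before and after adaptation.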

Answer 1 · 2021-02-05T12:28:36.000Z

The code changed significantly between the version we used in the paper and the current version. The paper was written on a very early version of the code, which probably got lost in the many refactorings we did even before open-sourcing it. Unfortunately, I haven't run bandit experiments on the new code since then. I added the config files a few months ago on request, but I haven't tried them myself, so I don't know whether the results still hold with this version (I thought they would).