nicklashansen/dmcontrol-generalization-benchmark

Questions about std in SVEA paper


Hi, thanks for the great work!
I noticed your answer in the previous issue (#4): "Hi, we compute the standard deviation over the mean episode returns of each seed."
However, I'm still a bit confused. Could you please confirm if my understanding is correct?

  • (Fig. 5, top) Training performance: std across the 5 seeds.
  • (Fig. 5, bottom) Test performance: for each seed, run zero-shot evaluation for 30 episodes (args.eval_episode) and compute the mean of these 30 returns, giving one mean value per seed. Then compute the std over these 5 per-seed means.
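The two-step aggregation described in the second bullet can be sketched in NumPy as follows. This is a minimal illustration, not code from the repository: the function name `summarize_eval` and the `(num_seeds, num_episodes)` array layout are hypothetical, and `ddof=0` (population std, NumPy's default) is an assumption, since the thread does not say which estimator was used.

```python
import numpy as np


def summarize_eval(returns_per_seed):
    """Aggregate evaluation returns across seeds (hypothetical helper).

    returns_per_seed: array-like of shape (num_seeds, num_episodes),
    e.g. 5 seeds x 30 evaluation episodes.
    """
    returns = np.asarray(returns_per_seed, dtype=float)
    # Step 1: one mean return per seed, averaged over that seed's episodes.
    seed_means = returns.mean(axis=1)
    # Step 2: mean and std across the per-seed means.
    # ddof=0 (population std) is an assumption, not confirmed by the authors.
    return seed_means.mean(), seed_means.std(ddof=0)


# Toy example: 2 seeds x 2 episodes -> per-seed means 10.0 and 30.0.
mean, std = summarize_eval([[0, 20], [20, 40]])
print(mean, std)  # 20.0 10.0
```

Averaging over episodes first means each seed contributes exactly one number, so the reported std is computed over 5 values regardless of how many evaluation episodes each seed ran.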

Thank you!

I'm glad that you're interested in our work! I assume that you are referring to Figure 5 in our SVEA paper (https://arxiv.org/abs/2107.00644). Your understanding is correct: we evaluate each seed for X episodes, compute the mean return for each seed, and then report the mean and std across seeds. This ensures that the std reflects variability between independent runs (seeds) rather than variability in the environment (e.g. initial conditions).
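To illustrate why the order of aggregation matters, the sketch below compares the std over per-seed means with the std over all episodes pooled together. The data are hypothetical toy numbers chosen so every seed has the same mean but individual episodes vary widely; they are not results from the paper, and both helper names are made up for this example.

```python
import numpy as np


def seed_level_std(returns):
    # Std across per-seed mean returns: variability between independent runs.
    return np.asarray(returns, dtype=float).mean(axis=1).std()


def pooled_episode_std(returns):
    # Std over all episodes pooled together: also absorbs episode-to-episode
    # (environment) variability, e.g. from differing initial conditions.
    return np.asarray(returns, dtype=float).ravel().std()


# Hypothetical data: 3 seeds x 2 episodes, identical per-seed means of 50.
toy = [[0, 100], [0, 100], [0, 100]]
print(seed_level_std(toy))      # 0.0
print(pooled_episode_std(toy))  # 50.0
```

Here the runs agree perfectly with one another, so the seed-level std is zero, while the pooled std is large purely because of per-episode variation.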

Thank you so much for the quick reply.