nicklashansen/policy-adaptation-during-deployment

How to derive performance from various test domains

dandelionhsc opened this issue · 2 comments

Hello, this work is amazing and the code is well-written.
I'm curious how the performance reported in the paper is derived. There are 10 test videos for the video background domain, and 2 color domains (color_easy, color_hard) for randomized colors.
Did you simply take the mean of the performance across these test domains?
Thank you very much.

Glad to hear that you found it interesting. Yes, we evaluate each seed for 10 episodes in each of the environments (i.e., each of 10 videos / 100 colors) and average across all. Std. deviations are computed across seeds.
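The aggregation described above could be sketched roughly as follows. This is only an illustration, not the repository's evaluation code: the function name `aggregate_returns` and the array layout `(num_seeds, num_envs, num_episodes)` are assumptions, and the reply does not say whether the population or sample standard deviation is used, so the `ddof` parameter (default 0) is also an assumption.

```python
import numpy as np


def aggregate_returns(returns, ddof=0):
    """Aggregate episode returns into a mean and a std across seeds.

    `returns` is assumed (hypothetical layout) to have shape
    (num_seeds, num_envs, num_episodes), e.g. (num_seeds, 10, 10)
    for 10 test videos with 10 episodes each.
    """
    returns = np.asarray(returns, dtype=float)
    if returns.ndim != 3:
        raise ValueError("expected shape (num_seeds, num_envs, num_episodes)")

    # Average over environments and episodes to get one score per seed.
    per_seed = returns.mean(axis=(1, 2))

    # Report the mean over seeds and the std computed across seeds.
    # ddof=0 is an assumption; use ddof=1 for the sample std.
    return per_seed.mean(), per_seed.std(ddof=ddof)
```

With 10 videos and 10 episodes per video, each seed contributes the mean of 100 episode returns, and the reported spread reflects variation between seeds only, not between episodes or environments.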

Thank you very much! :)