What is the meaning of train_episodes and valid_episodes?
GeorgeDUT opened this issue · 4 comments
GeorgeDUT commented
test.py writes a file containing "task", "train_episodes", and "valid_episodes". Are "train_episodes" and "valid_episodes" the total rewards of an episode?
tristandeleu commented
Yes, `train_returns` and `valid_returns` are the (undiscounted) cumulative rewards for each task and each episode.
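As a rough illustration of what "undiscounted cumulative reward for each task and each episode" means, here is a minimal sketch in plain Python. The nested-list layout (`rewards[task][episode][step]`), the helper names, and the numbers are all assumptions made for this example; they are not taken from test.py.

```python
def undiscounted_return(rewards):
    """Sum the rewards of one episode, with no discount factor applied."""
    return sum(rewards)


def returns_per_task(rewards_by_task):
    """Map rewards[task][episode][step] to returns[task][episode]."""
    return [
        [undiscounted_return(episode) for episode in episodes]
        for episodes in rewards_by_task
    ]


# Hypothetical data: 2 tasks, 2 episodes per task, episodes of varying length.
example_rewards = [
    [[1.0, 0.0, 2.0], [0.5, 0.5]],
    [[-1.0, -1.0], [0.0, 3.0, 1.0]],
]

example_returns = returns_per_task(example_rewards)
print(example_returns)  # [[3.0, 1.0], [-2.0, 4.0]]
```

Each entry of the result is one number per (task, episode) pair, which is the kind of value the answer above describes.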
GeorgeDUT commented
what is the difference between "train_return" and "valid_return"?
tristandeleu commented
`train_return` corresponds to the returns for the trajectories sampled with the initial policy (before adaptation), and `valid_return` corresponds to the returns for the trajectories sampled with the adapted policy. Using the notation of the paper (Algorithm 3), `train_returns` corresponds to the returns on D and `valid_returns` corresponds to the returns on D'.
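One way to use the two arrays together is to compare the average return before and after adaptation for each task. The sketch below assumes both are indexed as `returns[task][episode]`; the function names and the numbers are hypothetical and only serve to illustrate the comparison.

```python
from statistics import mean


def mean_return_per_task(returns):
    """Average the episode returns of each task: returns[task][episode]."""
    return [mean(episodes) for episodes in returns]


def adaptation_gain(train_returns, valid_returns):
    """Per-task mean return after adaptation minus mean return before it."""
    before = mean_return_per_task(train_returns)  # initial policy, on D
    after = mean_return_per_task(valid_returns)   # adapted policy, on D'
    return [a - b for b, a in zip(before, after)]


# Hypothetical returns for 2 tasks with 2 episodes each.
train_returns = [[1.0, 3.0], [0.0, 2.0]]
valid_returns = [[4.0, 6.0], [1.0, 2.0]]

gains = adaptation_gain(train_returns, valid_returns)
print(gains)  # [3.0, 0.5]
```

A positive gain for a task suggests that the adapted policy collected more reward on that task than the initial policy did.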
GeorgeDUT commented
thanks so much, I get it