Farama-Foundation/D4RL-Evaluations

Clarification about Training and Evaluation Task Split

Issue closed · 1 comment

Hi,

Thanks for sharing this repository. It is great.
I'd like to ask about the "Training and Evaluation Task Split" in Appendix D and how the results in Tables 1 and 3 are reported. I am a bit confused about how this was done.
For simplicity, let's assume BCQ and Maze2D are being used. Which of the following correctly describes what was done in the paper?

  1. BCQ is trained on "maze2d-umaze-v1", and the learned model is then used to report results on "maze2d-eval-umaze-v1"? In other words, maze2d-eval-umaze-v1 is not used for training, only for reporting results?

  2. BCQ's hyperparameters are tuned on "maze2d-umaze-v1". BCQ is then trained with those hyperparameters on "maze2d-eval-umaze-v1" and evaluated there as well? In other words, maze2d-eval-umaze-v1 is used for both training and evaluation?

  3. Or some other procedure?
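
To make the first two readings concrete, here is a minimal sketch of how I understand each one. `tune_hyperparams`, `train_bcq`, and `evaluate` are hypothetical placeholders that only record which task they were called on; they are not functions from D4RL or this repository. Only the two environment names come from the question itself.

```python
# Hypothetical stand-ins: none of these functions exist in D4RL or this repo.
TRAIN_TASK = "maze2d-umaze-v1"
EVAL_TASK = "maze2d-eval-umaze-v1"


def tune_hyperparams(task, candidates):
    """Placeholder for a hyperparameter search on `task`; returns the first candidate."""
    return {"tuned_on": task, **candidates[0]}


def train_bcq(task, hyperparams):
    """Placeholder for offline BCQ training on the dataset of `task`."""
    return {"trained_on": task, "hyperparams": hyperparams}


def evaluate(policy, task):
    """Placeholder for rolling out `policy` in the environment of `task`."""
    return {"trained_on": policy["trained_on"], "evaluated_on": task}


def scenario_1(hyperparams):
    # Reading 1: train on the training task, report results on the eval task only.
    policy = train_bcq(TRAIN_TASK, hyperparams)
    return evaluate(policy, EVAL_TASK)


def scenario_2(candidates):
    # Reading 2: tune on the training task, then train AND evaluate on the eval task.
    best = tune_hyperparams(TRAIN_TASK, candidates)
    policy = train_bcq(EVAL_TASK, best)
    return evaluate(policy, EVAL_TASK)


if __name__ == "__main__":
    print(scenario_1({"lr": 1e-3}))
    print(scenario_2([{"lr": 1e-3}, {"lr": 3e-4}]))
```

The only difference between the two is which task's dataset reaches `train_bcq`.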

Thanks for your help.

I have asked this in the d4rl repo, which I believe is the more relevant place.