MineDojo/MineCLIP

Some details of the MineAgent RL implementation


Hello! I am reproducing your paper results (training PPO + self-imitation with the MineCLIP reward), but I am missing a few details:

  1. How did you implement the agent's 89 discrete actions described in the paper? Currently your MineAgent uses a multi-discrete output of 3×3×4×25×25×3, which is much larger. Did you remove some action choices?
  2. When computing the DIRECT reward with the MineCLIP model, how do you sample the negative texts, and how many did you sample? (A sketch of what I am doing now is below, after this list.)
  3. I find that one step of MineDojo simulation covers much less time than one second of a YouTube video. Did you use the last 16 consecutive RGB observations to compute the reward?
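
To make (2) and (3) concrete, this is roughly what I am doing right now. The `encode_video` / `encode_text` calls stand in for whatever the intended MineCLIP entry points are, and the negative-prompt handling, temperature, and softmax-minus-baseline formulation are my own guesses, which is exactly what I would like to confirm:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mineclip_reward(mineclip, frame_buffer, goal_prompt, negative_prompts,
                    temperature=0.07):
    """My current guess at the reward: softmax of the video-text similarity
    over [goal] + negatives, minus the uniform baseline, clipped at 0.

    frame_buffer: the last 16 consecutive RGB observations from MineDojo,
                  stacked to (1, 16, 3, H, W). Whether "16 consecutive env
                  steps" is the right notion of a 16-frame clip is question (3).
    """
    video = torch.stack(list(frame_buffer), dim=0).unsqueeze(0)   # (1, 16, 3, H, W)
    v = F.normalize(mineclip.encode_video(video), dim=-1)         # (1, D)   -- assumed API
    prompts = [goal_prompt] + list(negative_prompts)
    t = F.normalize(mineclip.encode_text(prompts), dim=-1)        # (1+M, D) -- assumed API
    logits = (v @ t.T) / temperature                              # (1, 1+M)
    probs = logits.softmax(dim=-1)
    # Reward only when the goal prompt beats the uniform baseline 1/(1+M).
    return (probs[0, 0] - 1.0 / len(prompts)).clamp(min=0.0).item()
```

In particular, I am unsure how many negative prompts to use and whether they should be re-sampled per step, per episode, or kept fixed for the whole run.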

Thank you!

By the way, do you plan to release the training code, or the learned agent parameters?

@YHQpkueecs Were you able to get the learned agent parameters from @LinxiFan, @wangguanzhi, or @yunfanjiang?