Training details about MineAgent

Hi, and thank you for releasing this valuable benchmark! I am trying to reimplement the PPO agent reported in the paper, but I found some discrepancies between the code and the paper.

Trimmed action space

As mentioned in #4, the code below does not match the 89 action dimensions described in Appendix G.2.

`MineCLIP/main/mineagent/run_env_in_loop.py`, line 75 in e6c06a0:

`action_dim=[3, 3, 4, 25, 25, 8],`
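
For reference, here is a quick check of the mismatch. It is only a sketch, and it assumes that the 89 in Appendix G.2 counts the summed sizes of all action components, which is my reading rather than something the paper states. The name `PAPER_ACTION_DIMS` is mine and does not come from the repository.

```python
# Component sizes copied from run_env_in_loop.py (line 75 in e6c06a0).
action_dim = [3, 3, 4, 25, 25, 8]

# Figure quoted from Appendix G.2 of the paper (hypothetical constant name).
PAPER_ACTION_DIMS = 89

# Assumption: the paper's number is the sum of the per-component sizes.
total_logits = sum(action_dim)
gap = PAPER_ACTION_DIMS - total_logits

print(f"sum(action_dim) = {total_logits}, paper = {PAPER_ACTION_DIMS}, gap = {gap}")
```

Under that assumption the released configuration is smaller than the one in the paper, so some components appear to have been trimmed or removed.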

About the `compass` observation

According to the paper, the compass observation has shape (2,). However, the code uses an input of shape (B, 4), i.e. (4,) per sample.

`MineCLIP/main/mineagent/run_env_in_loop.py`, line 25 in e6c06a0:

`"compass": torch.rand((B, 4), device=device),`

Training on `MultiDiscrete` action space

Is the 89-dimensional action space in the paper a MultiDiscrete action space, like the original MineDojo action space, or do you simply treat it as a single Discrete action space?
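
To make the two options concrete, here is a small sketch of how differently they scale. It uses the `action_dim` list from `run_env_in_loop.py` only as an example, since I do not know the exact breakdown behind the 89 dimensions. The helper `unflatten` is hypothetical and not part of MineDojo or MineCLIP.

```python
import math

# Component sizes from run_env_in_loop.py, used purely as an illustration.
action_dim = [3, 3, 4, 25, 25, 8]

# MultiDiscrete: one categorical head per component, so the policy
# outputs sum(action_dim) logits in total.
multi_discrete_logits = sum(action_dim)

# Flattened Discrete: a single categorical over every combination of
# sub-actions, so the policy outputs prod(action_dim) logits.
flat_discrete_actions = math.prod(action_dim)


def unflatten(index, dims):
    """Decode a flat Discrete index into one sub-action per component.

    The last component varies fastest (row-major order).
    """
    parts = []
    for size in reversed(dims):
        index, part = divmod(index, size)
        parts.append(part)
    return parts[::-1]


print(multi_discrete_logits, flat_discrete_actions)
print(unflatten(flat_discrete_actions - 1, action_dim))
```

The output size of the policy head differs by several orders of magnitude between the two interpretations, which is why I would like to know which one was used for the reported results.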

In addition, could you release the training code for the three task groups in the paper (or share it via my GitHub email)? It would be very helpful for baseline comparisons!
