Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
📄 Read Paper
📣 Blog Post
👾 MineRL Environment (note version 1.0+ required)
🏁 MineRL BASALT Competition
This version is designed to run on 16 frames of data rather than 128, which makes it considerably less accurate. The goal is a lightweight approximation rather than high-fidelity labels, and it also serves as a test of what is actually required: when the original IDM was limited to 16 frames of data, its guesses weren't terrible.
Known limitations: does not apply a virtual cursor to videos.
Set up some form of verification: take the training data, run it against known data, and produce a score (see the sketch below).
Verify it works (and that I reshaped things properly).
The IDM aims to predict what actions the player is taking in a video recording.
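One rough way to implement the verification step mentioned above is to load the recorded actions from a .jsonl file, line the IDM's per-frame predictions up against them, and report per-key agreement. The sketch below is only an illustration: the helper names and the assumption that both predictions and recordings are frame-aligned dicts mapping action names to 0/1 values are mine, not this repo's actual format.

```python
import json

def load_recorded_actions(jsonl_path):
    # Each line of the .jsonl is one JSON object describing one recorded step.
    with open(jsonl_path) as f:
        return [json.loads(line) for line in f if line.strip()]

def score_predictions(predicted, recorded, keys=("attack", "forward", "jump")):
    # Fraction of frames where the predicted button state matches the recording.
    # Assumes both inputs are lists of dicts with 0/1 values, aligned frame-by-frame
    # (an assumption for illustration, not the repo's guaranteed format).
    n = min(len(predicted), len(recorded))
    scores = {}
    for key in keys:
        matches = sum(
            1 for i in range(n)
            if predicted[i].get(key, 0) == recorded[i].get(key, 0)
        )
        scores[key] = matches / n if n else 0.0
    return scores
```

Running this over a held-out recording would give a single per-key agreement score for comparing the 16-frame model against the original IDM's labels.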
Setup:
- Install requirements:
pip install -r requirements.txt
- Download the IDM model .model ⬇️ and .weight ⬇️ files
- For demonstration purposes, you can use the contractor recordings shared below. For this demo we use this .mp4 and the associated actions file (.jsonl).
To run the model with above files placed in the root directory of this code:
python run_inverse_dynamics_model.py --weights 4x_idm.weights --model 4x_idm.model --video-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4 --jsonl-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl
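Before running the demo, it can help to peek at the actions file: each line is a standalone JSON object for one recorded step. A small snippet to inspect it (this makes no assumptions about the exact keys, which can vary between contractor recordings):

```python
import json

# Print the keys of the first few recorded steps in the .jsonl actions file.
with open("cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl") as f:
    for i, line in enumerate(f):
        if i >= 3:
            break
        step = json.loads(line)
        print(sorted(step.keys()))
```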
A window should pop up that plays the video frame by frame, with the predicted and true (recorded) actions shown side by side on the left.
Note that run_inverse_dynamics_model.py
is designed to be a demo of the IDM, not code to put it into practice.
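If you want to label your own videos programmatically rather than view the demo window, the structure of the demo script suggests one way to do it. The sketch below is an approximation under assumptions: it assumes the `IDMAgent` class, its `load_weights` method, and its `predict_actions` helper behave as they are used in `run_inverse_dynamics_model.py`, and it ignores the frame resizing and batching the demo script performs; check that script for the exact resolution and batch handling before relying on this.

```python
import pickle
import cv2
import numpy as np
from inverse_dynamics_model import IDMAgent  # provided by this repo (assumed import path)

def label_video(model_path, weights_path, video_path, n_frames=16):
    # Build the agent the same way the demo script appears to (assumed, not verified here).
    agent_parameters = pickle.load(open(model_path, "rb"))
    net_kwargs = agent_parameters["model"]["args"]["net"]["args"]
    pi_head_kwargs = agent_parameters["model"]["args"]["pi_head_opts"]
    agent = IDMAgent(idm_net_kwargs=net_kwargs, pi_head_kwargs=pi_head_kwargs)
    agent.load_weights(weights_path)

    # Read a single batch of n_frames RGB frames from the video.
    # The demo script may also resize frames to the model's expected resolution.
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < n_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    # In the demo script, predict_actions returns per-frame action predictions.
    return agent.predict_actions(np.stack(frames))
```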
Original IDM training code by ViktorThink. The rest is based on VPT, which was a large effort by a dedicated team at OpenAI: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, and Jeff Clune. The code here represents a minimal version of their model code, prepared by Anssi Kanervisto and others so that these models could be used as part of the MineRL BASALT competition.