This repo is modeled on Quan Vuong's PyTorch implementation of the algorithm described in *Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models*.
A work in progress!
- The requirements of the original TF implementation
- PyTorch 1.0.0
For specific requirements, please take a look at the pip dependency file `requirements.txt` and the conda dependency file `environments.yml`.
Experiments for a particular environment can be run using:

```
python mbexp.py -env ENV
```

- `-env ENV` (required): The name of the environment. Select from `[cartpole, reacher, pusher, halfcheetah]`.
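For example, to run the cartpole experiment: `python mbexp.py -env cartpole`.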
Results will be saved in `<logdir>/<date+time of experiment start>/`.

Trial data will be contained in `logs.mat`, with the following contents:
```
{
    "observations": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, obs_dim],
    "actions": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, ac_dim],
    "rewards": NumPy array of shape
        [num_train_iters * nrollouts_per_iter + ninit_rollouts, trial_lengths, 1],
    "returns": NumPy array of shape [1, num_train_iters * neval]
}
```
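Since `logs.mat` is a MATLAB-format file, it should be loadable with SciPy. Below is a minimal sketch for inspecting the recorded arrays, assuming SciPy is installed; the log directory in the path is a hypothetical example.

```python
from scipy.io import loadmat

# Load the trial data; the path is a hypothetical instance of
# <logdir>/<date+time of experiment start>/logs.mat.
data = loadmat("log/2019-01-01--12-00-00/logs.mat")

# Print the shape of each array described above.
for key in ("observations", "actions", "rewards", "returns"):
    print(key, data[key].shape)
```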
To visualize the results, please take a look at `plotter.ipynb`.
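If you prefer a quick script over the notebook, something like the following should plot a learning curve. This is a sketch, not the notebook's code, under the assumption that `returns` has shape `[1, num_train_iters * neval]` as described above; the file path is again hypothetical.

```python
import matplotlib.pyplot as plt
from scipy.io import loadmat

# Flatten the [1, num_train_iters * neval] returns array into a 1-D series.
returns = loadmat("log/2019-01-01--12-00-00/logs.mat")["returns"].ravel()

plt.plot(returns)
plt.xlabel("Evaluation rollout")
plt.ylabel("Return")
plt.show()
```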
Huge thanks to the authors of the paper for open-sourcing their code. Most of this repo is taken from the official TF implementation.