Introduction

This repository is a TensorFlow implementation of [Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards].

Training

The following commands run DTSIL on the Apple-Gold domain:

cd Maze
python run_ppo_diverse.py