kzl/decision-transformer

Question: is it possible to use the same Decision Transformer to generate new training trajectories?

Closed this issue · 2 comments

Maybe I'm missing something, but why do we stop training after going through the initial trajectory dataset?
Couldn't the same model be run again to generate new (better) trajectories and then be trained on them iteratively?
Thanks for your time!
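To make the proposal concrete, here is a minimal toy sketch of the loop being described: roll out the current model to collect trajectories, keep the best ones, and retrain on them. Everything here (`ToyEnv`, `ToyModel`, the top-k selection rule) is an illustrative stand-in, not part of this repository or the Decision Transformer paper.

```python
# Hypothetical sketch of the iterative scheme: generate trajectories
# with the current model, then retrain on the highest-reward ones.
# ToyEnv and ToyModel are illustrative placeholders, not real DT code.
import random

random.seed(0)

class ToyEnv:
    """One-step environment: reward is highest for actions near 0.5."""
    def step(self, action):
        return 1.0 - abs(action - 0.5)

class ToyModel:
    """Trivial 'policy' that samples actions around a learned mean."""
    def __init__(self):
        self.mean = 0.0

    def act(self):
        return self.mean + random.uniform(-0.2, 0.2)

    def train(self, trajectories):
        # Behavior-clone the top-5 trajectories by reward:
        # move the mean toward their actions.
        best = sorted(trajectories, key=lambda t: t[1], reverse=True)[:5]
        self.mean = sum(a for a, _ in best) / len(best)

def iterate(env, model, rounds=10, rollouts=20):
    for _ in range(rounds):
        trajs = []
        for _ in range(rollouts):
            a = model.act()
            r = env.step(a)
            trajs.append((a, r))
        model.train(trajs)  # retrain on the freshly generated data
    return model.mean

final_mean = iterate(ToyEnv(), ToyModel())
print(round(final_mean, 2))
```

In this toy setting the mean drifts toward the high-reward region over the iterations, which is the intuition behind "generate better trajectories, then train on them." Whether this works for an actual Decision Transformer is exactly the open question discussed below.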

@danielgafni I thought of that too (it immediately comes to mind). I'm now trying to test exactly this on a simple environment, similar to how it's done in Upside-Down RL (i.e., reward-conditioned RL). So if you're interested, we can chat about it hehe

kzl commented

The original paper studies offline RL, so this isn't done there. Offline pretraining -> online finetuning has been studied in various papers and is generally useful but nontrivial; it's definitely an active area of research!