implicit-q-learning

A simple PyTorch implementation of the algorithm from the paper "Offline Reinforcement Learning with Implicit Q-Learning" by Kostrikov et al. (https://arxiv.org/abs/2110.06169). The code borrows heavily from Scott Fujimoto's original implementation of TD3+BC, with some additional code taken from d3rlpy. I compared this implementation with the original JAX source code for IQL on the D4RL-PyBullet datasets and got similar results, which suggests it is correct, though I have not verified it beyond that comparison.
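For readers new to IQL, the central idea in the paper is to fit the value function with an asymmetric (expectile) squared loss on the difference Q(s, a) - V(s). The snippet below is a minimal, dependency-free sketch of that loss, not the code from this repository: the function name `expectile_loss`, the sample numbers, and the choice of `tau=0.7` are assumptions made for illustration, and a PyTorch version would apply the same weighting to tensors.

```python
def expectile_loss(diffs, tau=0.7):
    """Mean of |tau - 1(u < 0)| * u**2 over the differences u in `diffs`.

    `diffs` stands for Q(s, a) - V(s) on a batch; `tau` is the expectile.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if not diffs:
        raise ValueError("diffs must not be empty")
    total = 0.0
    for u in diffs:
        # Negative differences (V above Q) get weight 1 - tau,
        # non-negative differences (V at or below Q) get weight tau.
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(diffs)


# Made-up batch of Q and V estimates (illustrative values only).
q_values = [1.0, 2.0, 0.5]
v_values = [1.5, 1.0, 0.5]
diffs = [q - v for q, v in zip(q_values, v_values)]

loss = expectile_loss(diffs, tau=0.7)
symmetric = expectile_loss(diffs, tau=0.5)  # equals half the mean squared error
```

With `tau` above 0.5, underestimating Q is penalized more than overestimating it, so V is pushed toward the upper range of Q-values seen in the dataset without ever querying actions outside it.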
